langchain-ai / chat-langchain

https://chat.langchain.com
MIT License

ingest, embedding, faiss error #106

Open lewsutt29 opened 11 months ago

lewsutt29 commented 11 months ago

$ ./ingest.sh
--2023-07-29 15:07:25-- https://langchain.readthedocs.io/en/latest/
Resolving langchain.readthedocs.io (langchain.readthedocs.io)... 104.17.32.82, 104.17.33.82, 2606:4700::6811:2152, ...
Connecting to langchain.readthedocs.io (langchain.readthedocs.io)|104.17.32.82|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://api.python.langchain.com/en/latest/ [following]
--2023-07-29 15:07:25-- https://api.python.langchain.com/en/latest/
Resolving api.python.langchain.com (api.python.langchain.com)... 104.17.32.82, 104.17.33.82, 2606:4700::6811:2152, ...
Connecting to api.python.langchain.com (api.python.langchain.com)|104.17.32.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘langchain.readthedocs.io/en/latest/index.html’

langchain.readthedocs.io/en/latest/in [ <=> ] 1.57K --.-KB/s in 0s

2023-07-29 15:07:25 (30.1 MB/s) - ‘langchain.readthedocs.io/en/latest/index.html’ saved [1612]

FINISHED --2023-07-29 15:07:25--
Total wall clock time: 0.3s
Downloaded: 1 files, 1.6K in 0s (30.1 MB/s)
/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:48: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 48 of the file /home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  _ = BeautifulSoup(
/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:75: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 75 of the file /home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(data, **self.bs_kwargs)
Embeddings: client=<class 'openai.api_resources.embedding.Embedding'> model='text-embedding-ada-002' deployment='text-embedding-ada-002' openai_api_version='' openai_api_base='' openai_api_type='' openai_proxy='' embedding_ctx_length=8191 openai_api_key='...apikeyhere...' openai_organization='' allowed_special=set() disallowed_special='all' chunk_size=1000 max_retries=6 request_timeout=None headers=None tiktoken_model_name=None show_progress_bar=False model_kwargs={}
Traceback (most recent call last):
  File "ingest.py", line 34, in <module>
    ingest_docs()
  File "ingest.py", line 26, in ingest_docs
    vectorstore = FAISS.from_documents(documents, embeddings)
  File "/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/vectorstores/base.py", line 413, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 578, in from_texts
    return cls.__from(
  File "/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 522, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

sharrajesh commented 11 months ago

+1

ionescofung commented 11 months ago

+1

ailyfeng commented 11 months ago

me too

tymrtn commented 11 months ago

me three

East196 commented 10 months ago

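The website is empty: https://langchain.readthedocs.io/en/latest/ now redirects elsewhere, so ingest.sh downloads only a single index.html and there are no documents to embed.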
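
If the download step yields no pages, as the comment above suggests, the `embeddings[0]` lookup in the traceback has nothing to index. Below is a minimal sketch of a guard that fails earlier with a clearer message. `require_documents` is a hypothetical helper, not part of LangChain or this repository, and it assumes the loaded documents are available as a plain sequence:

```python
from typing import List, Sequence


def require_documents(documents: Sequence[object], source: str) -> List[object]:
    """Return the documents as a list, or fail early with a readable message.

    Hypothetical helper for illustration only; `source` is just a label
    used in the error text.
    """
    docs = list(documents)
    if not docs:
        # An empty input is what later surfaces as
        # "IndexError: list index out of range" at embeddings[0].
        raise ValueError(
            f"No documents were loaded from {source!r}; "
            "check that the download step actually fetched the pages."
        )
    return docs


if __name__ == "__main__":
    try:
        require_documents([], "langchain.readthedocs.io/en/latest/")
    except ValueError as exc:
        print(exc)
```

In ingest.py this would presumably be called on `documents` just before the `FAISS.from_documents(documents, embeddings)` line shown in the traceback. It does not fix the empty download, it only makes the cause visible sooner.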