Open louria opened 1 year ago
i also got the same error
same
also seeing this:
│ ❱ 1 import_docs() │
│ 2 │
│ │
│ in import_docs:33 │
│ │
│ 30 │ │
│ 31 │ documents = text_splitter.split_documents(langchain_documents) │
│ 32 │ embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, ) │
│ ❱ 33 │ vectorstore = FAISS.from_documents(documents, embeddings) │
│ 34 │ │
│ 35 │ # Save vectorstore │
│ 36 │ with open("./openlegalquerybase/embedded_docs.pkl", "wb") as file: │
│ │
│ /home/cw/miniconda3/lib/python3.10/site-packages/langchain/vectorstores/base.py:116 in │
│ from_documents │
│ │
│ 113 │ │ """Return VectorStore initialized from documents and embeddings.""" │
│ 114 │ │ texts = [d.page_content for d in documents] │
│ 115 │ │ metadatas = [d.metadata for d in documents] │
│ ❱ 116 │ │ return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs) │
│ 117 │ │
│ 118 │ @classmethod │
│ 119 │ @abstractmethod │
│ │
│ /home/cw/miniconda3/lib/python3.10/site-packages/langchain/vectorstores/faiss.py:345 in │
│ from_texts │
│ │
│ 342 │ │ │ │ faiss = FAISS.from_texts(texts, embeddings) │
│ 343 │ │ """ │
│ 344 │ │ embeddings = embedding.embed_documents(texts) │
│ ❱ 345 │ │ return cls.__from(texts, embeddings, embedding, metadatas, **kwargs) │
│ 346 │ │
│ 347 │ @classmethod │
│ 348 │ def from_embeddings( │
│ │
│ /home/cw/miniconda3/lib/python3.10/site-packages/langchain/vectorstores/faiss.py:307 in __from │
│ │
│ 304 │ │ **kwargs: Any, │
│ 305 │ ) -> FAISS: │
│ 306 │ │ faiss = dependable_faiss_import() │
│ ❱ 307 │ │ index = faiss.IndexFlatL2(len(embeddings[0])) │
│ 308 │ │ index.add(np.array(embeddings, dtype=np.float32)) │
│ 309 │ │ documents = [] │
│ 310 │ │ for i, text in enumerate(texts): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
same error
This happens when no documents were actually created locally; maybe your HTML files were never downloaded. Check the document count/length before embedding.
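A quick way to guard against this (a minimal sketch — `Document` here is a tiny stand-in for LangChain's class, and `docs` stands in for whatever your splitter produced) is to drop empty documents and fail loudly before ever calling FAISS:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in for langchain's Document; real code would import it."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def filter_nonempty(documents):
    """Drop documents whose page_content is empty or whitespace-only."""
    kept = [d for d in documents if d.page_content and d.page_content.strip()]
    if not kept:
        raise ValueError("No non-empty documents - check that your source "
                         "files were actually downloaded and parsed.")
    return kept

docs = [Document("some real text"), Document(""), Document("   ")]
print(len(filter_nonempty(docs)))  # prints 1 - only the first doc survives
```

With a check like this the failure points at your loader instead of surfacing later as an opaque `IndexError` inside faiss.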
Is there a resolution to this problem? I am also experiencing this in a larger doc.
same problem here...
same problem
I am also getting this error.
I experienced the same issue using the DirectoryLoader until I set the loader_cls parameter:
loader = DirectoryLoader("C:\\directory", loader_cls=TextLoader,
                         recursive=True, show_progress=True,
                         use_multithreading=True, max_concurrency=8)
raw_documents = loader.load()
Don't forget to update the import statement: from langchain.document_loaders import DirectoryLoader, TextLoader
I also encountered this error, because the file I loaded was empty. When I added some text, the problem was solved.
same error
Same exact error
yes, the same error
I encountered the same error.
I first tried downgrading to langchain==0.0.120 and replacing the docs URL, but it still didn't work. ReadTheDocsLoader tries to parse the HTML, but the docs' structure has since changed, so it couldn't parse and load them.
If you want to try the demo, here is a quick workaround:
# Just manually copy some document text,
# and modify `ingest.py` a little so it doesn't require `ReadTheDocsLoader`:
from langchain.docstore.document import Document

text = """the langchain docs copied from the website"""

def ingest_docs():
    doc = Document(
        page_content=text,
        metadata={"source": "https://api.python.langchain.com/en/latest/api_reference.html"},
    )
    raw_documents = [doc]
    # ...
Same error. My source PDF is 92 pages. Used the FAISS vectorstore, langchain==0.0.251.
Using this on a bunch of PDFs: loader = DirectoryLoader(DATA_PATH_PDF, glob='*.pdf', loader_cls=PyPDFLoader, show_progress=True)
They load without error, but then I get the same error: index = faiss.IndexFlatL2(len(embeddings[0])) IndexError: list index out of range
i get the same error when creating a new vectorstore
I think I might have found the issue. One of the PDFs I was reading did not contain any text (it was scanned), so the reader returned page_content as an empty string '' for every page. You may need to run OCR on your PDF to extract the text before sending it to FAISS.
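A minimal sketch of that check (here `pages` stands in for the per-page text that a loader like PyPDFLoader would return; a real script would read `p.page_content` from the loaded documents):

```python
def empty_pages(pages):
    """Return the indices of pages whose extracted text is blank.

    Pages that come back blank from a text-extraction loader are often a
    sign the PDF is a scan and needs OCR before embedding.
    """
    return [i for i, p in enumerate(pages) if not p.strip()]

pages = ["Intro text", "", "   ", "Conclusion"]
print(empty_pages(pages))  # prints [1, 2]
```

If every page shows up in this list, nothing will reach the embedder and `embeddings` will be empty, which is exactly when `embeddings[0]` raises `IndexError`.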
I got the same error when creating a new FAISS vectorstore with texts and embeddings. I resolved it by passing a list(zip(texts_list, embeddings_list)) instead of zip(texts_list, embeddings_list).
The following example in the FAISS API document in LangChain does not work and leads to the "index out of range" error: https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.faiss.FAISS.html?highlight=faiss#langchain.vectorstores.faiss.FAISS.from_embeddings
from langchain import FAISS
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
text_embeddings = embeddings.embed_documents(texts)
text_embedding_pairs = zip(texts, text_embeddings)
faiss = FAISS.from_embeddings(text_embedding_pairs, embeddings)
The following code works for me:
from langchain import FAISS
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
text_embeddings = embeddings.embed_documents(texts)
text_embedding_pairs = zip(texts, text_embeddings)
text_embedding_pairs_list = list(text_embedding_pairs)
faiss = FAISS.from_embeddings(text_embedding_pairs_list, embeddings)
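A plausible reason `list()` helps: a `zip` object is a one-shot iterator, so if the pairs are consumed once (e.g. by an internal pass over the texts), a second pass sees nothing and the derived `embeddings` list ends up empty, making `embeddings[0]` raise. A small demonstration of the iterator behavior:

```python
# A zip object can only be consumed once; after the first pass it is empty.
texts = ["a", "b"]
embs = [[0.1, 0.2], [0.3, 0.4]]

pairs = zip(texts, embs)
first_pass = list(pairs)   # 2 pairs
second_pass = list(pairs)  # [] - the iterator is exhausted

print(len(first_pass), len(second_pass))  # prints 2 0

# Materializing the pairs up front avoids the problem:
pairs_list = list(zip(texts, embs))
# a list can be iterated any number of times
print(len(pairs_list), len(pairs_list))   # prints 2 2
```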
Re-check your documents. They might need OCR before they can be read as text, or they might not be readable by your current PDF loader. In that case there was no text, so embeddings[0] didn't exist to retrieve, hence the error. That was my problem.
I had the same error thrown in the function scrape_kbs:
File "KBScrapeVectorize.py", line 130, in scrape_kbs
db = FAISS.from_documents(docs, embeddings)
...
index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range
This function was supposed to scrape relevant details from some knowledge base articles, create documents, and use them to build a vectorstore. However, when I checked the documents created from the scraped KBs, they were all empty because auth to the site had failed. Once I corrected the auth and the documents had some content, FAISS.from_documents worked fine.
Has anyone got a solution? Please help me solve this error:
File "/usr/local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 347, in __from index = faiss.IndexFlatL2(len(embeddings[0]))
~~^^^ IndexError: list index out of range
I am not able to solve this error; please help.
I posted what I think is the answer already. You have an empty document that you are trying to put into FAISS. Remove any docs that are empty
Thanks for the reply. Here's what I'm doing: I load some URLs, split the data, create embeddings with OpenAI, and finally use FAISS to store the embeddings, but I'm facing the list index out of range error. Could you please help me solve the problem?
loader = UnstructuredURLLoader(urls=urls)
data = loader.load()

# split data
text_splitter = CharacterTextSplitter(separator='\n', chunk_size=1000,
                                      chunk_overlap=200)
docs = text_splitter.split_documents(data)

# embeddings
embeddings = OpenAIEmbeddings()

# save embeddings to FAISS index
vectorstore_openai = FAISS.from_embeddings(docs, embeddings)
Yes, I found the docs were empty... Now I am facing another issue:
2023-10-17 21:43:14.139 Uncaught app exception
Traceback (most recent call last):
  File "C:\VS_CODE\langchain\Equity Research Analysis\venv\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\VS_CODE\langchain\Equity Research Analysis\app.py", line 52, in <module>
If anyone has this trouble, try iterating over each chunk and checking whether it actually has content. I found that if a given chunk is empty, it fails like this:
index = faiss.IndexFlatL2(len(embeddings[0])) IndexError: list index out of range
This looks like an error in faiss's merge_from.
It also shows AttributeError: type object 'FAISS' has no attribute 'IndexFlatL2' when I tried to print IndexFlatL2.
I get what a lot of you are saying about the files being empty, but I am using the same files he uses in the video and still get the same error. I have also tried the suggestions in this thread, with no luck.
@MuvvaThriveni did you figure out how to solve this error? Please help me.
I had the same error, but it was resolved when I changed the PDF file. So I think not all PDFs can be processed as text; OCR-based tools will come in handy.
Same error
Those who are here from the codebasics LLM video
you can use this code:
if process_url_clicked:
    # load data
    loader = UnstructuredURLLoader(urls=urls)
    main_placefolder.text("Data Loading...Started...✅✅✅")
    data = loader.load()
    if data:
        # split data
        text_splitter = RecursiveCharacterTextSplitter(
            separators=['\n\n', '\n', '.', ','],
            chunk_size=1000
        )
        main_placefolder.text("Text Splitter...Started...✅✅✅")
        docs = text_splitter.split_documents(data)
        if docs:
            # create embeddings and save them to a FAISS index
            embeddings = OpenAIEmbeddings()
            vectorstore_openai = FAISS.from_documents(docs, embeddings)
            main_placefolder.text("Embedding Vector Started Building...✅✅✅")
            time.sleep(2)
            # Save the FAISS index to a pickle file
            vectorstore_openai.save_local(file_path)
        else:
            main_placefolder.text("Text Splitter produced empty documents. Check data.")
    else:
        main_placefolder.text("Data loading failed. Check URLs or network connection.")
Also note: requirements.txt lists python-magic and libmagic, which are not required, so uninstall them and install python-magic-bin instead:
pip uninstall libmagic
pip uninstall python-magic
pip uninstall python-magic-bin
pip install python-magic-bin==0.4.14
Now run the code. Hope it'll be helpful
If you get this error when using FAISS.from_documents(docs, emb_model), please make sure the input docs are not empty. Hope it helps!
My docs are not empty, but now it's showing TypeError: cannot pickle '_thread.RLock' object.
Just downgrade your OpenAI and langchain versions to the specified ones.
You may then encounter an error saying davinci-002 is a deprecated model; in that case just change the model:
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.9, max_tokens=500)
You should be good to go... hope this helps.
Once you have installed all the necessary libraries, if the same error still persists, it is because there is no data in the data folder.
File "/usr/local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 347, in __from index = faiss.IndexFlatL2(len(embeddings[0]))
~~^^^ IndexError: list index out of range
Yes, I found a workaround... try using a PDF with fewer pages; then it won't go out of index.
Dude, love you!!!! Your code solved my "index out of list range" error, but now it shows an error about the pickle file being corrupted or incompatible: UnpicklingError: invalid load key, '\xef'.
Traceback:
File "C:\Users\Param Jethwa\AppData\Roaming\Python\Python311\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
File "C:\Users\Param Jethwa\AppData\Roaming\Python\Python311\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
File "C:\Users\Param Jethwa\langchain\2_news_research_tool_project\main.py", line 67, in <module>
I have changed the API key in the .env file, but it always says the key is wrong. If anyone knows how to resolve this, please let me know.
I got an error saying the API key is wrong. I also changed the key in .env, but it shows the same error. Please let me know if you know the answer.
File "/usr/local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 347, in __from
index = faiss.IndexFlatL2(len(embeddings[0]))