tonykipkemboi / ollama_pdf_rag

A demo Jupyter Notebook showcasing a simple local RAG (Retrieval Augmented Generation) pipeline to chat with your PDFs.
MIT License
178 stars, 84 forks

ValueError: Expected IDs to be a non-empty list, got 0 IDs #7

Closed michalcharvat closed 2 months ago

michalcharvat commented 3 months ago

Hi, do you have any hints about the error "ValueError: Expected IDs to be a non-empty list, got 0 IDs"? It does not matter which PDF I upload, the attached one or any other: I always get that error.


```
2024-06-26 17:12:05 - INFO - PDF text extraction failed, skip text extraction...
2024-06-26 17:12:06 - INFO - Document split into chunks
2024-06-26 17:12:06 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
OllamaEmbeddings: 0it [00:00, ?it/s]
2024-06-26 17:12:06.669 Uncaught app exception
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "/Users/michal/Downloads/ollama_pdf_rag-main/streamlit_app.py", line 278, in <module>
    main()
  File "/Users/michal/Downloads/ollama_pdf_rag-main/streamlit_app.py", line 223, in main
    st.session_state["vector_db"] = create_vector_db(file_upload)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/michal/Downloads/ollama_pdf_rag-main/streamlit_app.py", line 89, in create_vector_db
    vector_db = Chroma.from_documents(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 790, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 754, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/vectorstores/chroma.py", line 325, in add_texts
    self._collection.upsert(
  File "/opt/homebrew/lib/python3.12/site-packages/chromadb/api/models/Collection.py", line 296, in upsert
    ) = self._validate_and_prepare_upsert_request(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/chromadb/api/models/CollectionCommon.py", line 515, in _validate_and_prepare_upsert_request
    ) = self._validate_embedding_set(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/chromadb/api/models/CollectionCommon.py", line 163, in _validate_embedding_set
    valid_ids = validate_ids(maybe_cast_one_to_many_ids(ids))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/chromadb/api/types.py", line 232, in validate_ids
    raise ValueError(f"Expected IDs to be a non-empty list, got {len(ids)} IDs")
ValueError: Expected IDs to be a non-empty list, got 0 IDs
```

tonykipkemboi commented 3 months ago

It seems the extraction step fails (the log shows "PDF text extraction failed, skip text extraction..."), so no vector embeddings are created. The code you modified also skips the process even when there's no text to embed. Can you share the code you modified?
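
For anyone hitting the same error, here is a minimal sketch of a guard that could run before the `Chroma.from_documents(...)` call in `create_vector_db`. It assumes the chunks expose their text through a `page_content` attribute, as LangChain documents do. The `Chunk` class and the `non_empty_chunks` helper are hypothetical names used only for illustration; they are not part of the repo.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def non_empty_chunks(chunks):
    """Return the chunks that contain text; raise a clear error if none do.

    Hypothetical helper: the idea is to fail early with a readable message
    instead of letting an empty list reach the vector store.
    """
    kept = [c for c in chunks if c.page_content and c.page_content.strip()]
    if not kept:
        raise ValueError(
            "No text was extracted from the PDF, so there is nothing to embed."
        )
    return kept


# Assumed usage inside create_vector_db, before building the vector store:
#     chunks = non_empty_chunks(chunks)
#     vector_db = Chroma.from_documents(documents=chunks, ...)
```

With a check like this, a failed extraction would surface as an explicit "nothing to embed" message rather than the less obvious "got 0 IDs" error from Chroma.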

tonykipkemboi commented 2 months ago

@michalcharvat, can you try updating your requirements.txt file again to match the current contents of the file in the repo? I realized that I initially left out some of the required dependencies.