run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
37.11k stars, 5.33k forks

[Bug]: Example in the Semantic Retrieval notebook does not work #9749

Closed: simonff closed this issue 11 months ago

simonff commented 11 months ago

Bug Description

Hi,

I'm following https://github.com/run-llama/llama_index/blob/main/docs/examples/managed/GoogleDemo.ipynb

I configured service account auth, enabled the Google API, downloaded the Paul Graham doc, and tried to index it, but got:

Newly created corpus ID is 2c40d915-0c18-4210-a412-a418ba6eac7f.
WARNING:llama_index.vector_stores.google.generativeai.genai_extension:Document e480262f-bae5-4f54-aecd-750c680e4ffe in corpus 2c40d915-0c18-4210-a412-a418ba6eac7f not found: 404 Documents does not exist
Response is Empty Response

The documents variable seems to be set correctly.

Version

0.9.23

Steps to Reproduce

Run the example notebook

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 11 months ago

🤖

Hello @simonff! I'm Dosu, a friendly bot here to assist you with your LlamaIndex issues, answer any questions, and even guide you in becoming a contributor. While we wait for a human maintainer to chime in, I'm here to help you get started.

The "Document not found: 404 Documents does not exist" error typically occurs when the document you're trying to index doesn't exist in the specified location or the document is not properly loaded into the system.

In the provided context, you are loading documents from the directory "./data/paul_graham/" using the SimpleDirectoryReader:

from llama_index import SimpleDirectoryReader  # import path as of 0.9.x

reader = SimpleDirectoryReader("./data/paul_graham/")
documents = reader.load_data()

Then, you are parsing these documents into nodes and adding them to the docstore:

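from llama_index.node_parser import SentenceSplitter  # import path as of 0.9.x
# storage_context is assumed to be an existing StorageContext
nodes = SentenceSplitter().get_nodes_from_documents(documents)
storage_context.docstore.add_documents(nodes)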

If you're getting a "Document not found: 404 Documents does not exist" error, it could be due to one of the following reasons:

  1. The document doesn't exist in the "./data/paul_graham/" directory. You can verify this by checking the contents of the directory.

  2. The document is not being properly loaded by the SimpleDirectoryReader. You can verify this by checking the contents of the 'documents' variable after calling reader.load_data().

  3. The nodes are not being properly added to the docstore. You can verify this by checking the contents of the docstore after calling storage_context.docstore.add_documents(nodes).
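The first of these checks can be done without LlamaIndex at all. Below is a minimal sketch using only the standard library. The helper name `list_input_files` is hypothetical, and the path is the one from the snippet above.

```python
from pathlib import Path


def list_input_files(directory: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every regular file under directory."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"input directory not found: {root}")
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    try:
        for name, size in list_input_files("./data/paul_graham/"):
            # A zero-byte file usually means the download did not complete.
            flag = "  <-- empty" if size == 0 else ""
            print(f"{name}: {size} bytes{flag}")
    except FileNotFoundError as err:
        print(err)
```

If the directory is missing or the file is empty, the problem is on the local side. If the file is present and non-empty, the local loading step is probably not the cause of the 404.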

Please check these points and ensure that your documents are being correctly loaded and added to the docstore. See DynamoDBDocstoreDemo.ipynb for more details on how to properly load and add documents to the docstore.

Also, please ensure that the documents you are trying to load are in a format supported by the SimpleDirectoryReader. The supported formats include HWP, PDF, DOCX, PPTX, PPT, PPTM, JPEG, PNG, JPG, MP3, MP4, CSV, EPUB, MD, MBOX, and IPYNB files. If a file's extension is not in the DEFAULT_FILE_READER_CLS dictionary, the SimpleDirectoryReader will attempt to read the file as a standard text file. See llama_index/readers/file/base.py for more details on the supported formats.
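To make the fallback rule concrete, here is a rough sketch of the extension check described above. The set `LISTED_EXTENSIONS` and the function `reader_kind` are hypothetical stand-ins built from the list in this comment. They are not the library's actual DEFAULT_FILE_READER_CLS or its lookup code.

```python
from pathlib import Path

# Extensions copied from the list above. This is NOT the library's
# DEFAULT_FILE_READER_CLS, only an illustrative stand-in.
LISTED_EXTENSIONS = {
    ".hwp", ".pdf", ".docx", ".pptx", ".ppt", ".pptm", ".jpeg", ".png",
    ".jpg", ".mp3", ".mp4", ".csv", ".epub", ".md", ".mbox", ".ipynb",
}


def reader_kind(filename: str) -> str:
    """Return 'dedicated' if the extension is listed, else 'plain-text fallback'."""
    suffix = Path(filename).suffix.lower()
    return "dedicated" if suffix in LISTED_EXTENSIONS else "plain-text fallback"


if __name__ == "__main__":
    for name in ("essay.txt", "slides.pptx", "notes.md"):
        print(f"{name}: {reader_kind(name)}")
```

Under this rule a plain .txt file takes the fallback path and is read as text, so an unlisted extension alone should not stop a text document from loading.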

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

-Dosu



simonff commented 11 months ago

Eventually this worked - I think my service account was too new and not visible at first
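For anyone who hits the same thing: if the cause really is a new service account that takes a while to become visible, retrying with a growing delay may be enough. This is a minimal, generic sketch, not part of LlamaIndex or the Google client. `retry_with_backoff` is a hypothetical helper, and it assumes the failing call raises an exception. In the log above the 404 surfaced as a warning followed by an empty response, so a real check might need to inspect the response instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the wait after each failure."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as err:  # in real code, catch the specific error type
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so the waiting can be swapped out in tests. By default it is `time.sleep`.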