run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Using Vector Store Index with Existing Weaviate Vector Store #14857

Open iiitmahesh opened 4 months ago

iiitmahesh commented 4 months ago

Bug Description

Using Vector Store Index with Existing Weaviate Vector Store

https://docs.llamaindex.ai/en/stable/examples/vector_stores/existing_data/weaviate_existing_data/

Error:

ValueError: Node content not found in metadata dict.

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Cell In[28], line 13
      1 # set Logging to DEBUG for more detailed outputs
      2 # question = "Effect of etafenone on total and regional myocardial blood flow."
      3
   (...)
      9 # pprint_source_node(nodes[0])
     10 # nodes[0].node.metadata
     12 query_engine = loaded_index.as_query_engine(similarity_top_k=2)
...
/Users/learn_AL_ML/~/Documents/Workspace/github/general/.conda/lib/python3.10/site-packages/llama_index/vector_stores/weaviate/utils.py:
    133         embedding=embedding,
    134     )
    135 return node

KeyError: 'id'
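
The two-part traceback above is Python's implicit exception chaining: the KeyError is raised while the ValueError is still being handled. A minimal sketch of that pattern follows. The function names (`parse_metadata`, `to_node_sketch`) and the `_node_content` key are hypothetical stand-ins for illustration, not the actual llama-index code.

```python
def parse_metadata(entry: dict) -> dict:
    """Hypothetical primary path: expects serialized node content in the entry."""
    if "_node_content" not in entry:
        raise ValueError("Node content not found in metadata dict.")
    return {"id": entry["id"], "content": entry["_node_content"]}


def to_node_sketch(entry: dict, additional: dict) -> dict:
    """Hypothetical converter: fall back to a legacy path when parsing fails."""
    try:
        return parse_metadata(entry)
    except ValueError:
        # The fallback assumes an "id" key. If it is absent, KeyError is raised
        # while the ValueError is still being handled, so both are reported.
        return {"id": additional["id"], "content": entry.get("abstract", "")}


try:
    # A record written outside llama-index: no node content and no "id".
    to_node_sketch({"abstract": "some text"}, additional={"vector": [0.1, 0.2]})
except KeyError as err:
    chained = err.__context__  # the ValueError that was being handled
    print(type(err).__name__, err, "| during handling of:", repr(chained))
```

In this sketch the first error only signals that the record was not written by the library; the second one is what actually aborts the query.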

My code

index_name = "MedicalArticle"
vector_store = WeaviateVectorStore(weaviate_client=client, index_name=index_name, text_key="abstract")
loaded_index = VectorStoreIndex.from_vector_store(vector_store)

query_engine = loaded_index.as_query_engine(similarity_top_k=2)
response = query_engine.query("Effect of etafenone on total and regional myocardial blood flow?")

print(response)

Version

0.10.56

Steps to Reproduce

Try this link

https://docs.llamaindex.ai/en/stable/examples/vector_stores/existing_data/weaviate_existing_data/

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 4 months ago

To resolve the KeyError: 'id' when querying an existing Weaviate Vector Store using the Vector Store Index, ensure that each node being queried has the 'id' attribute properly set. This error typically occurs when the expected 'id' key is missing in the response or the data structure being processed.

Here's a solution to ensure the node_id is correctly set:

  1. Ensure Nodes Have node_id Attribute: When adding nodes to the Weaviate vector store, make sure each BaseNode instance has a valid node_id attribute.

    import uuid

    from llama_index.core.schema import TextNode
    from llama_index.vector_stores.weaviate import WeaviateVectorStore

    # Example node creation. BaseNode is abstract, so use TextNode; its ID field is id_.
    # Weaviate expects object IDs in UUID format.
    node1 = TextNode(id_=str(uuid.uuid4()), embedding=[0.1, 0.2, 0.3], text="Example text 1")
    node2 = TextNode(id_=str(uuid.uuid4()), embedding=[0.4, 0.5, 0.6], text="Example text 2")

    # Add nodes to the Weaviate vector store (client is an existing Weaviate client)
    weaviate_vector_store = WeaviateVectorStore(weaviate_client=client, index_name="MedicalArticle", text_key="abstract")
    weaviate_vector_store.add([node1, node2])

  2. Querying the Vector Store: Ensure that the node_ids are correctly specified in the query.

    query_engine = loaded_index.as_query_engine(similarity_top_k=2)
    response = query_engine.query("Effect of etafenone on total and regional myocardial blood flow?")
    print(response)

  3. Loading the Index: When loading the index from the vector store, ensure the embeddings are generated and stored correctly.

    def retrieve_index_from_vector_store(index_name: str) -> VectorStoreIndex:
        client = weaviate.Client("http://localhost:8080")
        model_name = "sentence-transformers/all-MiniLM-L6-v2"
        embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))
        service_context = ServiceContext.from_defaults(embed_model=embed_model)
        vector_store = WeaviateVectorStore(weaviate_client=client, index_name=index_name, text_key="abstract")
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = WeaviateReader("http://localhost:8080").load_data(
            class_name=index_name + "_Node",
            properties=["id", "text", "embedding"],
        )
        return VectorStoreIndex.from_documents(documents=documents, storage_context=storage_context, service_context=service_context)
    
    loaded_index = retrieve_index_from_vector_store("MedicalArticle")

By ensuring that the node_id is correctly set and used in both the addition and querying of nodes, you should be able to resolve the KeyError: 'id' issue [1][2].


logan-markewich commented 3 months ago

It kind of assumes your vector db has an ID key. A PR would be needed to update this vector store to work better with existing dbs created outside of llama-index.
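
One possible shape for such a change, sketched with plain dictionaries rather than the real llama-index or Weaviate classes: look for an ID under several candidate keys and generate one only when none is found. The function name `resolve_node_id`, the candidate key names, and the sample record are all assumptions made for illustration, not the library's API.

```python
import uuid
from typing import Any, Dict, Optional, Tuple


def resolve_node_id(
    properties: Dict[str, Any],
    additional: Optional[Dict[str, Any]] = None,
    candidate_keys: Tuple[str, ...] = ("id", "uuid", "doc_id"),
) -> str:
    """Return an ID for a record from a database not created by llama-index.

    Checks the record's extra fields first, then its properties, and falls
    back to a freshly generated UUID instead of raising KeyError.
    """
    for source in (additional or {}, properties):
        for key in candidate_keys:
            value = source.get(key)
            if value:
                return str(value)
    return str(uuid.uuid4())


# Hypothetical record with no stored ID anywhere.
record = {"abstract": "Effect of etafenone on myocardial blood flow.", "title": "Etafenone"}
node_id = resolve_node_id(record, additional={"vector": [0.1, 0.2, 0.3]})
print(node_id)
```

A generated ID differs on every call, so anything that relies on stable node IDs across queries would need a deterministic scheme instead, for example deriving the ID from the record content.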

yasuyukitera commented 3 weeks ago

This phenomenon seems very similar to that reported in #13787. I believe #13787 has been resolved by PR #16671, so I hope this will be resolved as well. Please close this issue if necessary.

yasuyukitera commented 3 weeks ago

Reply to self. #13787 may not fix it. I will take the time to check myself.