Open mdciri opened 2 months ago
I found a similar closed issue that might help you resolve this problem: [Bug]: AttributeError: 'TextNode' object has no attribute 'get_doc_id' [1]. The issue arises because the TextNode class does not have the get_doc_id() method, which is required by the OpensearchVectorClient class when indexing results.
To resolve this, you need to ensure that the TextNode class implements the get_doc_id() method. Here is an example of how you can add this method to the TextNode class:
class TextNode(BaseNode):
    def __init__(self, text, metadata=None, id_=None, start_char_idx=None, end_char_idx=None, relationships=None, extra_info=None):
        super().__init__(text, metadata, id_, start_char_idx, end_char_idx, relationships, extra_info)
        self.node_id = id_ if id_ else str(uuid.uuid4())

    def get_doc_id(self):
        return self.node_id
By adding the get_doc_id() method to the TextNode class, you ensure that the OpensearchVectorClient can correctly retrieve the document ID when indexing nodes [2].
To continue talking to Dosu, mention @dosu.
I am sorry, I found the error in how I was creating the index. It was easy to spot: I was using .from_documents() incorrectly. Anyway, now that I have created the index, it does not work when I do:
from llama_index.core.retrievers import AutoMergingRetriever
base_retriever = automerging_index.as_retriever(similarity_top_k=5)
retriever = AutoMergingRetriever(
    vector_retriever=base_retriever,
    storage_context=automerging_index.storage_context,
    verbose=True
)
query_str = "<my_str_query>"
nodes = retriever.retrieve(query_str)
because I get:
INFO:opensearch:POST http://localhost:9200/vector-index-test/_search [status:200 request:0.052s]
----> nodes = retriever.retrieve(query_str)
ValueError: doc_id f242477a-6be5-4905-bf20-33125fa0c876 not found.
However, it works if I simply do:
base_retriever.retrieve(query_str)
How is that possible?
The AutoMergingRetriever returns a "doc_id not found" error because it relies on a hierarchical structure of nodes stored in a docstore. If the docstore does not contain the necessary nodes or if there is a mismatch in the node IDs, the AutoMergingRetriever will not be able to find the required documents. On the other hand, base_retriever.retrieve(query_str) works correctly because it directly retrieves from the vector store containing the leaf-level nodes, which are correctly indexed [1][2].
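The failure mode described above can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: Node, ToyDocstore, and merge_into_parents are hypothetical names, not llama_index classes, and the merge step is simplified to always swap a leaf for its parent (the real retriever decides whether to merge based on how many sibling leaves were retrieved). The point is only that the parent lookup goes to the docstore, not to the vector store.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed chunk; not a llama_index class."""
    node_id: str
    text: str
    parent_id: Optional[str] = None


class ToyDocstore:
    """Hypothetical in-memory docstore keyed by node ID."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def add(self, nodes: List[Node]) -> None:
        for node in nodes:
            self._nodes[node.node_id] = node

    def get(self, node_id: str) -> Node:
        if node_id not in self._nodes:
            raise ValueError(f"doc_id {node_id} not found.")
        return self._nodes[node_id]


def merge_into_parents(retrieved: List[Node], docstore: ToyDocstore) -> List[Node]:
    """Replace each retrieved leaf with its parent, looked up in the docstore.

    Simplification: always merges, with no sibling-ratio threshold.
    """
    merged: Dict[str, Node] = {}
    for leaf in retrieved:
        if leaf.parent_id is None:
            merged[leaf.node_id] = leaf
        else:
            parent = docstore.get(leaf.parent_id)
            merged[parent.node_id] = parent
    return list(merged.values())


parent = Node("parent-1", "full section text")
leaves = [
    Node("leaf-1", "first half", parent_id="parent-1"),
    Node("leaf-2", "second half", parent_id="parent-1"),
]

# Case 1: only the leaves were stored, so the parent lookup fails.
leaf_only_store = ToyDocstore()
leaf_only_store.add(leaves)
try:
    merge_into_parents(leaves, leaf_only_store)
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

# Case 2: parents and leaves are both stored, so merging succeeds.
full_store = ToyDocstore()
full_store.add([parent] + leaves)
merged_nodes = merge_into_parents(leaves, full_store)
```

In this sketch, the plain vector retrieval corresponds to returning the leaves directly, which never touches the docstore and therefore cannot hit the missing-parent error. If the same thing is happening in the setup above, a likely fix is to make sure every node produced by the hierarchical parser (parents as well as leaves) is added to the docstore of the storage context passed to the retriever, and that this docstore is persisted or rebuilt when the index is reloaded.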
Bug Description
I am trying to save my vector store index in OpenSearch, but it does not accept the nodes I created with the auto-merging retriever strategy. The leaf nodes created with HierarchicalNodeParser() are of type TextNode(), which obviously does not have the get_doc_id() method.
Version
0.10.68
Steps to Reproduce
Relevant Logs/Tracebacks