Open LikhithRishi opened 3 weeks ago
The error "Query id 40b9d01f-b211-413d-b3d4-a799eff700d6 not found in either retriever_dict or query_engine_dict" indicates that the query ID is not present in the dictionaries used by the RecursiveRetriever. Here are some steps to ensure consistency and synchronization between the vector_store, retriever_dict, and query_engine_dict in the DenseXRetrievalPack class:
Ensure All Nodes Are Included:
Make sure that all nodes, including the original and generated proposition nodes, are included in the all_nodes_dict. This dictionary should map node IDs to nodes.
all_nodes = nodes + sub_nodes
all_nodes_dict = {n.node_id: n for n in all_nodes}
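A dict comprehension like the one above silently keeps only the last node when two nodes share an ID, so a quick length check can catch that early. The following is a minimal sketch using a hypothetical Node stand-in class and a hypothetical build_node_dict helper, neither of which is part of the library:

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Node:
    """Hypothetical stand-in for a real node; only the fields needed here."""
    node_id: str
    text: str


def build_node_dict(nodes: Iterable[Node], sub_nodes: Iterable[Node]) -> Dict[str, Node]:
    """Map node IDs to nodes and fail loudly if any ID appears twice."""
    all_nodes = list(nodes) + list(sub_nodes)
    node_dict = {n.node_id: n for n in all_nodes}
    if len(node_dict) != len(all_nodes):
        raise ValueError("duplicate node IDs found while building the node dict")
    return node_dict


# Example with made-up IDs: two original nodes and one proposition node.
nodes = [Node("n1", "original one"), Node("n2", "original two")]
sub_nodes = [Node("p1", "proposition for n1")]
all_nodes_dict = build_node_dict(nodes, sub_nodes)
print(sorted(all_nodes_dict))
```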
Consistent Service Context:
Use a consistent ServiceContext for both the VectorStoreIndex and the RetrieverQueryEngine. This ensures that the same LLM and embedding models are used across different components.
service_context = ServiceContext.from_defaults(
    llm=query_llm,
    embed_model=embed_model,
    num_output=self._proposition_llm.metadata.num_output,
)
Synchronized Vector Store and Retriever:
Initialize the VectorStoreIndex with all nodes and use it to create a retriever that is then included in the retriever_dict of the RecursiveRetriever. This ensures that the retriever is always in sync with the vector store.
if os.path.exists('./chroma_db'):
    chroma_client = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = chroma_client.get_or_create_collection("quickstart")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    self.vector_index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
else:
    chroma_client = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = chroma_client.get_or_create_collection("quickstart")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    self.vector_index = VectorStoreIndex(
        all_nodes, service_context=service_context, show_progress=True, storage_context=storage_context, store_nodes_override=True
    )
self.retriever = RecursiveRetriever(
    "vector",
    retriever_dict={
        "vector": self.vector_index.as_retriever(similarity_top_k=similarity_top_k)
    },
    node_dict=all_nodes_dict,
)
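One possible reason the reload branch above still fails, offered as an assumption rather than a confirmed diagnosis: if the nodes are regenerated on every run, they may receive fresh IDs, so the references stored in ./chroma_db would point at IDs that the new all_nodes_dict does not contain. Under that assumption, saving the ID-to-node mapping next to the vector store and loading it back on later runs keeps the two in step. The sketch below uses only the standard library; StoredNode, save_node_dict, load_node_dict, and load_or_build_node_dict are hypothetical names, not library APIs:

```python
import json
import os
import tempfile
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class StoredNode:
    """Hypothetical stand-in for a node; real nodes carry more fields."""
    node_id: str
    text: str


def save_node_dict(node_dict: Dict[str, StoredNode], path: str) -> None:
    """Write the ID-to-node mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({nid: asdict(node) for nid, node in node_dict.items()}, f)


def load_node_dict(path: str) -> Dict[str, StoredNode]:
    """Read the ID-to-node mapping back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {nid: StoredNode(**fields) for nid, fields in raw.items()}


def load_or_build_node_dict(
    path: str, build_fn: Callable[[], Dict[str, StoredNode]]
) -> Dict[str, StoredNode]:
    """Reuse the saved mapping if it exists; otherwise build it once and save it."""
    if os.path.exists(path):
        return load_node_dict(path)
    node_dict = build_fn()
    save_node_dict(node_dict, path)
    return node_dict


def build_fresh_nodes() -> Dict[str, StoredNode]:
    """Simulates node generation that assigns a new random ID on each call."""
    node_id = str(uuid.uuid4())
    return {node_id: StoredNode(node_id, "example proposition")}


# Two "runs" against the same file end up with the same IDs.
demo_path = os.path.join(tempfile.mkdtemp(), "all_nodes_dict.json")
first_run = load_or_build_node_dict(demo_path, build_fresh_nodes)
second_run = load_or_build_node_dict(demo_path, build_fresh_nodes)
print(first_run.keys() == second_run.keys())
```

With real nodes the serialization step would need to cover whatever fields those nodes carry; the JSON layout here is only illustrative.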
Consistent Query Engine Initialization:
Initialize the RetrieverQueryEngine using the same service_context and the RecursiveRetriever. This ensures that the query engine operates on the same data and configuration as the retriever and vector store.
self.query_engine = RetrieverQueryEngine.from_args(
    self.retriever, service_context=service_context
)
Check for Missing Nodes:
Ensure that the nodes are correctly added to the vector_store and that their IDs are correctly mapped in the all_nodes_dict. Any missing nodes could lead to the "query ID not found" error.
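The check described above can be sketched as a plain set-membership test. This assumes you can collect the IDs that the stored index refers to into a list; how to obtain them depends on your setup and is not shown. find_missing_ids is a hypothetical helper, and the IDs below are made up:

```python
from typing import Dict, Iterable, List


def find_missing_ids(referenced_ids: Iterable[str], node_dict: Dict[str, object]) -> List[str]:
    """Return referenced IDs with no entry in node_dict, deduplicated, in first-seen order."""
    seen = set()
    missing = []
    for node_id in referenced_ids:
        if node_id not in node_dict and node_id not in seen:
            seen.add(node_id)
            missing.append(node_id)
    return missing


# Made-up data: IDs the stored index points at vs. the current in-memory mapping.
stored_reference_ids = ["a1", "b2", "c3", "b2"]
current_nodes_dict = {"a1": "node a1", "c3": "node c3"}

missing = find_missing_ids(stored_reference_ids, current_nodes_dict)
print(missing)
```

Any ID this reports is one the retriever would fail to resolve at query time.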
By following these practices, you can ensure that the vector_store, retriever_dict, and query_engine_dict remain consistent and synchronized within the DenseXRetrievalPack class [1][2].
Question
I am getting an error while running Dense_X with the same index name. When I enter a query to generate a response, it fails with:
Query id 40b9d01f-b211-413d-b3d4-a799eff700d6 not found in either retriever_dict or query_engine_dict
It works fine for the code below:
But when we reuse embeddings with Dense_X, we get the query ID error.