Closed aliozts closed 3 months ago
Hey there @aliozts! Great to see you diving into another challenge 🚀
Based on the information you've provided, it seems like you're experiencing an issue with the RecursiveRetriever not working as expected when used with a StorageContext. This issue might be related to the way the StorageContext interacts with the VectorStoreIndex and RecursiveRetriever.
From the context provided, it appears that similar issues have been reported in the LlamaIndex repository. One is "Failing to retrieve documents because vector store is empty when storage context is loaded", and another is "[Bug]: cannot get document node". The suggested solution for these issues is to ensure that the document ID exists within the correct collection, that the namespace and collection suffix are correctly configured, and that the key-value store is functioning properly.
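The first of those checks can be approximated by comparing the IDs the vector store returns against the keys the docstore actually holds. This is a minimal sketch using plain dictionaries as stand-ins. `find_missing_ids` and the sample data are hypothetical and are not part of the LlamaIndex API.

```python
def find_missing_ids(referenced_ids, node_dict):
    """Return the referenced IDs that have no entry in node_dict, preserving order."""
    return [node_id for node_id in referenced_ids if node_id not in node_dict]


# Stand-in data (hypothetical): IDs returned by the vector store
# versus the nodes the docstore actually contains.
referenced_ids = ["node-1", "node-2", "node-3"]
node_dict = {"node-1": "text of node 1", "node-3": "text of node 3"}

missing = find_missing_ids(referenced_ids, node_dict)
print("IDs missing from the docstore:", missing)
```

In a real setup, the same comparison could be run against the keys of docstore.docs, which is the mapping passed as node_dict later in this thread.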
In the LlamaIndex codebase, the StorageContext is used to manage different types of stores involved in storing and retrieving data. It encapsulates the document store (docstore), index store (index_store), vector stores (vector_stores), and graph store (graph_store). These components are essential for the retrieval process in the LlamaIndex system, enabling efficient storage, indexing, and retrieval of documents, vectors, and graph data.
The VectorStoreIndex class in the LlamaIndex codebase is designed to work on top of an existing vector store, facilitating the indexing and retrieval of vectorized representations of data. Its primary role is to manage the process of converting data nodes into their vector embeddings, storing these embeddings, and then allowing for efficient retrieval based on these vector representations.
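The embed, store, and retrieve cycle described above can be illustrated with a toy model in pure Python. This is only a sketch of the general idea, not LlamaIndex's implementation. `ToyVectorIndex`, `toy_embed`, and the sample nodes are all hypothetical, and the "embedding" is a simple letter count chosen to keep the example self-contained.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical stand-in: embeds text on insert, ranks by similarity on retrieve."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._vectors = {}  # node_id -> embedding

    def insert(self, node_id, text):
        self._vectors[node_id] = self._embed_fn(text)

    def retrieve(self, query, top_k=2):
        query_vec = self._embed_fn(query)
        scored = [
            (cosine_similarity(query_vec, vec), node_id)
            for node_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [node_id for _, node_id in scored[:top_k]]


def toy_embed(text):
    """Toy embedding (assumption): counts of the letters a, b, and c."""
    lowered = text.lower()
    return [lowered.count(ch) for ch in "abc"]


index = ToyVectorIndex(toy_embed)
index.insert("n1", "aaa")
index.insert("n2", "bbb")
index.insert("n3", "aab")

top = index.retrieve("aa", top_k=2)
print(top)
```

Note that the toy index stores only vectors keyed by node ID. The node text lives elsewhere, which is the role the docstore plays in the setup discussed in this thread.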
The RecursiveRetriever class in the LlamaIndex codebase is designed to handle the retrieval of nodes in a recursive manner, particularly focusing on exploring links from nodes to other retrievers or query engines. Its primary role in the retrieval process is to navigate through the nodes, identify if they are IndexNodes, and if so, query the linked retriever or query engine associated with those nodes.
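The link-following behaviour described above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not the library's actual code. `ToyNode`, `recursive_retrieve`, and the sample data are hypothetical, and the lookup order (node_dict first, then retriever_dict) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    node_id: str
    text: str
    index_id: Optional[str] = None  # set only on nodes that link to something else


def recursive_retrieve(query, root_id, retriever_dict, node_dict):
    """Run the root retriever, then resolve every linked node it returns."""
    results = []
    for node in retriever_dict[root_id](query):
        if node.index_id is None:
            results.append(node)  # plain node: return as-is
        elif node.index_id in node_dict:
            results.append(node_dict[node.index_id])  # link to a stored node
        elif node.index_id in retriever_dict:
            # link to another retriever: recurse into it
            results.extend(
                recursive_retrieve(query, node.index_id, retriever_dict, node_dict)
            )
        else:
            raise KeyError(f"linked id {node.index_id!r} not found")
    return results


def vector_retriever(query):
    """Hypothetical root retriever returning one linked chunk and one plain chunk."""
    return [
        ToyNode("chunk-1", "small chunk", index_id="parent-1"),
        ToyNode("chunk-2", "plain chunk"),
    ]


retriever_dict = {"vector": vector_retriever}
node_dict = {"parent-1": ToyNode("parent-1", "full parent text")}

texts = [n.text for n in recursive_retrieve("q", "vector", retriever_dict, node_dict)]
print(texts)
```

In this toy model, an empty node_dict makes the linked chunk unresolvable. That shows why the mapping passed as node_dict needs to contain the IDs that the retrieved nodes point to.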
Given this information, it's possible that the issue you're experiencing is due to a misconfiguration or misunderstanding of how these components interact. I would recommend checking the configuration of your StorageContext and ensuring that your VectorStoreIndex and RecursiveRetriever are set up correctly. If you're still experiencing issues, it would be helpful to provide more information about the specific error messages you're seeing or the behavior that's not as expected.
My workaround is to first create the index with storage_context so that the vector store and doc store are populated, and then re-create it with VectorStoreIndex.from_vector_store:
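# Import paths below assume llama-index >= 0.10; adjust for older versions.
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.retrievers import RecursiveRetriever

# vector_store, docstore, nodes, and embeddings_creator are defined elsewhere.
# First create the index with the vector store and docstore so that they are populated.
storage_context = StorageContext.from_defaults(vector_store=vector_store, docstore=docstore)
vector_index_chunk = VectorStoreIndex(
    nodes=nodes,
    embed_model=embeddings_creator,
    insert_batch_size=128,
    storage_context=storage_context,
    show_progress=True,
)

# Then re-create the index directly from the populated vector store.
vector_index_chunk = VectorStoreIndex.from_vector_store(
    embed_model=embeddings_creator,
    vector_store=vector_store,
)
vector_retriever_chunk = vector_index_chunk.as_retriever(similarity_top_k=10)
retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=docstore.docs,
    verbose=True,
)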
This worked for me, but I'd prefer to avoid this workaround and use the vector_retriever_chunk directly.
Question
Following the example from the documentation, I wanted to create a RecursiveRetriever utilizing Qdrant as the vector db and Redis as the docstore. I'm doing
When I perform retrieval with this retriever_chunk, I cannot retrieve any documents. If I don't use the StorageContext, it works properly. If I create the VectorStoreIndex this way and utilize VectorStoreIndex.from_vector_store, it also works (not sure if that is correct, though). May I ask if this can be checked? Thank you