Open · JPC612 opened this issue 1 month ago
The BM25Retriever, as currently implemented, builds its BM25 corpus from the nodes in the document store. It has no built-in mechanism to limit retrieval to the documents present in a specific VectorStoreIndex, so by default it scores every document in the document store, not just those in the VectorStoreIndex [1][2].
Unfortunately, the BM25Retriever offers no configuration option or method to filter documents by whether they belong to a VectorStoreIndex. This limitation likely explains the behavior you're observing, where the retriever accesses all documents in the document store [1][2].
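One workaround consistent with the above is to restrict the BM25 corpus yourself before building the retriever, instead of letting it index the whole shared docstore. The sketch below illustrates that idea in plain Python, assuming you can obtain the set of node IDs that belong to the index (for example from your own ingestion metadata). It is not LlamaIndex's BM25Retriever: `TinyBM25`, the dict-based `docstore`, and `index_node_ids` are hypothetical stand-ins, and the tokenizer is a naive whitespace split. In LlamaIndex, the analogous step would be passing a pre-filtered node list to `BM25Retriever.from_defaults(nodes=...)`, if your installed version accepts that argument.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: real retrievers stem and drop stopwords).
    return text.lower().split()


class TinyBM25:
    """Hypothetical minimal Okapi BM25 over an explicit {node_id: text} corpus."""

    def __init__(self, corpus: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.ids = list(corpus)
        self.docs = [tokenize(corpus[i]) for i in self.ids]
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        # Average document length; fall back to 1.0 to avoid dividing by zero.
        self.avgdl = (sum(len(d) for d in self.docs) / n if n else 0.0) or 1.0
        # Document frequency: in how many documents each term appears.
        df = Counter(term for d in self.docs for term in set(d))
        # BM25 IDF with +1 inside the log so every IDF stays positive.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def retrieve(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        q = tokenize(query)
        scored = []
        for node_id, doc, tf in zip(self.ids, self.docs, self.tf):
            # Length normalization term of BM25.
            norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q
                if t in tf
            )
            if score > 0:
                scored.append((node_id, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Hypothetical shared docstore holding nodes from several indexes.
docstore = {
    "a1": "quarterly revenue report for team alpha",
    "a2": "alpha onboarding guide",
    "b1": "confidential revenue forecast for team beta",
}
# Hypothetical: IDs of the nodes that belong to the VectorStoreIndex this user may query.
index_node_ids = {"a1", "a2"}

# Built over the whole docstore: can surface "b1", which is not in the index.
full = TinyBM25(docstore)
# Built only over nodes that are also in the index.
scoped = TinyBM25({i: t for i, t in docstore.items() if i in index_node_ids})

print(full.retrieve("revenue forecast"))
print(scoped.retrieve("revenue forecast"))
```

Building the BM25 corpus from the filtered node set, rather than post-filtering results, also keeps term statistics (IDF, average length) from being shaped by documents the user cannot see. The trade-off is maintaining one BM25 corpus per VectorStoreIndex.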
Question
Hello, I am currently using the following configuration for the BM25 retriever:
I’ve noticed that the BM25 retriever seems to access all documents in the docstore, not just the ones contained within the VectorStoreIndex (index). Is this behavior expected or intended?
In my use case, I want the BM25 retriever to return only documents that are also present in the corresponding VectorStoreIndex (index). Otherwise, hybrid search returns documents that users should not see because they are not part of that index. Is there a way to configure BM25 so it only retrieves documents that are also in the VectorStoreIndex?
For context, I am using Qdrant as my vector store and RedisDocumentStore as my document store.
Thanks in advance for your help!