Open Timotheevin opened 4 months ago
In theory it sounds easy, but I'm not sure if the bm25s library exposes an easy way to do this.
At first glance, it feels like it doesn't.
Hi @logan-markewich,
Thanks for your answer. You're right, the bm25s library doesn't seem to implement this feature, but I was wondering if it was possible to deal with this on the wrapper side.
In llama-index/retrievers/bm25/base.py:39, there is:
self._corpus = [self._tokenizer(node.get_content()) for node in self._nodes]
self.bm25 = BM25Okapi(self._corpus)
Would it be feasible to apply the filter at this stage (i.e. filter _nodes before those two lines, based on the metadata of each node)?
Thanks
That's in the constructor though. Normally you'd want to filter per-retrieval, no?
If constructor-time filtering is enough for you, you can filter the nodes yourself before constructing the retriever.
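A minimal sketch of that pre-filtering step, assuming each node exposes a `metadata` dict. `Node` is a hypothetical stand-in so the snippet runs without llama-index installed, and `filter_nodes` is an illustrative helper, not something the library provides:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a llama-index node (text plus a metadata dict)."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_nodes(nodes, required):
    """Keep only nodes whose metadata matches every key/value pair in `required`."""
    return [
        node for node in nodes
        if all(node.metadata.get(key) == value for key, value in required.items())
    ]


nodes = [
    Node("A thief who steals secrets through dreams.", {"theme": "Sci-fi"}),
    Node("The aging head of a crime family.", {"theme": "Mafia"}),
]
mafia_nodes = filter_nodes(nodes, {"theme": "Mafia"})
# With real llama-index nodes, the filtered list would then be handed to the
# retriever, e.g. BM25Retriever.from_defaults(nodes=mafia_nodes).
```

This only covers equality checks; other operators would need their own comparison logic.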
Idk if you can pass a filter as an argument at the retrieval stage. At least in this doc, it is done when building the retriever:
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", operator=FilterOperator.EQ, value="Mafia"),
    ]
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
But yes, you're right, I can do it before constructing the retriever. I guess it's just a matter of where the line is between what the framework provides and what has to be implemented on the dev side.
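For the per-retrieval variant, one possible dev-side workaround is to over-fetch and drop non-matching hits afterwards. This is only a sketch: `search` is a hypothetical callable returning ranked `(metadata, score)` pairs, standing in for a real BM25 retriever, and `retrieve_filtered` and `overfetch` are made-up names for illustration.

```python
def retrieve_filtered(search, query, required, top_k=2, overfetch=5):
    """Over-fetch from `search`, then keep the first top_k hits matching `required`.

    `search(query, k)` is a hypothetical callable that returns up to k ranked
    (metadata, score) pairs; it is not an actual llama-index or bm25s API.
    """
    hits = search(query, top_k * overfetch)
    kept = [
        (meta, score) for meta, score in hits
        if all(meta.get(key) == value for key, value in required.items())
    ]
    return kept[:top_k]


def fake_search(query, k):
    """Toy ranked results standing in for BM25 output (scores are invented)."""
    ranked = [
        ({"title": "Inception", "theme": "Sci-fi"}, 3.1),
        ({"title": "The Godfather", "theme": "Mafia"}, 2.4),
        ({"title": "Goodfellas", "theme": "Mafia"}, 1.9),
    ]
    return ranked[:k]


results = retrieve_filtered(fake_search, "crime family", {"theme": "Mafia"}, top_k=1)
```

Two caveats with this approach: it can return fewer than `top_k` results if too few of the over-fetched hits match, and the scores are still computed over the whole corpus, which is not the same as building the index on the filtered nodes only.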
Any update on this issue? It looks like this is needed for most use cases whenever the BM25 retriever is used.
Feature Description
It seems there is no way to add metadata filters when initializing a BM25Retriever object. I am wondering if it would be possible to add this feature.
Reason
No idea why this is not already implemented; it doesn't seem very difficult, technically speaking.
Value of Feature
A VectorIndexRetriever is usually used along with a BM25Retriever, but only the VectorIndexRetriever can currently take a filter as an argument.