Open plurch opened 9 months ago
Hi @plurch! That's available in the C++ layer already. I assume you are using a binding. Is it written in Python? Are you going to check those IDs against some Python data-structure like a set or dictionary?
Hi @ashvardanian , I am using the python bindings to build the index and then the javascript bindings to do the search from a web app. So I really would be using the search filtering through JS in my case, but python support would probably be useful eventually also.
I think some type of interface similar to faiss that accepts a set of int ids to exclude/include might work. This isn't something that is actively blocking me, just curious about feasibility at this point while evaluating some ANN libraries. Thanks!
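To make the request concrete, here is a minimal brute-force sketch of excluding a set of integer ids during the search itself. It is not the usearch or Faiss API; `search_excluding` and `exclude_ids` are hypothetical names used only for illustration. It shows why filtering inside the search matters: filtering the results afterwards can return fewer than n items.

```python
import heapq
import math


def search_excluding(vectors, query, n, exclude_ids=frozenset()):
    """Brute-force top-n search that skips excluded ids while scanning.

    Hypothetical helper for illustration only, not a usearch or Faiss call.
    """
    candidates = (
        (math.dist(vec, query), key)
        for key, vec in vectors.items()
        if key not in exclude_ids  # filter applied during the scan
    )
    return [key for _, key in heapq.nsmallest(n, candidates)]


vectors = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0), 4: (3.0, 0.0)}
query = (0.0, 0.0)

# Filtering during the search still yields n results.
filtered = search_excluding(vectors, query, n=2, exclude_ids={1})

# Filtering after an unfiltered search can come up short.
post_filtered = [k for k in search_excluding(vectors, query, n=2) if k != 1]
```

With these toy vectors, the in-search filter returns two ids, while the post-filtered list has only one left after id 1 is dropped.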
Gotcha, @plurch, thanks for the feedback! I will keep it in mind for future releases.
Sounds good.
I would be able to utilise this library with this functionality. Love the work though! I would be using the python lib.
Metadata filtering would be a game-changing feature.
Are you thinking about adding metadata storage alongside vector storage, I mean, for filtering support? That would avoid the Faiss approach, where you have to filter the ids to compare in advance, even though there can sometimes be millions of them.
@raulcarlomagno, in our case, we use predicate functions instead of an ID list. Passing them from C and Rust isn't hard to add, C++ already supports that, but in Python and JavaScript, I am not sure about how we can make it fast...
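As a rough illustration of the predicate approach, here is a brute-force sketch in plain Python. It does not use the usearch API; `search_with_predicate`, `metadata`, and `is_english` are hypothetical names. The predicate is called once per candidate visited, so the per-call overhead of a Python or JavaScript callback is what would need to be kept low.

```python
import heapq
import math


def search_with_predicate(vectors, query, n, predicate):
    """Brute-force top-n search keeping only keys the predicate accepts.

    Hypothetical helper for illustration only, not a usearch call.
    """
    candidates = (
        (math.dist(vec, query), key)
        for key, vec in vectors.items()
        if predicate(key)  # one callback invocation per visited candidate
    )
    return [key for _, key in heapq.nsmallest(n, candidates)]


vectors = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0), 4: (3.0, 0.0)}
# Assumed side table of metadata, stored outside the vector index.
metadata = {1: {"lang": "en"}, 2: {"lang": "de"}, 3: {"lang": "en"}, 4: {"lang": "de"}}

visited = []  # records every key the predicate was asked about


def is_english(key):
    visited.append(key)
    return metadata[key]["lang"] == "en"


results = search_with_predicate(vectors, (0.0, 0.0), n=2, predicate=is_english)
```

No list of allowed ids is built up front, which is the difference from the Faiss-style approach. The trade-off is that the callback runs for every candidate the search visits.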
What about adding optional storage for metadata, like RocksDB? You keep the current vector index and add another index for the metadata, and the predicate function is evaluated inside C, not Python. The heavy work is done internally in C, transparent to the Python API wrapper.
Or maybe you don't want to get into storing metadata...
Would this also apply to the clustering, as this would be a real game changer?
I just built a PoC with usearch. It's amazing! However, the lack of metadata filtering is blocking me from using it in our product. In our case, we were first using the Java bindings, but are now using Python. A predicate solution, like you have in C++, would work. However, setting some metadata fields and then being able to filter on them would be best, which is basically how it works in Qdrant.
I'm aware that this is a big ask and would require extending your store with another, non-vector index, but it would make it one of the most attractive in-process vector stores out there.
Describe what you are looking for
Very nice library with fast performance in my limited testing so far.
I am curious to know if there are any plans to support filtering when searching the index? It is useful in some use-cases to exclude specific ids and still get the expected top n results returned. Faiss does support this but it has some performance impacts.
Can you contribute to the implementation?
Is your feature request specific to a certain interface?
It applies to everything