rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU
https://rapids.ai
Apache License 2.0

[FEA] Support / expose "labeled" filtered search #418

Open bkarsin opened 1 month ago

bkarsin commented 1 month ago

Many use cases need filtered search over a more general kind of label data. Typically, this takes the form of a vector of integers attached to every vector in the dataset and to every query. The general predicate function approach implemented in search_with_filter can be used for this, but it is not currently exposed in the header for CAGRA. As a workaround, the source file with "search_with_filtering" can be included directly, but this greatly increases compile time and is not ideal.

If it is simpler or more performant, another option is a less general "labeled" search, where label data is supplied in a specific format and used to filter the search. Something like this could satisfy most use cases and may be preferable to a completely general predicate function.
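To make the request concrete, here is a minimal CPU-side sketch of the semantics I have in mind, written in plain Python rather than against the cuVS API. The names `labels_match` and `filtered_search` are hypothetical, and the matching rule (a sample passes if it shares at least one label with the query) is only one possible assumption; other use cases may need a different predicate.

```python
import heapq


def labels_match(sample_labels, query_labels):
    # Assumed predicate: accept a sample that shares at least one label with the query.
    return not set(sample_labels).isdisjoint(query_labels)


def filtered_search(dataset, dataset_labels, query, query_labels, k):
    # Brute-force reference: k nearest neighbors (squared L2) among samples
    # that pass the label predicate. Not a cuVS call, just the intended behavior.
    candidates = []
    for idx, (vec, labels) in enumerate(zip(dataset, dataset_labels)):
        if labels_match(labels, query_labels):
            dist = sum((a - b) ** 2 for a, b in zip(vec, query))
            candidates.append((dist, idx))
    return [idx for _, idx in heapq.nsmallest(k, candidates)]


dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
dataset_labels = [[1], [2, 3], [3], [1, 3]]  # one integer vector per dataset vector
neighbors = filtered_search(dataset, dataset_labels, [0.0, 0.0], [3], k=2)
```

Sample 0 is the closest vector to the query but carries only label 1, so it is excluded when the query asks for label 3.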

lowener commented 1 month ago

As of version 24.10, the search_with_filter function has been replaced with an overload on the search function that you can see here. The source file doesn't need to be included directly.

bkarsin commented 4 weeks ago

Does this new overloaded search function also work with an arbitrary predicate-based filter? I was under the impression the new overloaded version only supported bitset and bitmap filters.

lowener commented 3 weeks ago

It supports only bitset or bitmap filters, depending on the index. Predicate-based filtering is still being worked on.
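Until predicate-based filtering is available, one possible interim approach is to turn the label data into a bitset on the host before each search. The sketch below is plain Python and does not call cuVS. The function name `labels_to_bitset` is hypothetical, and both the 32-bit word layout and the convention that a set bit means "sample allowed" are assumptions that should be checked against the filter type the index actually expects.

```python
def labels_to_bitset(dataset_labels, query_labels, word_bits=32):
    # Pack one bit per dataset vector into fixed-width words.
    # Assumption: bit i set means dataset vector i may appear in the results.
    n_words = (len(dataset_labels) + word_bits - 1) // word_bits
    words = [0] * n_words
    wanted = set(query_labels)
    for idx, labels in enumerate(dataset_labels):
        if not wanted.isdisjoint(labels):
            words[idx // word_bits] |= 1 << (idx % word_bits)
    return words


dataset_labels = [[1], [2, 3], [3], [1, 3]]
bitset = labels_to_bitset(dataset_labels, [3])  # samples 1, 2 and 3 carry label 3
```

The trade-off is that the bitset has to be rebuilt, or cached per label, whenever the query labels change, which is the cost a native labeled or predicate-based filter would avoid.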