nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Multi-vector search use case #534

Open. patelprateek opened this issue 8 months ago.

patelprateek commented 8 months ago

I noticed that support for multi-vector search and epsilon search was added recently. Could you add some pointers or documentation on their use cases?

Is epsilon search meant to support a range-query use case, for example: get the top-K most similar items, subject to the constraint that the distance from the query to each item is < epsilon?

I looked at the code for multi-vector search, but the use case was not clear to me. It seems we add repeated labels with the data in the space interface, and these labels can differ from the external labels. Could you elaborate on what use case and query type this supports? From what I understand, the space interface just takes an additional label, which can be the same for different embeddings: each embedding is assigned a unique label, but internally several embeddings can map to the same label, and the rest of the behavior is essentially the same?

yurymalkov commented 8 months ago

> Is epsilon search meant to support a range-query use case, for example: get the top-K most similar items, subject to the constraint that the distance from the query to each item is < epsilon?

Correct. It is implemented in C++. A code example is here: https://github.com/nmslib/hnswlib/blob/master/examples/cpp/example_epsilon_search.cpp
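To make the semantics of such a query concrete, here is a minimal brute-force sketch of what it should return: the closest items, at most K of them, whose distance to the query is below epsilon. It does not call hnswlib at all; the function name `epsilon_topk`, the dict-of-vectors input and the use of plain Euclidean distance are assumptions for illustration only. An HNSW index would approximate this exact result rather than compute it by scanning every item.

```python
import heapq
import math


def euclidean(a, b):
    """Plain L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def epsilon_topk(query, items, k, epsilon):
    """Hypothetical exact reference for a top-K range query.

    items: dict mapping label -> vector (illustrative input format).
    Returns up to k (distance, label) pairs with distance < epsilon,
    sorted from closest to farthest.
    """
    scored = [(euclidean(query, vec), label) for label, vec in items.items()]
    # Range constraint: drop everything at or beyond epsilon.
    in_range = [(d, label) for d, label in scored if d < epsilon]
    # Top-K constraint: keep only the k closest of the in-range items.
    return heapq.nsmallest(k, in_range)


if __name__ == "__main__":
    data = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0], 3: [5.0, 5.0]}
    print(epsilon_topk([0.0, 0.0], data, k=3, epsilon=1.5))
    # [(0.0, 0), (1.0, 1)]
```

Only two results come back even though k=3, because item 2 sits at distance 2.0, outside epsilon. That is the difference from a plain top-K query, which would always return K items.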

> From what I understand, the space interface just takes an additional label, which can be the same for different embeddings: each embedding is assigned a unique label, but internally several embeddings can map to the same label, and the rest of the behavior is essentially the same?

Yes. The main application is searching for K documents, where each document is represented by multiple embeddings. Since the output should be K documents rather than K embeddings, the new stop condition addresses exactly this problem.
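The difference between returning K embeddings and K documents can be illustrated with a small brute-force sketch. Each embedding carries a document label, several embeddings may share the same label, and the scan walks candidates in order of distance until it has collected K distinct documents, scoring each document by its closest embedding. The function name `multivector_topk`, the closest-embedding scoring rule and the Euclidean distance are assumptions for illustration; this is not hnswlib's code, where the reply above says the behavior comes from a stop condition applied during the search. The sketch only mirrors the expected result.

```python
import math


def euclidean(a, b):
    """Plain L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multivector_topk(query, embeddings, k):
    """Hypothetical exact reference for document-level top-K.

    embeddings: list of (doc_label, vector); a doc_label may repeat.
    Returns up to k (distance, doc_label) pairs, one per document,
    each document scored by its closest embedding.
    """
    if k <= 0:
        return []
    # Rank every embedding by distance to the query, closest first.
    ranked = sorted(
        ((euclidean(query, vec), doc) for doc, vec in embeddings),
        key=lambda pair: pair[0],
    )
    seen = set()
    results = []
    for dist, doc in ranked:
        if doc in seen:
            continue  # a closer embedding of this document was already taken
        seen.add(doc)
        results.append((dist, doc))
        if len(results) == k:
            break  # stop once K distinct documents have been collected
    return results


if __name__ == "__main__":
    embs = [
        ("doc_a", [0.0, 0.0]),
        ("doc_a", [0.5, 0.0]),
        ("doc_a", [0.0, 0.5]),
        ("doc_b", [2.0, 0.0]),
        ("doc_c", [3.0, 0.0]),
    ]
    # A plain top-2 over embeddings would return two vectors of doc_a;
    # the document-level query returns two different documents.
    print(multivector_topk([0.0, 0.0], embs, k=2))
    # [(0.0, 'doc_a'), (2.0, 'doc_b')]
```

Scoring a document by its closest embedding falls out naturally from scanning in distance order, since the first embedding seen for a document is its nearest one; other aggregations would need a different procedure.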