zilliztech / knowhere

Knowhere is an open-source vector search engine, integrating FAISS, HNSW, etc.
Apache License 2.0

Do HNSW and other ANN indexes support adding external labels? #360

Closed patelprateek closed 8 months ago

patelprateek commented 10 months ago

I am trying to move our production core ANN engine to Knowhere. To support a few legacy use cases that used the header-only nmslib/hnswlib library, we want to add data points with our own labels. I notice that thirdparty/hnswlib does not have a label lookup map. Is adding embeddings with external label/document ids supported, as opposed to internally assigned ones?

liliu-z commented 10 months ago

Knowhere doesn't support labels for now. It only uses the offset as the id.

patelprateek commented 10 months ago

@liliu-z: can you please elaborate a bit? If I have a bunch of data that I insert in parallel into my index, the index internally assigns sequential ids to those data items. For k-nearest-neighbour search, how do we map the ids back to external ids? How exactly do clients/customers make sense of the internal ids returned by the search API?

patelprateek commented 10 months ago

@liliu-z: any idea how Milvus saves the external ids? It seems Milvus does support adding documents with ids.

With Zilliz, if I insert a few million docs in parallel using multiple threads and then query for nearest neighbours, how do I make sense of the internal ids? In FAISS the internal ids usually follow the order within a batch. Do we need to rescan the data in the index to get the internal ids?

liliu-z commented 9 months ago

We can simply think of a Milvus collection as a table, where the label is a column of the table. In each Segment (the smallest unit of data), the natural order of the data rows determines the offsets of the vectors in Knowhere, from 0 to len(segment) - 1. After getting the topK offsets back from Knowhere, Milvus finds the corresponding rows and passes them up to a higher level for further reducing.
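The offset-to-label scheme described above can be sketched roughly as follows. This is only an illustration, not Milvus or Knowhere code: the `Segment` class, its method names, and the brute-force L2 search standing in for an ANN index are all assumptions made for the example. The point is that the index only ever sees row offsets, and the caller keeps the external labels in a side column in the same insertion order.

```python
import numpy as np


class Segment:
    """Hypothetical stand-in for a segment: rows are kept in insertion order."""

    def __init__(self, dim):
        self.dim = dim
        self.labels = []  # external ids; list position == offset in the index
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, labels, vectors):
        vectors = np.asarray(vectors, dtype=np.float32).reshape(-1, self.dim)
        if len(labels) != len(vectors):
            raise ValueError("need exactly one label per vector")
        # Appending both together keeps labels[i] aligned with vectors[i].
        self.labels.extend(labels)
        self.vectors = np.vstack([self.vectors, vectors])

    def search_offsets(self, query, k):
        # Brute-force L2 search used here in place of a real ANN index.
        # Like the index, it returns offsets only, never labels.
        query = np.asarray(query, dtype=np.float32)
        dists = np.linalg.norm(self.vectors - query, axis=1)
        k = min(k, len(dists))
        return np.argsort(dists, kind="stable")[:k]

    def search(self, query, k):
        # Map the returned offsets back to external labels via the side column.
        return [self.labels[i] for i in self.search_offsets(query, k)]


segment = Segment(dim=2)
segment.add(["doc-a", "doc-b", "doc-c"], [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
nearest = segment.search([0.9, 0.0], k=2)
```

With parallel writers, the same idea only holds if the label append and the vector append happen as one ordered step (for example under a lock or through a single writer), so that each offset and its label stay aligned.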

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.