Indexing dense numeric vectors

juji-io / datalevin

A simple, fast and versatile Datalog database

https://github.com/juji-io/datalevin

Eclipse Public License 1.0

1.13k stars 63 forks

Open huahaiy opened 2 years ago

huahaiy commented 2 years ago

Add a data type, :vec, for indexing dense numeric vectors and searching them by similarity.
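
To make "searching by similarity" concrete, here is a minimal sketch of the exact, brute-force baseline that an approximate index would be meant to speed up. It is written in plain Python and is not Datalevin code; the function names, the dict-based store, and the choice of cosine similarity are all assumptions made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def nearest(vectors, query, k=1):
    """Return the ids of the k stored vectors most similar to `query`.

    `vectors` is a hypothetical store mapping an entity id to its vector.
    Every vector is scanned, so the result is exact but costs O(n) per query.
    """
    return heapq.nlargest(
        k, vectors, key=lambda eid: cosine_similarity(vectors[eid], query)
    )


if __name__ == "__main__":
    store = {1: [1.0, 0.0, 0.0], 2: [0.9, 0.1, 0.0], 3: [0.0, 1.0, 0.0]}
    print(nearest(store, [1.0, 0.0, 0.0], k=2))
```

A linear scan like this stops being practical as the number of vectors grows, which is the reason for looking at approximate nearest-neighbor libraries.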

huahaiy commented 1 year ago

The current state of the art is ScaNN (see paper, extended paper), which is built for TensorFlow Serving and depends on TensorFlow.

huahaiy commented 1 year ago

For our initial version, https://github.com/nmslib/hnswlib seems ideal, for the following reasons:

It performs well across the board: https://github.com/erikbern/ann-benchmarks/
It outperforms everything else for huge vectors: https://ann-benchmarks.com/kosarak-jaccard_10_jaccard.html. The current batch of state-of-the-art libraries optimizes for vectors of fewer than 1000 dimensions, but my hunch is that huge vectors (more than 10k dimensions) are needed to really take advantage of the vector view of semantics (i.e. the so-called vector symbolic approach).
It is small and seems much easier to integrate than the others.
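
The huge-vector benchmark linked above uses Jaccard distance. As a rough illustration of that metric, the sketch below represents a very high-dimensional sparse binary vector as the set of its non-zero dimension indices, so storage and comparison cost depend on the number of set entries rather than on the nominal dimension. The set representation and the function name are assumptions made for this example; it does not show how hnswlib or the benchmark implements the metric.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of non-zero dimension indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty vectors are identical
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two sparse vectors in a space with more than 10k dimensions,
    # each stored as the indices of its non-zero entries.
    u = {1, 5, 20000}
    v = {1, 5, 7}
    print(jaccard_distance(u, v))
```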
