spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Missing ann-benchmarks documentation? #32

Closed loisaidasam closed 10 months ago

loisaidasam commented 10 months ago

The README says:

like Annoy, but with much higher recall

https://github.com/spotify/voyager/blob/c6d09ccc91d4689b15e99b6df965f4580af85358/README.md#L17

but I don't see any references to Voyager on the ann-benchmarks page. Am I missing something?

loretoparisi commented 10 months ago

I think they meant that, unlike Annoy, Voyager uses hnswlib for its ANN search, so the baseline benchmark shows Voyager with a better recall / queries-per-second trade-off than Annoy. You can clearly see this from here.

image

loisaidasam commented 10 months ago

Which "base benchmark" are you referring to?

Perhaps faiss's HNSW implementation? Or perhaps hnswlib? Those don't seem like fair comparisons, since Voyager is an entirely separate implementation.

Also, I think "clearly" is being generous. The way that the README links to ann-benchmarks suggests that one would see "Voyager" on these charts.

loretoparisi commented 10 months ago

hnswlib

I think a reasonable reference (baseline) could be the bare hnswlib. That said, according to the announcement, Voyager uses a customized version of hnswlib (as you can see from the sources), so the tests should be run from scratch to be more accurate.
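
For anyone who wants to run such a test from scratch, here is a minimal sketch of how recall is usually measured: compare the IDs an approximate search returns against exact brute-force neighbors. It uses only the Python standard library. `sampled_knn` is a hypothetical stand-in for a real index (it is not Voyager's or hnswlib's API), and the data is random, so the number it prints says nothing about either library.

```python
import math
import random


def brute_force_knn(data, query, k):
    """Exact k nearest neighbors by Euclidean distance (the ground truth)."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sampled_knn(data, query, k, sample_ids):
    """Hypothetical stand-in for an ANN index: it searches only a subset
    of the points, so it can miss true neighbors. Swap in a real index
    query here when benchmarking an actual library."""
    order = sorted(sample_ids, key=lambda i: math.dist(data[i], query))
    return order[:k]


rng = random.Random(0)  # fixed seed so runs are repeatable
data = [[rng.random() for _ in range(8)] for _ in range(500)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
sample_ids = rng.sample(range(len(data)), 250)
k = 10

recalls = [
    recall_at_k(sampled_knn(data, q, k, sample_ids), brute_force_knn(data, q, k))
    for q in queries
]
mean_recall = sum(recalls) / len(recalls)
print(f"mean recall@{k}: {mean_recall:.3f}")
```

To compare two libraries fairly, the same `data`, `queries`, and `k` would be fed to each one, and query time would be recorded alongside recall, which is roughly what ann-benchmarks plots.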

ijanderso commented 10 months ago

Closing as it has been added to ann-benchmarks: https://github.com/erikbern/ann-benchmarks/pull/473