vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Support for dot product distance with HNSW index #16456

Closed SaiKiranBurle closed 3 years ago

SaiKiranBurle commented 3 years ago

Hello, I intend to use Vespa and its approximate nearest neighbor capabilities for large-scale recommender systems.

After some initial searching, I found that the supported distance metrics (listed at https://docs.vespa.ai/documentation/reference/schema-reference.html#distance-metric) do not include dot product. Because of various intricacies in our model, we cannot use cosine similarity; we need the dot product over vectors that are not normalized. Is there any way I can leverage Vespa's HNSW index for my use case?

jobergum commented 3 years ago

Hello @SaiKiranBurle and thank you for your interest in Vespa and its approximate nearest neighbor search operator.

It's correct that dotproduct is not listed as it's not a proper distance metric.

The common approach is to transform the dot product problem into Euclidean space; see "Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces".

We use this transformation in this dense retrieval sample app: the original DPR representation uses the dot product, and the representation on Vespa has been transformed to Euclidean space, where we can use ANN.

See also https://blog.vespa.ai/efficient-open-domain-question-answering-on-vespa/
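As a rough illustration of the transformation referenced above, here is a minimal Python sketch. It assumes the usual form of the reduction: append one extra component to every document vector so that all documents share the same norm, and append a zero to the query. The function names (`augment_documents`, `augment_query`) and the toy vectors are hypothetical and are not taken from the sample app or from any Vespa API.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augment_documents(docs):
    """Append sqrt(M^2 - ||d||^2) to each document, where M is the largest norm.

    After this step every augmented document has norm M.
    """
    max_sq_norm = max(dot(d, d) for d in docs)
    augmented = []
    for d in docs:
        # max(0.0, ...) guards against tiny negative values from rounding.
        extra = math.sqrt(max(0.0, max_sq_norm - dot(d, d)))
        augmented.append(list(d) + [extra])
    return augmented


def augment_query(q):
    """Append a zero so the query matches the augmented dimensionality."""
    return list(q) + [0.0]


# Toy data (hypothetical): unnormalized document vectors and one query.
docs = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
query = [1.0, 1.0]

aug_docs = augment_documents(docs)
aug_query = augment_query(query)

# ||q' - d'||^2 = ||q||^2 + M^2 - 2 * (q . d), so ranking by ascending
# Euclidean distance in the augmented space equals ranking by descending
# dot product in the original space.
rank_by_dot = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))
rank_by_l2 = sorted(range(len(docs)), key=lambda i: sq_euclidean(aug_query, aug_docs[i]))
```

With vectors prepared this way, the augmented documents could be fed to a tensor field indexed with HNSW and the euclidean distance metric, with the query augmented the same way at query time. Note that the maximum norm has to be computed over the whole corpus, so adding a document with a larger norm later would require re-transforming the existing vectors.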

SaiKiranBurle commented 3 years ago

Thank you. This is a reasonable alternative.

jobergum commented 3 years ago

Great, thanks for the feedback, and feel free to reach out. We are also releasing new sample apps and tutorials for recommendation very soon, so make sure you follow blog.vespa.ai for updates.

To optimize bootstrap feeding with large tensors, I recommend a high feeding concurrency setting (1.0). See https://docs.vespa.ai/en/reference/services-content.html#feeding-concurrency.