spotify / annoy-java

Approximate nearest neighbors in Java
Apache License 2.0

"ANGULAR" distance defined differently than in main annoy package #22

Open jrhaberstroh opened 2 years ago

jrhaberstroh commented 2 years ago

This Java implementation defines ANGULAR distance as "cosine similarity", while the C++/Python implementation defines ANGULAR distance as an "approximate theta", that is, an approximation of the angle between the two vectors.

Evidence:

Java implementation -- dot(u,v), the value of the cosine:
https://github.com/spotify/annoy-java/blob/aef275bd5b872b0931bde557bf8627c0476246af/src/main/java/com/spotify/annoy/ANNIndex.java#L189
https://github.com/spotify/annoy-java/blob/aef275bd5b872b0931bde557bf8627c0476246af/src/main/java/com/spotify/annoy/ANNIndex.java#L319

C++ / Python implementation -- sqrt(2 * (1 - dot(u,v))), an approximation of theta:
https://github.com/spotify/annoy/blob/6f6b0c84ab413337eb4d2e850a4cba637f52ccbc/src/annoylib.h#L473
https://github.com/spotify/annoy/blob/6f6b0c84ab413337eb4d2e850a4cba637f52ccbc/src/annoylib.h#L502
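
To make the difference concrete, here is a minimal sketch of the two formulas quoted above, side by side. It is not the code from either repository: the class name `AngularDistanceSketch` and all method names are hypothetical, and it assumes non-zero input vectors that are normalized before the dot product is taken.

```java
// Illustrative sketch only; names are hypothetical and not taken from
// annoy or annoy-java. Assumes non-zero input vectors.
final class AngularDistanceSketch {

    // Plain dot product of two equal-length vectors.
    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    // Returns a unit-length copy of v.
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // Formula the issue attributes to the Java implementation:
    // dot(u, v) on normalized vectors, i.e. cosine similarity.
    // Larger values mean the vectors are closer.
    static double cosineSimilarity(double[] u, double[] v) {
        return dot(normalize(u), normalize(v));
    }

    // Formula the issue attributes to the C++/Python implementation:
    // sqrt(2 * (1 - dot(u, v))) on normalized vectors.
    // Smaller values mean the vectors are closer.
    static double angularDistance(double[] u, double[] v) {
        double cos = cosineSimilarity(u, v);
        // Clamp at zero to guard against tiny negative rounding errors.
        return Math.sqrt(Math.max(0.0, 2.0 * (1.0 - cos)));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        System.out.println("cosine similarity: " + cosineSimilarity(a, b));
        System.out.println("angular distance:  " + angularDistance(a, b));
    }
}
```

The two values are related monotonically, so they rank neighbors in the same order, but the scale and direction differ: identical vectors give 1 under the first formula and 0 under the second, and opposite vectors give -1 and 2 respectively. Code that compares returned distances against a threshold would therefore behave differently between the two.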

Related issue defining "angular" method in the other repo: https://github.com/spotify/annoy/issues/530

Because this would be a significant breaking change, I have not put a PR together. Could the lead devs provide guidance on how to name a new distance that implements angular correctly (for example, "angular-cpp" or "angular-correct")?