neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

GDS cosine similarity #168

Closed devineyfajr closed 2 years ago

devineyfajr commented 2 years ago

Describe the bug

gds.alpha.similarity.cosine returns the wrong result for two opposite vectors: 1.0 instead of -1.0.

To Reproduce

neo4j@neo4j> return gds.alpha.similarity.cosine([-1.0,-1.0], [1.0,1.0]);
+-----------------------------------------------------+
| gds.alpha.similarity.cosine([-1.0,-1.0], [1.0,1.0]) |
+-----------------------------------------------------+
| 1.0                                                 |
+-----------------------------------------------------+

GDS version: X.Y.Z
Neo4j version: X.Y.Z
Operating system: (for example Windows 95/Ubuntu 16.04)

Steps to reproduce the behavior:

Expected behavior

Should return -1.0

Additional context

https://en.wikipedia.org/wiki/Cosine_similarity - "two opposite vectors have a similarity of -1"

On https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/cosine/, the formula similarity(A, B) applied to [-1,-1] and [1,1] yields -2/2, which equals -1: the dot product is -2, and the product of the two vector lengths is 2.
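To check the expected value independently of GDS, here is a minimal Python sketch of the textbook cosine similarity formula (dot product divided by the product of the vector lengths). The function name `cosine_similarity` is hypothetical and chosen for illustration; this is not the library's implementation, only a reference calculation for the vectors used in this report.

```python
import math


def cosine_similarity(a, b):
    """Textbook cosine similarity: dot(a, b) / (|a| * |b|).

    Hypothetical helper for cross-checking results; not GDS code.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    # Squared lengths of each vector.
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    if sq_a == 0 or sq_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # The sign of the result comes from the dot product and must be kept.
    return dot / math.sqrt(sq_a * sq_b)


if __name__ == "__main__":
    # Opposite vectors from the report: the result should be close to -1.
    print(cosine_similarity([-1.0, -1.0], [1.0, 1.0]))
    # Identical vectors, for comparison: the result should be close to 1.
    print(cosine_similarity([1.0, 1.0], [1.0, 1.0]))
```

For opposite vectors the dot product is negative, so any correct implementation has to return a negative value here; a result of 1.0 means the sign was lost somewhere in the computation.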

devineyfajr commented 2 years ago

Another example:

neo4j@neo4j> RETURN gds.alpha.similarity.cosine([1,0], [-1,0]) AS similarity;
+------------+
| similarity |
+------------+
| 1.0        |
+------------+

adamnsch commented 2 years ago

Thank you for reporting this. That does seem wrong. We will have a look.

adamnsch commented 2 years ago

A fix has been merged: https://github.com/neo4j/graph-data-science/commit/69e77e2bfc297ce6b9dab17e58fdcd32c76972bb. It will be part of the next 2.0 alpha release and the next stable patch release (1.8.4).