neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
770 stars, 194 forks

Fixed calculation of denominator for Jaccard-similarity #907

Open oschlueter opened 5 years ago

oschlueter commented 5 years ago

The current implementation of Jaccard similarity doesn't discard duplicate input values when calculating the denominator. I found this by calculating Jaccard on identical inputs containing duplicates, which did not return 1.0. I added test cases that cover this.
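For illustration, here is a minimal sketch of how duplicates can skew the denominator. It is not the code from this repository: the class name JaccardSketch, both method names, and the exact shape of the faulty denominator (raw array lengths minus the intersection size) are assumptions made for the example, as is the choice to return 0.0 for two empty inputs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the Similarities class from this project.
final class JaccardSketch {

    // Assumed faulty variant: the intersection is computed on distinct values,
    // but the denominator uses raw array lengths, so duplicates inflate it.
    static double naiveJaccard(long[] a, long[] b) {
        Set<Long> intersection = toSet(a);
        intersection.retainAll(toSet(b));
        long denominator = a.length + b.length - intersection.size();
        return denominator == 0 ? 0.0 : (double) intersection.size() / denominator;
    }

    // Set-based variant: both numerator and denominator use distinct values,
    // i.e. |A ∩ B| / |A ∪ B|.
    static double jaccard(long[] a, long[] b) {
        Set<Long> setA = toSet(a);
        Set<Long> setB = toSet(b);
        Set<Long> intersection = new HashSet<>(setA);
        intersection.retainAll(setB);
        Set<Long> union = new HashSet<>(setA);
        union.addAll(setB);
        // Assumption: two empty inputs yield 0.0 rather than NaN.
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // Copies the array into a set, discarding duplicates.
    private static Set<Long> toSet(long[] values) {
        Set<Long> set = new HashSet<>();
        for (long v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        long[] input = {1, 1, 2};
        System.out.println(naiveJaccard(input, input)); // 0.5
        System.out.println(jaccard(input, input));      // 1.0
    }
}
```

With the input {1, 1, 2} compared against itself, the intersection has 2 distinct values. The naive denominator is 3 + 3 - 2 = 4, while the set-based union has size 2, so only the second variant returns 1.0 for identical inputs.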

oschlueter commented 5 years ago

When creating the test cases I forgot to change the call to Similarities::overlapSimilarity. Once that was fixed, I saw that this calculation is affected as well, so I added a proposed fix for overlapSimilarity too.
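The same effect can be sketched for the overlap coefficient, |A ∩ B| / min(|A|, |B|). Again, this is only an illustration under assumptions: the class OverlapSketch and its methods are hypothetical and do not reproduce Similarities::overlapSimilarity, and the faulty variant assumes the denominator was taken from the raw array lengths.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the Similarities class from this project.
final class OverlapSketch {

    // Assumed faulty variant: min() is taken over raw array lengths,
    // so duplicate values make the denominator too large.
    static double naiveOverlap(long[] a, long[] b) {
        Set<Long> intersection = toSet(a);
        intersection.retainAll(toSet(b));
        long denominator = Math.min(a.length, b.length);
        return denominator == 0 ? 0.0 : (double) intersection.size() / denominator;
    }

    // Set-based variant: min() is taken over the number of distinct values.
    static double overlap(long[] a, long[] b) {
        Set<Long> setA = toSet(a);
        Set<Long> setB = toSet(b);
        Set<Long> intersection = new HashSet<>(setA);
        intersection.retainAll(setB);
        int denominator = Math.min(setA.size(), setB.size());
        // Assumption: an empty input yields 0.0 rather than NaN.
        return denominator == 0 ? 0.0 : (double) intersection.size() / denominator;
    }

    // Copies the array into a set, discarding duplicates.
    private static Set<Long> toSet(long[] values) {
        Set<Long> set = new HashSet<>();
        for (long v : values) {
            set.add(v);
        }
        return set;
    }
}
```

For {1, 1, 2} compared against itself, the naive variant divides 2 by min(3, 3) and stays below 1.0, while the set-based variant divides 2 by min(2, 2) and returns 1.0.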