Open Akanksha-kedia opened 6 months ago
@Akanksha-kedia I've checked the implementation, and it seems that you are correct. A fix is on the way.
This is a duplicate of https://github.com/trinodb/trino/issues/18995
I have closed this; it is a duplicate of https://github.com/trinodb/trino/issues/18995.
Title: Incorrect Jaccard Index Calculation in Trino
Description:
I've encountered an issue with the jaccard_index function in Trino where the output does not match the expected result according to the Jaccard index formula.
Here are the queries I ran:
Query 1:
SELECT jaccard_index(make_set_digest(value), make_set_digest(value1)) FROM (VALUES ('abc', 'def'),('ee', 'abc')) T(value,value1);
For this query, the sets are s1 = {'abc', 'ee'} and s2 = {'def', 'abc'}. They share 1 element out of 3 distinct elements, so the expected Jaccard index is 1/3 = 0.3333333333333333, but the output is 0.5.
Query 2:
SELECT jaccard_index(make_set_digest(value), make_set_digest(value1)) FROM (VALUES (1,4),(2,5),(3,6),(4,7),(5,8)) T(value,value1);
For this query, the sets are s1 = {1, 2, 3, 4, 5} and s2 = {4, 5, 6, 7, 8}. They share 2 elements out of 8 distinct elements, so the expected Jaccard index is 2/8 = 0.25, but the output is 0.4.
The Jaccard index is a measure of the similarity between two sets and is calculated as the size of the intersection divided by the size of the union of the two sets. Based on this, the outputs of the above queries should be 0.3333333333333333 and 0.25 respectively.
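To double-check the expected values independently of Trino, here is a minimal Python sketch that computes the exact Jaccard index with built-in sets. The helper name `exact_jaccard` and the variable names are made up for illustration; this is not how Trino's `jaccard_index` is implemented, only the textbook formula applied to the same inputs as the two queries.

```python
def exact_jaccard(left, right):
    """Exact Jaccard index: |intersection| / |union| of two collections."""
    a, b = set(left), set(right)
    union = a | b
    if not union:
        # The index is undefined when both sets are empty.
        raise ValueError("Jaccard index is undefined for two empty sets")
    return len(a & b) / len(union)


# Columns from the first query: value and value1.
string_result = exact_jaccard(["abc", "ee"], ["def", "abc"])

# Columns from the second query: value and value1.
int_result = exact_jaccard([1, 2, 3, 4, 5], [4, 5, 6, 7, 8])

print(string_result)  # 0.3333333333333333
print(int_result)     # 0.25
```

As a side observation only, not a confirmed diagnosis: the reported outputs 0.5 and 0.4 equal 1/2 and 2/5, which is the intersection size divided by the size of a single input set rather than by the size of the union.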
This seems to be a bug in the jaccard_index function in Trino.
Could someone look into this?