Sorry, I can't edit the title: this is a precision problem rather than a
round-off error.
Original comment by rouven.r...@gmail.com
on 19 Jul 2013 at 8:23
Well, we could add a check for identical sentences and return 1.0 in that case,
but it will increase the run-time in all other cases.
Is it affecting your application?
In most cases, 0.9999999999999999 should be close enough to 1.0 to pass an
equality check with a reasonable epsilon.
Original comment by torsten....@gmail.com
on 19 Jul 2013 at 9:04
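A minimal sketch of the epsilon comparison suggested above. The class name `ApproxEquals` and the tolerance `1e-9` are illustrative assumptions, not part of the library; the right tolerance depends on the application.

```java
public class ApproxEquals {
    // Hypothetical tolerance; choose one that suits the application.
    static final double EPSILON = 1e-9;

    // True if a and b differ by at most EPSILON.
    static boolean approxEquals(double a, double b) {
        return Math.abs(a - b) <= EPSILON;
    }

    public static void main(String[] args) {
        System.out.println(approxEquals(0.9999999999999999, 1.0)); // true
        System.out.println(approxEquals(1.0000000000000002, 1.0)); // true
        System.out.println(approxEquals(0.99, 1.0));               // false
    }
}
```

Both values from this thread fall within one ulp of 1.0, so any tolerance much larger than about 2.2e-16 treats them as equal to 1.0.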
Hi, no, 0.9999999999999999 doesn't affect my application at all! I just added
this example for completeness.
However, I assume most users (including myself) expect the score to stay
within [0, 1], so 1.0000000000000002 was a problem for my application.
Original comment by rouven.r...@gmail.com
on 19 Jul 2013 at 1:25
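Until the library guarantees the range, a caller can enforce [0, 1] itself. A minimal sketch; `ScoreClamp` and `clampUnit` are hypothetical names, not library API.

```java
public class ScoreClamp {
    // Clamps a similarity score into [0, 1]. Rounding in the dot product and
    // the norms can push the cosine of identical vectors slightly past 1.0.
    // A NaN input propagates unchanged through Math.min/Math.max.
    static double clampUnit(double score) {
        return Math.max(0.0, Math.min(1.0, score));
    }

    public static void main(String[] args) {
        System.out.println(clampUnit(1.0000000000000002)); // 1.0
        System.out.println(clampUnit(0.9999999999999999)); // 0.9999999999999999
        System.out.println(clampUnit(-1e-17));             // 0.0
    }
}
```

Clamping hides the overshoot without changing in-range scores, which makes it a cheaper option than special-casing identical inputs.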
Should be fixed now.
Original comment by torsten....@gmail.com
on 22 Jul 2013 at 1:04
The fix introduces another bug.
See the following (anonymized) example:
String textA = "1 3 4 5 6 7 8 9 3 10 7 11 .";
String textB = "2 3 12 13 5 3 7 11 14 15 3 7 .";
CosineSimilarity cosineSimilarityMeasure = new CosineSimilarity();
List<String> tokensA = getTokens(textA); // getTokens: the reporter's own tokenizer (not shown)
List<String> tokensB = getTokens(textB);
double similarity = cosineSimilarityMeasure.getSimilarity(tokensA, tokensB);
System.out.println(similarity); // prints 1.0
The incorrect score 1.0 is returned because of the change in line 198 ff.
Original comment by rouven.r...@gmail.com
on 19 Aug 2013 at 2:23
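For comparison, a sketch of cosine similarity over raw term-frequency vectors, with whitespace splitting standing in for the reporter's `getTokens`. Both the tokenization and the weighting are assumptions: the library's `CosineSimilarity` may weight terms differently, so its score for this pair may not match. Under plain frequency weighting, though, tokens such as `1` and `2` occur in only one text, so the vectors are not proportional and a score of exactly 1.0 is not expected. The identical-input shortcut discussed earlier in the thread is included, applied only when the full ordered token lists are equal; `TfCosine` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfCosine {
    // Counts how often each token occurs.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity of raw term-frequency vectors.
    static double cosine(List<String> a, List<String> b) {
        // Identical-input shortcut: taken only when the full token lists match.
        if (!a.isEmpty() && a.equals(b)) {
            return 1.0;
        }
        Map<String, Integer> tfA = termFrequencies(a);
        Map<String, Integer> tfB = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : tfA.entrySet()) {
            Integer other = tfB.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = 0.0;
        for (int v : tfA.values()) normA += (double) v * v;
        double normB = 0.0;
        for (int v : tfB.values()) normB += (double) v * v;
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // one side has no tokens
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<String> tokensA = Arrays.asList("1 3 4 5 6 7 8 9 3 10 7 11 .".split(" "));
        List<String> tokensB = Arrays.asList("2 3 12 13 5 3 7 11 14 15 3 7 .".split(" "));
        System.out.println(cosine(tokensA, tokensB)); // about 0.688, i.e. 13 / sqrt(17 * 21)
        System.out.println(cosine(tokensA, tokensA)); // 1.0
    }
}
```

For this pair the shared tokens `3`, `5`, `7`, `11` and `.` give a dot product of 13, and the squared norms are 17 and 21.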
Original issue reported on code.google.com by
rouven.r...@gmail.com
on 19 Jul 2013 at 7:26