Closed isoboroff closed 9 months ago
Hi,
My name is Elias Bassani, and I am the author of ranx (a Python library for ranking evaluation and comparison).
During the implementation of my library, whose metrics are checked against trec_eval for correctness, I noticed that trec_eval sometimes misbehaves when document score differences are tiny (less than 10^-8).
I think this could be related to the issue above.
Here is a working example:
qrels
q1 0 d0 1
run
q1 Q0 d62 1 0.9752302810058760 Sys
q1 Q0 d25 2 0.9720277433347962 Sys
q1 Q0 d17 3 0.9425774560923942 Sys
q1 Q0 d24 4 0.9406931541311498 Sys
q1 Q0 d84 5 0.9394391812940595 Sys
q1 Q0 d3 6 0.9359824150337328 Sys
q1 Q0 d51 7 0.9316103082678681 Sys
q1 Q0 d0 8 0.9222621453752586 Sys
q1 Q0 d65 9 0.9222621427281577 Sys
q1 Q0 d8 10 0.9062275552229255 Sys
Output of trec_eval -m recip_rank qrels run: 0.1111
Expected result: 0.125
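The collision can be reproduced outside trec_eval. The snippet below is only a minimal sketch, not trec_eval's code: `to_float32` and `reciprocal_rank` are hypothetical helpers, and the tie-breaking rule (document ID in descending lexicographic order) is an assumption about how tied scores get ordered. Under those assumptions, rounding the scores above to single precision makes d0 and d65 tie, which pushes d0 from rank 8 to rank 9.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a C double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# (doc_id, score) pairs copied from the run in this report.
run = [
    ("d62", 0.9752302810058760),
    ("d25", 0.9720277433347962),
    ("d17", 0.9425774560923942),
    ("d24", 0.9406931541311498),
    ("d84", 0.9394391812940595),
    ("d3", 0.9359824150337328),
    ("d51", 0.9316103082678681),
    ("d0", 0.9222621453752586),
    ("d65", 0.9222621427281577),
    ("d8", 0.9062275552229255),
]
relevant = {"d0"}


def reciprocal_rank(run, relevant, cast):
    """Reciprocal rank of the first relevant document.

    Documents are sorted by score (after applying `cast`), descending.
    ASSUMPTION: ties are broken by doc_id in descending lexicographic order.
    """
    ranked = sorted(run, key=lambda pair: (cast(pair[1]), pair[0]), reverse=True)
    for rank, (doc_id, _) in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


rr_double = reciprocal_rank(run, relevant, float)       # scores kept as doubles
rr_single = reciprocal_rank(run, relevant, to_float32)  # scores rounded to float

print(f"double precision: {rr_double:.4f}")
print(f"single precision: {rr_single:.4f}")
```

With double-precision scores d0 stays at rank 8 (reciprocal rank 0.125). With single-precision scores the two values become identical, the assumed tie-break puts d65 first, and the reciprocal rank drops to 1/9, which is about 0.1111.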
Simply replacing all float occurrences in the codebase with double solves the issue.
Testing with make quicktest outputs Test succeeeded after the replacement.
I can do a PR if you want.
Best,
Elias
A patch would be welcome, and a patch against the 10.0-dev branch even more so. I've been wary of touching this one. Does 'make test' still pass?
Same results on the 10.0-dev branch (make quicktest outputs Test succeeeded).
I am currently on macOS; I will check that everything works on Windows and Ubuntu in the next few days.
Anything else you want me to check before opening a PR?
PS: make test outputs make: Nothing to be done for 'test' on both branches.
This is resolved in the 10.0-rc branch.
This was reported by Fernando Diaz:
This may have broader implications across the codebase.