terrierteam / ir_measures

provides a common interface to many IR measure tools
https://ir-measur.es/
Apache License 2.0

Qrels read by ir_measures.read_trec_qrels give nan #34

Closed: ArvinZhuang closed this issue 2 years ago

ArvinZhuang commented 2 years ago

Hi, I found that when I use ir_measures.read_trec_qrels to read a TREC qrels file and then run several evaluations with it, only the first evaluation works; every later one returns nan.

For example, my qrel.txt is:

Q0  0   D0  0
Q0  0   D1  1
Q1  0   D0  0
Q1  0   D3  2

and I run the following code:

import ir_measures
from ir_measures import * # imports all supported measures, e.g., AP, nDCG, RR, P

qrels = ir_measures.read_trec_qrels('qrel.txt')
run = {
    'Q0': {"D0": 1.2, "D1": 1.0},
    "Q1": {"D0": 2.4, "D3": 3.6}
}

print(ir_measures.calc_aggregate([RR@10, nDCG@10, R@1000], qrels, run))
print(ir_measures.calc_aggregate([RR@10, nDCG@10, R@1000], qrels, run))

This gives:

{nDCG@10: 0.8154648767857288, RR@10: 0.75, R@1000: 1.0}
{nDCG@10: nan, RR@10: nan, R@1000: nan}

However, it works fine if I define the qrels myself as a dictionary.
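The dictionary itself was not captured in this copy of the thread. A minimal sketch of what it likely looked like is below, assuming ir_measures' nested-dict qrels form `{query_id: {doc_id: relevance}}`. `parse_qrels_text` is a hypothetical helper (not part of ir_measures) that builds the same structure from the qrel.txt contents above, to show the two carry the same judgments.

```python
# Assumed nested-dict qrels layout: {query_id: {doc_id: relevance}}
qrels_dict = {
    "Q0": {"D0": 0, "D1": 1},
    "Q1": {"D0": 0, "D3": 2},
}

QREL_TXT = """\
Q0  0   D0  0
Q0  0   D1  1
Q1  0   D0  0
Q1  0   D3  2
"""


def parse_qrels_text(text):
    """Hypothetical helper: turn TREC qrels lines into {query_id: {doc_id: relevance}}.

    Each line holds: query_id, iteration (ignored), doc_id, relevance.
    """
    qrels = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query_id, _iteration, doc_id, relevance = line.split()
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels


print(parse_qrels_text(QREL_TXT) == qrels_dict)  # True
# A dict can be traversed any number of times, so it can be reused, e.g.
# ir_measures.calc_aggregate([RR@10, nDCG@10, R@1000], qrels_dict, run)
```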

seanmacavaney commented 2 years ago

Hi @ArvinZhuang

The original design tries to avoid keeping everything in memory when possible, e.g., in case the qrels or run files are very large. As a result, read_trec_qrels actually returns an iterator over the file, so once you have consumed it the first time, it is exhausted and the second calc_aggregate call sees no qrels at all. If you want to load everything into memory, you can wrap it in the list constructor:

qrels = list(ir_measures.read_trec_qrels('qrel.txt'))
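The failure mode can be reproduced without ir_measures. In the sketch below, `read_qrels_lazily` is a hypothetical generator standing in for a streaming reader like read_trec_qrels, and `mean_relevance` is a toy consumer that returns nan when it sees no rows (an assumption for illustration, not how ir_measures computes its measures). The second pass over the same generator gets nothing; materialising it with list() first makes repeated passes work.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Qrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int


def read_qrels_lazily(lines: Iterable[str]) -> Iterator[Qrel]:
    """Hypothetical stand-in for a streaming qrels reader: yields one row at a time."""
    for line in lines:
        query_id, _iteration, doc_id, relevance = line.split()
        yield Qrel(query_id, doc_id, int(relevance))


def mean_relevance(qrels: Iterable[Qrel]) -> float:
    """Toy consumer: averages relevance labels, or returns nan if there are none."""
    labels = [q.relevance for q in qrels]
    return sum(labels) / len(labels) if labels else math.nan


LINES = ["Q0 0 D0 0", "Q0 0 D1 1", "Q1 0 D0 0", "Q1 0 D3 2"]

lazy = read_qrels_lazily(LINES)
first = mean_relevance(lazy)   # consumes the generator: 0.75
second = mean_relevance(lazy)  # generator already exhausted: nan

cached = list(read_qrels_lazily(LINES))  # materialise once
again = [mean_relevance(cached) for _ in range(2)]  # [0.75, 0.75]
```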

In practice, though, most of the measure providers load the entire run and qrels into memory internally anyway, so I may reconsider the above design decision.
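One way to keep the low-memory behaviour while still allowing repeated evaluations would be a re-iterable wrapper whose `__iter__` reopens the file on every pass. `ReiterableQrels` below is a hypothetical sketch of that idea, not part of ir_measures, and it assumes the same four-column TREC qrels layout as qrel.txt.

```python
import os
import tempfile
from typing import Iterator, NamedTuple


class Qrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int


class ReiterableQrels:
    """Hypothetical wrapper: streams a qrels file afresh on each iteration."""

    def __init__(self, path: str):
        self.path = path

    def __iter__(self) -> Iterator[Qrel]:
        # A new file handle per pass, so one pass can never exhaust the next.
        with open(self.path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                query_id, _iteration, doc_id, relevance = line.split()
                yield Qrel(query_id, doc_id, int(relevance))


# Usage: write a small qrels file, then iterate over it twice.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Q0 0 D0 0\nQ0 0 D1 1\nQ1 0 D0 0\nQ1 0 D3 2\n")
    path = tmp.name

qrels = ReiterableQrels(path)
first_pass = list(qrels)
second_pass = list(qrels)  # same 4 rows again, re-read from disk
os.remove(path)
```

The trade-off is that every pass re-reads and re-parses the file, so repeated evaluations cost extra I/O rather than extra memory.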

Does this resolve your issue?

ArvinZhuang commented 2 years ago

Hi @seanmacavaney, yes this solves my issue, and I think your design makes sense. Thank you!