joaopalotti / trectools

A simple toolkit to process TREC files in Python.
https://pypi.python.org/pypi/trectools
BSD 3-Clause "New" or "Revised" License

NDCG=0.0 instead of nan for IDCG=0.0 #25

Closed: lironT74 closed this 2 years ago

lironT74 commented 3 years ago

Hi, I believe this update is necessary, since if IDCG is 0 then NDCG should be 0, and not nan.

guidozuc commented 3 years ago

Thanks @lironT74. This is quite a particular case. Mathematically, nDCG is computed by dividing the DCG value by the IDCG value. If IDCG=0, that division is undefined, which is why the result is nan. In practice, IDCG=0 means there are no relevant documents for that query in the qrels. One should therefore exclude the query from the dataset or the evaluation: whatever you do on that query will be impossible to evaluate, since it has no relevant documents.
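
For illustration, here is a minimal sketch of the computation discussed here. It assumes the common log2-discounted DCG with linear gains, which may differ from the formulation trectools or trec_eval actually uses. The function names and the `zero_idcg_value` parameter are hypothetical and are not part of either tool; they only make the IDCG=0 choice explicit.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (assumed formulation)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, judged_gains, k=10, zero_idcg_value=float("nan")):
    """nDCG@k for one query.

    ranked_gains: relevance grades of the retrieved documents, in rank order.
    judged_gains: relevance grades of all judged documents for the query.
    zero_idcg_value: hypothetical knob for what to return when IDCG is 0,
        i.e. when the query has no relevant documents in the qrels.
    """
    ideal = sorted(judged_gains, reverse=True)[:k]
    idcg = dcg(ideal)
    if idcg == 0.0:
        # DCG / IDCG is undefined here; return the configured value
        # instead of attempting the division.
        return zero_idcg_value
    return dcg(ranked_gains[:k]) / idcg


# A query with no relevant documents: undefined by default, 0.0 on request.
print(ndcg([0, 0], [0, 0]))
print(ndcg([0, 0], [0, 0], zero_idcg_value=0.0))
```

Passing `zero_idcg_value=0.0` corresponds to the behaviour proposed in this pull request, while the default keeps the query marked as not evaluable.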

ishnid commented 3 years ago

I can see both sides :-)

I would suggest that the project should aim to follow whatever behaviour the official trec_eval has in situations like these.

lironT74 commented 3 years ago

@guidozuc I agree, but as @ishnid mentioned, trec_eval outputs 0.0 for those cases.

lironT74 commented 3 years ago

@joaopalotti I see. It seems the fix is to change the other metrics that might suffer from this issue in (almost exactly) the same way you did for NDCG, plus the per-query fillna fix, correct?

I can work on it in my spare time, but I am not sure which additional metrics need to be changed.
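
To make the per-query part concrete, here is a rough sketch of how the treatment of nan scores changes an averaged metric. It uses plain Python rather than the library's own data structures; `fill_nan`, `mean_score`, and the query ids are hypothetical names chosen for illustration, with `fill_nan` playing the role that a `fillna(0.0)` call would play on a pandas object.

```python
import math


def fill_nan(per_query_scores, value=0.0):
    """Return a copy in which nan scores are replaced by `value`."""
    return {
        qid: (value if math.isnan(score) else score)
        for qid, score in per_query_scores.items()
    }


def mean_score(per_query_scores):
    """Arithmetic mean over all queries; a single nan makes the mean nan."""
    return sum(per_query_scores.values()) / len(per_query_scores)


# Hypothetical per-query nDCG values; q2 has no relevant documents.
scores = {"q1": 0.5, "q2": float("nan"), "q3": 1.0}

filled = fill_nan(scores)  # count q2 as 0.0
kept = {q: s for q, s in scores.items() if not math.isnan(s)}  # drop q2

print(mean_score(scores))  # nan
print(mean_score(filled))  # 0.5
print(mean_score(kept))    # 0.75
```

The three means differ, so whichever convention is chosen would need to be applied consistently across every metric that divides by a quantity that can be zero.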

joaopalotti commented 2 years ago

Thanks @lironT74, your suggestion was taken and included in the last commit!