joaopalotti / trectools

A simple toolkit to process TREC files in Python.
https://pypi.python.org/pypi/trectools
BSD 3-Clause "New" or "Revised" License

Numerical difference between trectools and trec_eval in terms of Rprec and ndcg #18

Closed: sergeyf closed this issue 4 years ago

sergeyf commented 4 years ago

Hello,

I have a situation where there are large differences between some metrics computed using the official trec_eval (as called from pytrec_eval) and trectools. To reproduce, you can run the following code (it will download file.qrel and file.run from a gist):

Expected output:

trectools_results {'Rprec': 79.06, 'ndcg': 93.69}
pytrec_eval_results {'Rprec': 79.62, 'ndcg': 94.35}

The code:

import urllib.request
import numpy as np
from trectools import TrecQrel, TrecRun, TrecEval
import pytrec_eval

##### download qrel and run files

qrel_file_path = 'https://gist.githubusercontent.com/sergeyf/4d88da8d865ccad06cfd140b8583cf55/raw/d93a104817103e4330619aa93506c424cfd5ae16/file.qrel'
qrel_file = 'file.qrel'
urllib.request.urlretrieve(qrel_file_path, qrel_file)

run_file_path = 'https://gist.githubusercontent.com/sergeyf/4d88da8d865ccad06cfd140b8583cf55/raw/d93a104817103e4330619aa93506c424cfd5ae16/file.run'
run_file = 'file.run'
urllib.request.urlretrieve(run_file_path, run_file)

###### trectools

qrel = TrecQrel(qrel_file)
run = TrecRun(run_file)

trec_eval = TrecEval(run, qrel)

trectools_results = {'Rprec': np.round(100 * trec_eval.get_rprec(), 2),
                     'ndcg': np.round(100 * trec_eval.get_ndcg(), 2)}

###### pytrec_eval
def get_metrics(qrel_file, run_file, metrics=('ndcg', 'Rprec')):
    with open(qrel_file, 'r') as f_qrel:
        qrel = pytrec_eval.parse_qrel(f_qrel)

    with open(run_file, 'r') as f_run:
        run = pytrec_eval.parse_run(f_run)

    evaluator = pytrec_eval.RelevanceEvaluator(qrel, set(metrics))
    results = evaluator.evaluate(run)

    out = {}
    for measure in sorted(metrics):
        res = pytrec_eval.compute_aggregated_measure(
            measure,
            [query_measures[measure] for query_measures in results.values()]
        )
        out[measure] = np.round(100 * res, 2)
    return out

pytrec_eval_results = get_metrics(qrel_file, run_file)

print('trectools_results', trectools_results)
print('pytrec_eval_results', pytrec_eval_results)
joaopalotti commented 4 years ago

Hi Sergey,

Thanks for using this package and reporting this bug. The problem occurred when one of the topics had no relevant documents in the qrels. It should be fixed now; please update the package on your system to v0.0.43.
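
To make that edge case concrete, here is a minimal arithmetic sketch (not the actual code of trectools or trec_eval; the per-topic scores and topic ids are invented for illustration). It shows how a single topic with zero relevant documents shifts an averaged metric, depending on whether that topic is dropped from the average or scored as zero:

import numpy as np

# Invented per-topic Rprec values; q4 is a hypothetical topic with
# no relevant documents in the qrels, so its score is undefined.
per_topic_rprec = {"q1": 0.80, "q2": 0.75, "q3": 0.85, "q4": None}

# Convention A: drop topics whose score is undefined before averaging.
defined = [v for v in per_topic_rprec.values() if v is not None]
mean_dropped = np.mean(defined)

# Convention B: keep such topics and score them as zero.
zeroed = [v if v is not None else 0.0 for v in per_topic_rprec.values()]
mean_zeroed = np.mean(zeroed)

print(f"drop undefined topics: {mean_dropped:.4f}")  # 0.8000
print(f"score them as zero:    {mean_zeroed:.4f}")   # 0.6000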

By the way, we have been looking for help to write unit tests for trectools. Once we have all the proper tests done, we can release version 1.0. Please feel free to help the community and grow this package!

sergeyf commented 4 years ago

Ah, glad that it is fixed!

Are the unit tests mostly comparisons against the command-line trec_eval tool?

joaopalotti commented 4 years ago

That is correct. I started writing some code for that (please see https://github.com/joaopalotti/trectools/blob/master/unittests/testtreceval.py), but I never managed to continue. Please let me know if you are interested in helping with the development.
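
As a sketch of what such a comparison test might look like, here is a hypothetical example, not the repository's actual test suite. It uses pytrec_eval as a stand-in for the command-line trec_eval binary, and the file paths and tolerance are assumptions for illustration:

import numpy as np
import pytest
import pytrec_eval
from trectools import TrecQrel, TrecRun, TrecEval

QREL_FILE = "file.qrel"  # assumed location of the test fixtures
RUN_FILE = "file.run"


def pytrec_eval_aggregate(qrel_file, run_file, measures=("Rprec", "ndcg")):
    # Aggregate per-query scores from the reference implementation
    # into corpus-level means.
    with open(qrel_file) as f:
        qrel = pytrec_eval.parse_qrel(f)
    with open(run_file) as f:
        run = pytrec_eval.parse_run(f)
    results = pytrec_eval.RelevanceEvaluator(qrel, set(measures)).evaluate(run)
    return {
        m: pytrec_eval.compute_aggregated_measure(m, [q[m] for q in results.values()])
        for m in measures
    }


@pytest.mark.parametrize("measure", ["Rprec", "ndcg"])
def test_agrees_with_reference(measure):
    # Compare trectools' aggregated metric against the reference value.
    te = TrecEval(TrecRun(RUN_FILE), TrecQrel(QREL_FILE))
    trectools_value = te.get_rprec() if measure == "Rprec" else te.get_ndcg()
    reference_value = pytrec_eval_aggregate(QREL_FILE, RUN_FILE)[measure]
    assert np.isclose(trectools_value, reference_value, atol=1e-4)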

Anyhow, as this bug is fixed, I will close this issue now.

Thanks!

sergeyf commented 4 years ago

It would be fun to help, but I'm not flush with time at the moment (job, family, etc.), sorry to say. If I do find some time, I'll loop back around and maybe write some tests!