joaopalotti / trectools

A simple toolkit to process TREC files in Python.
https://pypi.python.org/pypi/trectools
BSD 3-Clause "New" or "Revised" License

Run example 0 with NDCG@5 #20

Closed PepijnBoers closed 3 years ago

PepijnBoers commented 4 years ago

Is it possible to run this example with NDCG@5 instead of P@10? Simply replacing "P_10" with "NDCG_5" does not seem to work.

from trectools import TrecQrel, procedures

qrels_file = "./robust03/qrel/robust03_qrels.txt"
qrels = TrecQrel(qrels_file)

# Generates a NDCG@5 graph with all the runs in a directory
path_to_runs = "./robust03/runs/"
runs = procedures.list_of_runs_from_path(path_to_runs, "*.gz")

results = procedures.evaluate_runs(runs, qrels, per_query=True)
ndcg_5 = procedures.extract_metric_from_results(results, "NDCG_5")
fig = procedures.plot_system_rank(ndcg_5, display_metric="NDCG@5", outfile="plot.pdf")
fig.savefig("plot.pdf", bbox_inches='tight', dpi=600)
# Sample output with one run for each participating team in robust03:
joaopalotti commented 4 years ago

Hi Pepijn,

Thanks for using this tool! Your question is a good one, and it makes me think that we should change part of the evaluation workflow in trectools. The reason that simply changing P_10 to NDCG_5 does not work is that NDCG_5 is not part of the default output from trec_eval. The method evaluate_runs, as it is implemented now, only outputs a subset of all possible metrics. There are multiple ways to enhance trectools, and you are invited to be part of it.
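To illustrate why the replacement fails: extracting a metric from the results amounts to filtering rows by metric name, so a metric trec_eval never emitted simply yields nothing to plot. The rows and the extract helper below are hypothetical stand-ins for illustration, not trectools internals:

```python
# Hypothetical rows mimicking trec_eval's default summary output.
# NDCG_5 is absent because trec_eval does not emit it by default.
default_rows = [
    {"metric": "map", "query": "all", "value": 0.251},
    {"metric": "P_10", "query": "all", "value": 0.420},
]

def extract(rows, metric):
    # Mirrors the idea behind extract_metric_from_results:
    # keep only the rows whose metric name matches.
    return [r for r in rows if r["metric"] == metric]

print(extract(default_rows, "P_10"))    # one matching row
print(extract(default_rows, "NDCG_5"))  # empty list: nothing to plot
```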

Just FYI, if you want to write some code outside of trectools to plot the NDCG@5 values per run, it will look something like this:

from trectools import TrecEval
from trectools import TrecRes
from trectools import procedures
import pandas as pd

# reuses the runs and qrels objects from your snippet above

results = []
for r in runs:
    evaluator = TrecEval(r, qrels)
    ndcg_5 = evaluator.get_ndcg(depth=5)

    result_run = [{"metric": "NDCG_5", "query": "all", "value": ndcg_5}]

    tres = TrecRes()
    tres.data = pd.DataFrame(result_run)
    tres.runid = r.get_runid() 

    results.append(tres)

ndcg5 = procedures.extract_metric_from_results(results, "NDCG_5")
# Note that error bars are not plotted. If you want error bars, you have to evaluate NDCG with per_query=True
procedures.plot_system_rank(ndcg5, display_metric="NDCG@5")
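For reference, here is what NDCG@k computes in the standard formulation: gains discounted by log2(rank + 1) and normalized by the ideal ordering. This is a generic, self-contained sketch with made-up relevance values, not trectools' internal code (implementations vary, e.g. in how the ideal ranking is built from the judgments):

```python
import math

def ndcg_at_k(ranked_gains, judged_gains, k=5):
    """NDCG@k: DCG of the retrieved ranking over DCG of the ideal ranking."""
    def dcg(gains):
        # Discount each gain by log2(rank + 1), with ranks starting at 1
        return sum(g / math.log2(rank + 1)
                   for rank, g in enumerate(gains[:k], start=1))

    ideal = dcg(sorted(judged_gains, reverse=True))  # best possible ordering
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Toy example: relevance of the top retrieved docs vs. all judged gains
print(ndcg_at_k([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))
```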

Ideally, we would like to simplify all of this, and that is the purpose of the procedures module. I can eventually work on it, but please feel welcome to help out as well!