Closed PepijnBoers closed 3 years ago
Hi Pepijn,
Thanks for using this tool! Your question is really good and makes me think that we should change part of the evaluation workflow in trectools. The answer to why simply changing P_10 to NDCG_5 does not work is that NDCG_5 is not part of the default output of trec_eval.
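For context on what NDCG@5 actually measures (and why it is handled separately from counting metrics like P_10), here is a minimal, self-contained sketch of the computation. The function name and the binary-gain example are illustrative, not trectools code:

```python
import math

def ndcg_at_k(gains, ideal_gains, k=5):
    """NDCG@k: DCG of the observed ranking over DCG of the ideal one.

    gains: relevance grades in the order the run ranked the documents.
    ideal_gains: all relevance grades for the topic (any order).
    Discount is log2(rank + 1), the form trec_eval's ndcg uses.
    """
    def dcg(gs):
        # rank i+1 gets discount log2(i + 2); log2(2) = 1 at rank 1
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs[:k]))

    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# A run that puts all three relevant documents on top scores 1.0:
print(ndcg_at_k([1, 1, 1, 0, 0], [1, 1, 1]))  # 1.0
# Pushing a relevant document down the ranking lowers the score:
print(ndcg_at_k([1, 0, 1, 0, 1], [1, 1, 1]))
```

This is what `TrecEval.get_ndcg(depth=5)` averages over all topics of a run.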
The method evaluate_runs, as it is implemented now, only outputs a subset of all possible metrics. There are multiple ways to enhance trectools, and you are invited to be part of it.
Just FYI, if you want to write some code outside of trectools to plot the NDCG@5 values per run, it will look like this:

```python
from trectools import TrecEval, TrecRes, procedures
import pandas as pd

results = []
for r in runs:
    evaluator = TrecEval(r, qrels)
    ndcg_5 = evaluator.get_ndcg(depth=5)

    result_run = [{"metric": "NDCG_5", "query": "all", "value": ndcg_5}]
    tres = TrecRes()
    tres.data = pd.DataFrame(result_run)
    tres.runid = r.get_runid()
    results.append(tres)

ndcg5 = procedures.extract_metric_from_results(results, "NDCG_5")
# Note that error bars are not plotted. If you want error bars,
# you have to evaluate NDCG with per_query=True.
procedures.plot_system_rank(ndcg5, display_metric="NDCG@5")
```
Ideally, what we would like is to simplify this, and that is the purpose of the module called procedures. I can eventually work on it, but please feel welcome to help out as well!
Is it possible to run this example with ndcg@5 instead of P@10? Simply replacing "P_10" with "NDCG_5" does not seem to work.