Open mam10eks opened 1 year ago
If we have this ticket, we can implement this:
I think I would now also favor such multi-step jobs for all re-ranking approaches in information retrieval experiments.
My line of thinking is the following:
For each IR benchmark, we add two datasets in TIRA:
Full-Rank: software has access to the complete corpus
Re-Rank: a "pseudo dataset" in which the software has access to the to-be-re-ranked query-document pairs
For the re-ranking scenario, one must further select which run one wants to re-rank (out of all own and public runs).
If there is an official re-ranking run (e.g., for MS MARCO), we should define it as the default.
For all non-default runs, I would suggest the following:
Users can select a public run of the dataset as the to-be-re-ranked run
TIRA automatically wraps this run into an "ir-datasets-loader" job that is the previous stage of the job
I.e., if the run was never used before, the run is transformed by the "ir-datasets-loader" job into the "standard" to-be-re-ranked query document pair format
If the run was used before, the standard multi-step-thing described above kicks in so that the job itself is not executed again
The job itself then uses the to-be-re-ranked query document pairs as "pseudo-input" (i.e., directly merged with the original input)
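To make the flow above more concrete, here is a minimal sketch of how the run selection, the reuse of earlier loader outputs, and the merging of the "pseudo-input" could fit together. It is only an illustration under assumptions: `RerankInputResolver`, `build_job_input`, the `rerank_pairs` key, and the in-memory cache are hypothetical names and choices, not part of TIRA, and the `loader` callable merely stands in for the "ir-datasets-loader" job.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names here are hypothetical; none of them is TIRA's actual API.
QueryDocPair = Tuple[str, str]  # (query_id, doc_id)


@dataclass
class RerankInputResolver:
    """Resolves the to-be-re-ranked pairs for a run, reusing earlier results."""

    # Stand-in for the "ir-datasets-loader" job: run id -> query-document pairs.
    loader: Callable[[str], List[QueryDocPair]]
    # Official re-ranking run of the dataset, if one exists (assumption).
    default_run: Optional[str] = None
    # Outputs of earlier loader executions, keyed by run id.
    _cache: Dict[str, List[QueryDocPair]] = field(default_factory=dict)
    loader_calls: int = 0

    def resolve(self, run_id: Optional[str] = None) -> List[QueryDocPair]:
        run_id = run_id or self.default_run
        if run_id is None:
            raise ValueError("no run selected and no default run defined")
        if run_id not in self._cache:
            # Run was never used before: execute the loader stage once.
            self.loader_calls += 1
            self._cache[run_id] = self.loader(run_id)
        # Run was used before: the earlier output is reused as-is.
        return self._cache[run_id]


def build_job_input(original_input: dict, pairs: List[QueryDocPair]) -> dict:
    """Merge the pairs into the original input as "pseudo-input"."""
    return {**original_input, "rerank_pairs": pairs}
```

A re-ranking job would then call `resolve()` without arguments to get the default run, or `resolve("some-public-run")` for any other run; only the first call per run triggers the loader stage.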
This has the advantage that we can provide a set of "default" runs to be re-ranked with appropriate documentation (e.g., BM25, or the judgment pool with a corresponding warning that it usually tends to overestimate effectiveness), but all other runs (even very costly ones such as monoT5/duoT5) can also be used directly without adaptation. We have full flexibility per dataset and can show this transparently in the leaderboards. I think this provides a big benefit, because users can even build upon very costly systems in a fast and cost-efficient way. We should combine this with running some of the costly but important pipelines on all datasets (and make one or two A100 GPUs temporarily available in TIRA so that we can even run the largest versions?).
This issue has been marked stale because it has been open 60 days with no activity.
The goal of this ticket is to allow jobs with access to ir_datasets.