Open heinrichreimer opened 1 month ago
Very much so! I think it would make the most sense as a sub-type of scored docs. I will put it on my to-do list, but I will not get to it within the next two weeks. Feel free to integrate this yourself :)
How best to integrate this is a good question, but I would also be a big fan of having it!
I could make a first proposal: I need to process it for another project anyway, so I could do some first "hacking" and then we can improve upon it :)
Awesome! IMO the most fitting way would be to add it as scored_docs and set the score to the negative rank, so that a better-ranked document gets a higher score: https://github.com/allenai/ir_datasets/blob/930a4e076f21b623d1de713ec434686b2c2c292d/ir_datasets/formats/base.py#L27
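To make the negative-rank idea concrete, here is a rough sketch. It does not import ir_datasets; the ScoredDoc tuple below is a local stand-in that I am assuming mirrors the (query_id, doc_id, score) fields of the linked scored-doc type, and ranking_to_scored_docs is a hypothetical helper name, not something that exists in either repo.

```python
from typing import Iterable, Iterator, NamedTuple


class ScoredDoc(NamedTuple):
    # Local stand-in (assumption): same three fields as the scored-doc
    # type linked above, defined here so the sketch runs on its own.
    query_id: str
    doc_id: str
    score: float


def ranking_to_scored_docs(
    query_id: str, ranked_doc_ids: Iterable[str]
) -> Iterator[ScoredDoc]:
    """Hypothetical helper: turn a best-first ranking into scored docs.

    The score is the negative 1-based rank, so sorting by score in
    descending order reproduces the original ranking.
    """
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        yield ScoredDoc(query_id, doc_id, float(-rank))


# Made-up query and document IDs, for illustration only.
scored = list(ranking_to_scored_docs("q1", ["d7", "d2", "d9"]))
recovered = [d.doc_id for d in sorted(scored, key=lambda d: d.score, reverse=True)]
```

Since rank 1 becomes -1.0, rank 2 becomes -2.0, and so on, anything that consumes scored docs sorted by descending score should see the original order without needing a separate rank field.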
I added a first version: https://github.com/webis-de/msmarco-llm-distillation/blob/main/data/ir_datasets_scored_docs.py
Very nice!
Could be handy to have this dataset in ir_datasets.