Open seanmacavaney opened 3 years ago
FWIW, it appears that the version of the qid/pid triples file prior to https://github.com/microsoft/MSMARCO-Passage-Ranking/commit/4695a71c6c76ce85c07a51c0f12690cab19abbb0 did have the triples in the same order as triples.train.full.tsv.gz (but some records were missing, which is what the change was about).
I think there are compelling reasons to have the qidpidtriples file in the same order as the triples file. But I also understand that this may seem somewhat pedantic and not be seen as a priority.
If I built this file for you, would you host it?
Per discussion here: https://github.com/microsoft/MSMARCO-Passage-Ranking/commit/4695a71c6c76ce85c07a51c0f12690cab19abbb0
The current version of qidpidtriples.train.full.2.tsv.gz has the same records as triples.train.full.tsv.gz, but they are in a different order. It would be nice for these to be consistent so that those using these files as the training data sequence can control for the order of training in experiments.
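For anyone who wants to check this on their own copies, here is a minimal sketch of the comparison: same records, different order. It assumes both inputs have already been brought into the same representation (for example, two versions of the qid/pid triples file, or the text triples mapped back to IDs). The helper names read_tsv_gz and compare_records are hypothetical, not part of any MS MARCO tooling, and this version holds everything in memory, so for the full files a streaming or hashed variant would likely be needed.

```python
import gzip
from collections import Counter


def read_tsv_gz(path):
    """Yield each line of a gzipped TSV file as a tuple of fields.

    Hypothetical helper; assumes UTF-8, tab-separated records.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield tuple(line.rstrip("\n").split("\t"))


def compare_records(a, b):
    """Return (same_records, same_order) for two iterables of records.

    same_records: both inputs contain the same records with the same
                  multiplicities, ignoring order.
    same_order:   both inputs are identical sequences.
    Note: loads both inputs fully into memory.
    """
    a, b = list(a), list(b)
    same_records = Counter(a) == Counter(b)
    same_order = a == b
    return same_records, same_order
```

Usage would look something like compare_records(read_tsv_gz(old_path), read_tsv_gz(new_path)), where old_path and new_path are placeholders for local copies of the two files. A result of (True, False) corresponds to the situation described above.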