Hey.
Good eye. Seems I uploaded the wrong files but I have fixed it. We had initially subsampled the files when experimenting since the set is so big.
New queries:

```
101093 queries.dev.tsv
101092 queries.eval.tsv
502939 queries.train.tsv
```

New qrels:

```
 45684 qrels.dev.tsv
401023 qrels.train.tsv
```
The sizes are going to be a little different since for the train set we are removing all queries that do not have an answer (the original train set is ~800,000), but we have not removed these from dev and eval in order to keep those sets held out and avoid affecting the other MSMARCO tasks.
That being said, the percentage of queries that do not have answers is about the same across splits (~35%), so the sets are now matched.
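For example, assuming the standard MS MARCO TSV layout (qid in the first column of both the queries and qrels files; this is just a sketch, not part of our released tooling), the answered fraction per split can be checked like this:

```python
def qids_with_qrels(path):
    """Query ids that have at least one relevance judgment (first TSV column)."""
    with open(path, encoding="utf-8") as f:
        return {line.split("\t", 1)[0] for line in f}

def answered_fraction(queries_path, qrels_path):
    """Fraction of a split's queries that appear in its qrels file."""
    judged = qids_with_qrels(qrels_path)
    with open(queries_path, encoding="utf-8") as f:
        qids = [line.split("\t", 1)[0] for line in f]
    return sum(q in judged for q in qids) / len(qids)

for split in ("train", "dev"):
    frac = answered_fraction(f"queries.{split}.tsv", f"qrels.{split}.tsv")
    print(f"{split}: {frac:.1%} of queries have at least one answer")
```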
Great, thanks for the fix and clarification!
Let me double check if I get things right: `queries.dev`, `queries.eval`, `qrels.train` and `qrels.dev` have been updated, whereas `collection` and `queries.train` have not. This would mean there are queries in `queries.train` that do not have an answer in `qrels.train`, but do have an answer passage present in the collection (that is, presuming that the original `qrels.train` contained 532,761 correct entries).

If these statements are correct, I would wonder what the use would be of the queries in `queries.train` that actually have an answer in `collection.tsv` that is not mentioned in `qrels.train`. If each split should contain queries for which no answer exists, shouldn't `queries.train` then also be updated? For instance:
Hey,
Sorry for closing this early.
You are correct, those files were updated. It seems that the original queries.dev and queries.eval were just subsamples of the actual queries, so I included the full sets. The qrels.train became smaller because there was some normalization done on collection.tsv, and the person who did that is on vacation. Once I fix this normalization error I will update the collection.tsv file and the qrels file. The expected sizes should be ~550k for train and ~56k for dev (and about the same for eval).
It's worth noting that the queries.* files are only for ease of joining sets; they are not used in evaluation. For evaluation, your system will be reranking passages for a query where an answer exists. Your system's score is based on how highly it is able to rank the relevant passages (qrels). Since there are a few cases where the BM25 model did not return the passage marked as relevant (few, but they happen), a system will never be able to achieve a perfect 1 for MRR. I will post the theoretical maximum MRR for this dataset shortly.
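To make that ceiling concrete, here is a minimal sketch of MRR@k over reranked candidates (illustrative data structures only, not the official evaluation script): a query whose relevant passage never made it into the candidate list contributes 0, so even a perfect reranker scores below 1.0.

```python
def mrr_at_k(ranked_pids_by_qid, relevant_pids_by_qid, k=10):
    """Mean reciprocal rank: 1/rank of the first relevant passage, 0 if none in top k."""
    total = 0.0
    for qid, ranked in ranked_pids_by_qid.items():
        relevant = relevant_pids_by_qid.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(ranked_pids_by_qid)

# Toy usage: q2's relevant passage p5 is absent from its candidate list,
# so even a perfect reranker cannot reach MRR = 1.0 here.
ranked = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p4"]}
relevant = {"q1": {"p7"}, "q2": {"p5"}}
print(mrr_at_k(ranked, relevant))  # (1/2 + 0) / 2 = 0.25
```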
Hey,
No problem at all, thanks for your reply.
By "normalization error", do you mean that there are now duplicate passages in collection.tsv
(passages that have a different id but the same contents)?
And that these duplicate documents will be removed from collection.tsv
, so that the file will only contain unique passages, and that all qrels files will be updated accordingly?
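For example, I would expect such duplicates could be found with something like this (just a sketch, assuming collection.tsv is one `pid<TAB>passage` pair per line; hashing keeps the memory footprint manageable for a large collection):

```python
import hashlib
from collections import defaultdict

# Group passage ids by a digest of the passage text; any group with more
# than one id is a set of duplicates (same contents, different ids).
ids_by_digest = defaultdict(list)
with open("collection.tsv", encoding="utf-8") as f:
    for line in f:
        pid, text = line.rstrip("\n").split("\t", 1)
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        ids_by_digest[digest].append(pid)

dupes = {d: ids for d, ids in ids_by_digest.items() if len(ids) > 1}
print(f"{len(dupes)} passage texts appear under more than one id")
```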
Looking forward to the updated dataset!
No, by normalization I mean that some chars were removed in collection.tsv that weren't removed elsewhere; the ids are constant and the size is the same.
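If you want to check this yourself, a quick sketch along these lines (hypothetical file names for the two versions) would confirm that the ids are unchanged and only the passage text differs:

```python
def load(path):
    """Map pid -> passage text from a pid<TAB>passage TSV file."""
    with open(path, encoding="utf-8") as f:
        return dict(line.rstrip("\n").split("\t", 1) for line in f)

old = load("collection.old.tsv")  # assumed saved copy of the previous version
new = load("collection.tsv")

# Ids should be constant and the size the same; only passage text changes.
assert old.keys() == new.keys(), "passage ids changed between versions"
changed = sum(old[pid] != new[pid] for pid in old)
print(f"{changed} of {len(old)} passages had characters normalized")
```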
If you go ahead and check the updated qrels you will now find the full files.

```
erasmus@spacemanidol:~/MSMARCOV2/Ranking/Baselines/DataDir$ wc -l qrels.*
  59273 qrels.dev.tsv
  59187 qrels.eval.tsv
 532761 qrels.train.tsv
 651221 total
```

There may be an update in the future to the dataset but for now feel free to have at it!
Hi,
I have two questions about the train/dev/test split of the ranking dataset. I noted that:

- `queries.train` consists of 502,939 questions, of which all 502,939 have at least 1 answer in `qrels.train`.
- `queries.dev` consists of 12,665 questions, of which only 6,986 have at least 1 answer in `qrels.dev`.
- `queries.eval` consists of 12,560 questions.

Now, my questions are:
Thanks in advance!