QingyaoAi closed 5 years ago
Yes, it’s a good idea. Some people may not be interested in the passage contents and just want to train on the current top retrieval results.
Sorry for the delay on this, but I finally got it for you: https://msmarco.blob.core.windows.net/msmarcoranking/qidpidtriples.train.full.tar.gz
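For anyone picking this file up, here is a minimal sketch of reading it once extracted. It assumes each line holds three tab-separated integer IDs (query ID, relevant passage ID, non-relevant passage ID); that layout is not confirmed in this thread, so check the first few lines of the file before relying on it. `read_triples` is a hypothetical helper name, not part of any released tooling.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_triples(handle: TextIO) -> Iterator[Tuple[int, int, int]]:
    """Yield (qid, pos_pid, neg_pid) tuples from a tab-separated stream.

    Assumed layout (not confirmed by the thread): three integer columns
    per line. Rows that do not have exactly three fields are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        qid, pos_pid, neg_pid = (int(field) for field in row)
        yield qid, pos_pid, neg_pid


# Tiny in-memory sample standing in for the real (much larger) file.
sample = io.StringIO("1\t10\t20\n1\t10\t30\n\n2\t40\t50\n")
triples = list(read_triples(sample))
```

On the real data you would pass an open file handle instead of the `StringIO` sample and iterate lazily rather than building a list, since the full file is large.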
Thank you for creating such a great dataset for passage re-ranking!
I'm wondering if it is possible to release the top 1000 passages retrieved for each training/dev/test query, identified by the corresponding QID and PID? The current training data is constructed with the raw text of queries and passages, which makes the files too large to use comfortably. Also, since the qrels files are already constructed with QID and PID, it would make life much easier if the train/dev/test data were constructed with QID and PID as well.