Closed whale-z closed 1 year ago
The original queries are downloaded from here (dev small in queries.tar.gz in the passage ranking dataset).
The generated typo queries and the spell-checker-corrected queries are produced using the scripts in https://github.com/ielab/CharacterBERT-DR/tree/main/data
Thank you very much for your reply. But as far as I know, the queries.dev.tsv file inside queries.tar.gz contains 101,093 queries. How did you filter the 6,980 queries out of these? Is it just a random sampling strategy?
Hi,
I'm sorry, the dev small queries should be in collectionandqueries.tar.gz. This is a subset of the dev queries, and it is used for the leaderboard evaluation and therefore in many research papers.
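For anyone else landing here: the 6,980-query dev small set is simply a fixed subset of the full dev set, identified by qid. A minimal sketch of reproducing that filtering yourself (the file names queries.dev.tsv / queries.dev.small.tsv are the ones shipped in the MS MARCO archives; the inline sample data below is made up for illustration):

```python
import csv
import io


def filter_to_small(full_tsv, small_tsv):
    """Keep only rows of the full dev set whose qid appears in dev small.

    Both inputs are tab-separated files with lines of the form:
        <qid>\t<query text>
    """
    small_qids = {row[0] for row in csv.reader(small_tsv, delimiter="\t")}
    return [row for row in csv.reader(full_tsv, delimiter="\t")
            if row[0] in small_qids]


# Demonstrated on tiny in-memory samples; in practice, open
# queries.dev.tsv and queries.dev.small.tsv from the extracted archives.
full = io.StringIO("2\twhat is foo\n1048585\twhat is paula deen's brother\n")
small = io.StringIO("1048585\twhat is paula deen's brother\n")
print(filter_to_small(full, small))
# → [['1048585', "what is paula deen's brother"]]
```

Since dev small is defined purely by qid membership, this also works for filtering qrels or run files down to the same 6,980 queries.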
This is exactly what I want. Thank you very much for your help.
Hi, I was wondering how you obtained the 6,980 queries in the marco_dev folder from the MS MARCO dev set? Best wishes!