texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

NQ's mined hard negatives file hn.json contains more queries (70076) than the original NQ train set (58880)? #113

Open x-zb opened 3 months ago

x-zb commented 3 months ago

Hi :),

For NQ, your self-mined hard-negatives training set hn.json appears to contain 70076 queries, but the original training set downloaded from DPR (biencoder-nq-train.json) contains only 58880. Where do the extra queries come from?
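
For reference, here is a minimal sketch of how the two query counts could be compared. It is not code from Tevatron or DPR. The helper names `load_records` and `compare_queries` are hypothetical, and the field name used for hn.json (`"query"`) is an assumption that may need adjusting. The DPR file is assumed to store each query under `"question"`. The loader accepts either a single JSON array or JSON Lines, since the on-disk layout of hn.json is not stated here.

```python
import json


def load_records(path):
    """Load a list of dict records from a JSON array file or a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    if not text:
        return []
    if text.startswith("["):
        # Whole file is one JSON array.
        return json.loads(text)
    # Otherwise treat every non-empty line as one JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def compare_queries(records_a, key_a, records_b, key_b):
    """Count total and unique queries in both sets and list queries only in A."""
    queries_a = [r[key_a] for r in records_a]
    queries_b = [r[key_b] for r in records_b]
    return {
        "total_a": len(queries_a),
        "unique_a": len(set(queries_a)),
        "total_b": len(queries_b),
        "unique_b": len(set(queries_b)),
        "only_in_a": sorted(set(queries_a) - set(queries_b)),
    }


# Example usage (field names are assumptions, adjust to the actual files):
# hn = load_records("hn.json")
# dpr = load_records("biencoder-nq-train.json")
# report = compare_queries(hn, "query", dpr, "question")
# print(report["total_a"], report["unique_a"], report["total_b"], report["unique_b"])
# print(len(report["only_in_a"]), "queries appear only in hn.json")
```

Comparing the total and unique counts separately would show whether the difference comes from duplicated queries or from queries that are absent from the DPR file.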

Thanks in advance.