texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

How do you get wikipedia-nq? #90

Closed: ShiyuNee closed this issue 8 months ago

ShiyuNee commented 8 months ago

The count of samples in wikipedia-nq is 3,000+, while the count in the original NQ dataset is nearly 8,000.

I would like to know how the data was filtered.

Thanks!

ShiyuNee commented 8 months ago

I found the answer in the DPR paper.

MXueguang commented 8 months ago

Sorry for not replying in time.

Just to keep a record: the training data for wikipedia-nq is converted from the original DPR repo. The difference from the original NQ is due to the following:

[Screenshot: Screen Shot 2023-10-22 at 3 44 18 PM]
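
For reference, a quick way to compare the counts is to load the converted dataset and print the size of each split. This is a minimal sketch, assuming the data is published as Tevatron/wikipedia-nq on the Hugging Face Hub; the dataset id and split names may differ depending on your Tevatron version.

```python
# Minimal sketch: inspect the split sizes of the converted NQ data.
# Assumes the dataset id "Tevatron/wikipedia-nq" on the Hugging Face Hub;
# adjust the id/splits if your setup uses different names.
from datasets import load_dataset

dataset = load_dataset("Tevatron/wikipedia-nq")
for split_name, split in dataset.items():
    print(f"{split_name}: {len(split)} examples")
```
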
ShiyuNee commented 8 months ago

Thanks.