texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

How do you get wikipedia-nq? #90

Closed: ShiyuNee closed this issue 8 months ago

ShiyuNee commented 8 months ago

The count of samples in wikipedia-nq is 3,000+, while the count in the original NQ dataset is nearly 8,000.

I would like to know how the data was filtered.

Thanks!

ShiyuNee commented 8 months ago

I found the answer in the DPR paper.

MXueguang commented 8 months ago

Sorry for not replying in time.

Just to keep a record: the training data for wikipedia-nq is converted from the original DPR repo. The difference from the original NQ is due to the following:

[Screenshot: Screen Shot 2023-10-22 at 3 44 18 PM]
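
For reference, a quick way to compare the counts is to load the converted dataset and print the size of each split. This is a minimal sketch, assuming the data is published as Tevatron/wikipedia-nq on the Hugging Face Hub; the dataset id and split names may differ depending on your Tevatron version.

```python
# Minimal sketch: inspect the split sizes of the converted NQ data.
# Assumes the dataset id "Tevatron/wikipedia-nq" on the Hugging Face Hub;
# adjust the id/splits if your setup uses different names.
from datasets import load_dataset

dataset = load_dataset("Tevatron/wikipedia-nq")
for split_name, split in dataset.items():
    print(f"{split_name}: {len(split)} examples")
```
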
ShiyuNee commented 8 months ago

Thanks.