wilburOne / cosmosqa


Dataset size doesn't match paper #5

Open yeliu918 opened 4 years ago

yeliu918 commented 4 years ago

Hi, thanks for publishing the dataset, and great work!

I have 25262|2985|6963 (train|valid|test) examples, 35.2K in total. That differs from the paper's 25588|3000|7000, 35.6K in total. I'm not sure whether I'm using the wrong version of the dataset.
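To make the gap concrete, here is a small illustrative sketch that compares the split sizes reported above against the paper's figures. The counts are copied from this comment. The helper name `compare_splits` is hypothetical and is not part of the cosmosqa repo.

```python
# Split sizes: "observed" is what was downloaded, "paper" is what the paper reports.
observed = {"train": 25262, "valid": 2985, "test": 6963}
paper = {"train": 25588, "valid": 3000, "test": 7000}


def compare_splits(observed, paper):
    """Return how many examples each split is short of the paper (paper - observed),
    plus the shortfall in the overall total."""
    diff = {split: paper[split] - observed[split] for split in paper}
    diff["total"] = sum(paper.values()) - sum(observed.values())
    return diff


print(sum(observed.values()))  # 35210, i.e. ~35.2K
print(sum(paper.values()))     # 35588, i.e. ~35.6K
print(compare_splits(observed, paper))
# {'train': 326, 'valid': 15, 'test': 37, 'total': 378}
```

So every split is short. The training set accounts for most of the 378 missing examples.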

theblackcat102 commented 4 years ago

Same here. I loaded the dataset through the Hugging Face `nlp` package and got the same totals as above.

VatsalRaina commented 2 years ago

I am trying to use this dataset and am seeing the same mismatch between the distributed data and the numbers quoted in the paper. Also, the paper quotes unanswerability rates of 5.9% and 8.7% for the training and development sets respectively, but the currently distributed data has unanswerability rates of 12.1% and 14.9%.
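One way to reproduce an unanswerability rate yourself is sketched below. It rests on two assumptions that you should check against the real files. First, each example is assumed to be a dict with an `answers` list of option strings and an integer `label` indexing the gold option (hypothetical field names). Second, an example counts as unanswerable when its gold option starts with "None of the above" (an assumed marker). The function and the toy data are illustrative only. They do not reproduce the 12.1% / 14.9% figures above.

```python
from typing import Iterable, Mapping

# ASSUMPTION: unanswerable questions have a gold option beginning with
# "None of the above"; adjust this marker to match the actual data format.
UNANSWERABLE_PREFIX = "none of the above"


def unanswerability_rate(examples: Iterable[Mapping]) -> float:
    """Fraction of examples whose gold answer marks them as unanswerable.

    Each example is assumed (hypothetically) to have an "answers" list of
    option strings and an integer "label" giving the index of the gold option.
    """
    total = 0
    unanswerable = 0
    for ex in examples:
        total += 1
        gold = ex["answers"][ex["label"]]
        # Case-insensitive prefix match, ignoring surrounding whitespace.
        if gold.strip().lower().startswith(UNANSWERABLE_PREFIX):
            unanswerable += 1
    return unanswerable / total if total else 0.0


# Toy data (made up for illustration): 1 of 4 gold answers is "None of the above".
toy = [
    {"answers": ["A", "B", "None of the above choices ."], "label": 2},
    {"answers": ["A", "B", "C", "D"], "label": 0},
    {"answers": ["x", "None of the above choices ."], "label": 0},
    {"answers": ["x", "y"], "label": 1},
]
print(f"{unanswerability_rate(toy):.1%}")  # 25.0%
```

Running the same function over the training and development files would show whether the higher rates come from the data or from how "unanswerable" is counted.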