drennings closed this issue 5 years ago
I'll get on this
- There are 7555149 unique words.
- There are 8841823 unique passages.
- The average question length is 6.373235758460644 words, with a range of 1 to 75 words.
- The average passage length is 56.25311069900404 words, with a range of 1 to 362 words.
- Top 1000 dev contains 3895239 unique passages.
- Top 1000 eval contains 3831719 unique passages.
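For anyone who wants to reproduce figures of this kind, here is a minimal sketch of how such statistics could be computed. It is not the script used to produce the numbers above. It assumes the texts are already loaded as a list of strings and that words are split on whitespace; the tokenization behind the reported counts is not stated in this thread, so results may differ. The function names `length_stats` and `unique_word_count` are hypothetical.

```python
def length_stats(texts):
    """Return count, unique count, and word-length statistics for a list of texts.

    Assumption: a "word" is any whitespace-separated token.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    lengths = [len(text.split()) for text in texts]
    return {
        "count": len(texts),
        "unique": len(set(texts)),          # exact-string duplicates collapse
        "mean_len": sum(lengths) / len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
    }


def unique_word_count(texts):
    """Count distinct whitespace-separated tokens across all texts."""
    vocab = set()
    for text in texts:
        vocab.update(text.split())
    return len(vocab)


if __name__ == "__main__":
    # Toy data only; replace with the passages or questions loaded from the dataset.
    passages = ["the cat sat", "a dog", "the cat sat"]
    print(length_stats(passages))
    print(unique_word_count(passages))
```

The same `length_stats` call can be applied to the questions and to the passages of each top 1000 file to get the per-file unique counts.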
Hi,
Responding to the request for feedback on the documentation, I have a suggestion.
It would have been helpful if the size of each split of the dataset were included in the documentation, as listed in issue #11. It would also be interesting to include other characteristics of the dataset, such as the average question length, the average passage length, and the number of unique passages in the top 1000 ranking by BM25 (assuming these are a subset of the 8 million passages in the whole dataset).
Thanks in advance!