Closed juharris closed 6 years ago
The docs say that the 2.1 data is in JSONL format but the training data I downloaded from http://www.msmarco.org/dataset.aspx is not:
$ wc -l train_v2.1.json
0 train_v2.1.json
Similarly:
i = 0
with open('train_v2.1.json') as f:
    for l in f:
        i += 1
print(i)  # 1
Whoops, my oversight. You are correct: the files are in JSON format, but tojsonl will convert them into JSONL.
I have updated the docs to reflect this better. Thanks.
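For anyone who wants to see what such a conversion involves, here is a minimal sketch. It is not the tojsonl code, and the file layout is an assumption: it handles either a top-level list of records or a column-oriented object of the form {field: {record_id: value}}. The names records_from_json and json_to_jsonl are hypothetical, so check the layout of your download before relying on it.

```python
import json


def records_from_json(data):
    """Yield one dict per record from a parsed JSON document.

    Assumed layouts (hypothetical, verify against the real file):
      * a list of record dicts, or
      * a column-oriented dict: {field: {record_id: value, ...}, ...}
    """
    if isinstance(data, list):
        yield from data
        return
    fields = list(data)
    if not fields:
        return
    # Column-oriented: take the record ids from the first field.
    for record_id in data[fields[0]]:
        yield {field: data[field].get(record_id) for field in fields}


def json_to_jsonl(src_path, dst_path):
    """Write each record as one JSON object per line and return the count."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)  # loads the whole file into memory
    count = 0
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records_from_json(data):
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Calling json_to_jsonl('train_v2.1.json', 'train_v2.1.jsonl') would then produce a file where counting lines gives the number of records. Note that json.load reads the entire file at once, which may be heavy for a large training set.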