spacemanidol / MSMARCO

Utilities, Baselines, Statistics and Descriptions Related to the MSMARCO DATASET
MIT License

Invalid line breaks in the top1000 TSV files of the reranking datasets #31

Closed ikuyamada closed 5 years ago

ikuyamada commented 5 years ago

Describe the bug

The top1000 TSV files of the reranking datasets contain many invalid line breaks. For example, line 234472 of top1000.dev.tsv does not start with the IDs.

To Reproduce

% sed -n 234471,234472p top1000.dev.tsv
1082445 3492590 what does unlock my device mean iOS: Understanding the SIM PIN.
 You can lock your SIM card so that it can't be used without a Personal Identification Number (PIN). You can use a SIM pin to prevent access to cellular data networks.In order to use cellular data, you must enter the PIN whenever you swap SIM cards or restart your iPhone or iPad (Wi-Fi + Cellular models).hen restoring the device, you will need to unlock the SIM card to complete the restore process. The device and iTunes display the following prompts to notify you: To complete the restore process: 1  Disconnect the device from your computer. 2  Tap Unlock on the device.
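
To find every affected line rather than a single example, a check along the following lines may help. It is only a sketch: it assumes that each well-formed row of top1000.dev.tsv begins with two numeric IDs separated by tabs, and the names `ROW_START`, `find_broken_lines` and `merge_continuations` are hypothetical helpers, not part of the MSMARCO tooling. The second function shows one possible workaround, which appends every line without the ID prefix to the row before it.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: every well-formed row begins with "<numeric id>\t<numeric id>\t".
ROW_START = re.compile(r"^\d+\t\d+\t")


def find_broken_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, text) for lines that lack the ID prefix."""
    for number, line in enumerate(lines, start=1):
        if not ROW_START.match(line):
            yield number, line.rstrip("\n")


def merge_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Append each line without an ID prefix to the row that precedes it."""
    current: Optional[str] = None
    for line in lines:
        text = line.rstrip("\r\n")
        if ROW_START.match(text) or current is None:
            # A new row starts here; emit the previous one if there is one.
            if current is not None:
                yield current
            current = text
        else:
            # Replace the stray line break with a single space.
            current = current.rstrip() + " " + text.lstrip()
    if current is not None:
        yield current
```

When reading the real file, opening it with `open(path, encoding="utf-8", newline="\n")` keeps Python from also splitting on bare carriage returns, so the reported line numbers should match what `sed -n` shows.
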

rodrigonogueira4 commented 5 years ago

+1 Same problem in the triples.train.small.tsv file:

sed -n '43427,43427p' triples.train.small.tsv
When you're on a call or listening to voicemail on your iPhone, you might not be able to hear a person's voice clearly. Or you might hear crackling, static, or generally poor sound quality. Follow the steps below to resolve the issue.
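
The triples file has no leading IDs to check, so a rough way to list the affected lines is to count tab-separated fields. The sketch below assumes that each well-formed row has exactly three fields (query, positive passage, negative passage); `find_bad_field_counts` is a hypothetical helper name. Note that when the stray break falls inside the last field, the first fragment still has three fields, so only the continuation line is reported.

```python
from typing import Iterable, Iterator, Tuple


def find_bad_field_counts(
    lines: Iterable[str], expected_fields: int = 3
) -> Iterator[Tuple[int, int]]:
    """Yield (1-based line number, field count) for rows with the wrong width.

    Assumption: fields are tab-separated and never contain a tab themselves.
    """
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected_fields:
            yield number, len(fields)
```
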

spacemanidol commented 5 years ago

Working on it in https://github.com/microsoft/MSMARCO-Passage-Ranking/issues/1

spacemanidol commented 5 years ago

Update: this should be fixed later today.

rodrigonogueira4 commented 5 years ago

I've downloaded the train triples small file but it seems that the problem persists: https://msmarco.blob.core.windows.net/msmarcoranking/triples.train.small.tar.gz

The MD5 checksum is still the same as that of the old version:

36e27d06e66b85957eb774b5504723a6
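
For anyone who wants to confirm whether a fresh download differs from the old archive, the digest can be recomputed with something like the sketch below. It uses only `hashlib` from the standard library; `md5_of_chunks`, `file_md5` and `OLD_MD5` are hypothetical names, and the local file name in the usage note is assumed from the download URL.

```python
import hashlib
from typing import Iterable

# Checksum of the old archive, as quoted in this comment.
OLD_MD5 = "36e27d06e66b85957eb774b5504723a6"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

If `file_md5("triples.train.small.tar.gz") == OLD_MD5` is true, the downloaded archive is byte-for-byte the old one.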