microsoft / TREC-2019-Deep-Learning

Website for the TREC Deep Learning Track 2019
https://microsoft.github.io/TREC-2019-Deep-Learning/
Creative Commons Attribution 4.0 International

cannot download msmarco corpus tsv #13

Closed: lumost closed this issue 4 years ago

lumost commented 4 years ago

The download for the corpus always times out on the server side before completing.

https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docs.tsv.gz

curl https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docs.tsv.gz -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 82 8054M   82 6677M    0     0  1285k      0  1:46:57  1:28:39  0:18:18     0
curl: (56) OpenSSL SSL_read: Connection timed out, errno 110

Is there an alternate download mirror? I've retried more than 6 times over 24 hours.
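Since the transfer dies late (82% here), one workaround while only a single host exists is to resume from the bytes already on disk rather than restart. With curl alone, re-running `curl -C - -O <url>` until it succeeds does this, because `-C -` tells curl to continue from the size of the existing output file. The Python sketch below does the same thing by sending an HTTP `Range: bytes=N-` header. It assumes the blob endpoint honors range requests (Azure Blob Storage generally does, but that is an assumption here), and `fetch_with_resume`, `resume_headers`, `max_attempts`, `base_delay` and the `opener` hook are hypothetical names, not part of any MS MARCO tooling.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def resume_headers(dest):
    """Hypothetical helper: request only the bytes not yet on disk."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def fetch_with_resume(url, dest, max_attempts=10, chunk_size=1 << 20,
                      base_delay=1.0, opener=urllib.request.urlopen):
    """Sketch: download url to dest, resuming after dropped connections."""
    for attempt in range(1, max_attempts + 1):
        headers = resume_headers(dest)
        req = urllib.request.Request(url, headers=headers)
        try:
            with opener(req, timeout=60) as resp:
                # 206 means the server honored Range; any other 2xx restarts the file.
                mode = "ab" if headers and resp.status == 206 else "wb"
                with open(dest, mode) as out:
                    while chunk := resp.read(chunk_size):
                        out.write(chunk)
            return os.path.getsize(dest)
        except urllib.error.HTTPError as err:
            if err.code == 416:  # Range starts at EOF: nothing left to fetch.
                return os.path.getsize(dest)
            print(f"attempt {attempt}: HTTP {err.code}; retrying")
        except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
            # Timeouts, connection resets and truncated responses land here.
            print(f"attempt {attempt}: {err!r}; retrying")
        # Exponential backoff between attempts, capped at one minute.
        time.sleep(min(base_delay * 2 ** (attempt - 1), 60))
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Usage would be `fetch_with_resume("https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docs.tsv.gz", "msmarco-docs.tsv.gz")`. Appending only on a 206 response is the key design choice: if a server ignores `Range` and replies 200 with the full body, appending would silently corrupt the file, so the sketch starts over instead.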

bmitra-msft commented 4 years ago

Unfortunately, that's the only place the data is hosted right now.

But I tested the download using AzCopy this morning, and it completed successfully. Would you mind giving it a shot and letting me know whether that fixes the issue? https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs

$ ./azcopy.exe copy https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-docs.tsv.gz msmarco-docs.tsv.gz
INFO: Scanning...
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support

Job eb1190a9-1f97-7346-522b-5efbef47b102 has started

99.9 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total,

Job eb1190a9-1f97-7346-522b-5efbef47b102 summary
Elapsed Time (Minutes): 22.0427
Number of File Transfers: 1
Number of Folder Property Transfers: 0
Total Number of Transfers: 1
Number of Transfers Completed: 1
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 8446274598
Final Job Status: Completed
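Whichever tool finishes the transfer, it may be worth checking the file before indexing it. The summary above reports TotalBytesTransferred: 8446274598, consistent with the 8054M total in the curl progress line, so a size comparison catches truncated downloads cheaply, and decompressing to EOF makes Python's `gzip` module check the stream's CRC-32 and length trailer. The sketch below assumes the blob has not changed since that AzCopy run; `verify_gzip_download` and `EXPECTED_BYTES` are hypothetical names.

```python
import gzip
import os
import zlib

# Assumption: taken from the AzCopy summary; valid only if the blob is unchanged.
EXPECTED_BYTES = 8446274598


def verify_gzip_download(path, expected_bytes=EXPECTED_BYTES, chunk_size=1 << 24):
    """Sketch: return (ok, reason) after checking size and gzip integrity."""
    size = os.path.getsize(path)
    if size != expected_bytes:
        return False, f"size {size} != expected {expected_bytes}"
    try:
        with gzip.open(path, "rb") as f:
            # Reading to EOF forces gzip to validate the CRC-32/length trailer.
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as err:
        # Truncated streams raise EOFError; bad CRCs raise gzip.BadGzipFile (an OSError).
        return False, f"corrupt gzip stream: {err}"
    return True, "ok"
```

For example, `verify_gzip_download("msmarco-docs.tsv.gz")` should return `(True, "ok")` for a complete copy. On a machine with GNU gzip, `gzip -t msmarco-docs.tsv.gz` performs a similar integrity test from the shell, though without the size comparison.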