AKaubay closed this issue 2 months ago
It works fine in our environment. Did you start the download by running the following commands?
DLINK=$(echo -n "aHR0cHM6Ly9jb252ZXJzYXRpb25odWIuYmxvYi5jb3JlLndpbmRvd3MubmV0L2JlaXQtc2hhcmUtcHVibGljL01pbmlMTE0vcHJvY2Vzc2VkX2RhdGEudGFyP3N2PTIwMjMtMDEtMDMmc3Q9MjAyNC0wNC0xMFQxMyUzQTExJTNBNDRaJnNlPTIwNTAtMDQtMTFUMTMlM0ExMSUzQTAwWiZzcj1jJnNwPXImc2lnPTRjWEpJalZSWkhJQldxSGpQZ0RuJTJGMDFvY3pwRFdYaXBtUENVazNaOHZiUSUzRA==" | base64 --decode)
wget -O processed_data.tar "$DLINK"
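If the transfer keeps getting cut off, it may help to resume the partial file instead of restarting, and then check that the archive is readable before extracting it. The following is only a rough sketch, not something from the MiniLLM repo: `fetch_with_resume` and `verify_tar` are hypothetical helper names, and the retry counts are arbitrary assumptions. It relies on the standard `wget` options `-c`, `--tries`, `--waitretry`, and `--retry-connrefused`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and retry values are assumptions, not part of the repo.

# Download URL ($1) into file ($2), resuming a partial file if one exists.
fetch_with_resume() {
    # -c continues a partial download instead of starting over;
    # --tries and --waitretry retry transient failures with a pause in between.
    wget -c --tries=20 --waitretry=10 --retry-connrefused -O "$2" "$1"
}

# Return 0 if tar can list the archive ($1), non-zero otherwise.
verify_tar() {
    # Piping through cat makes the input non-seekable, so tar has to read
    # the member data while listing instead of seeking over it.
    # This is a basic readability check, not a checksum comparison.
    cat "$1" | tar -tf - > /dev/null 2>&1
}
```

With `DLINK` set as above, a possible invocation is `fetch_with_resume "$DLINK" processed_data.tar && verify_tar processed_data.tar && echo "archive looks readable"`. If the check fails after a completed download, the local file is probably damaged, and deleting it before retrying is safer than resuming.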
https://github.com/microsoft/LMOps/blob/main/minillm/README.md
The links have been updated.
I am unable to download only the processed RoBERTa corpus. The download of the full processed_data.tar is also interrupted repeatedly, with an error indicating dead links, possibly because of an incomplete or corrupt file structure in the compressed archive. I also tried downloading it from my personal computer on a 10 Mbps connection and hit the same problem.