huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Cannot load the dataset conll2012_ontonotesv5 #4031

Closed: cathyxl closed this issue 2 years ago

cathyxl commented 2 years ago

Describe the bug

Cannot load the dataset conll2012_ontonotesv5

Steps to reproduce the bug

# Sample code to reproduce the bug
from datasets import load_dataset
dataset = load_dataset('conll2012_ontonotesv5', 'english_v4', split="test")
print(dataset)

Expected results

The dataset should download and load successfully.

Actual results

raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files: ['https://md-datasets-cache-zipfiles-prod.s3.eu-west-1.amazonaws.com/zmycy7t9h9-1.zip']
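
For context, an error like this comes from comparing each downloaded source file against a checksum recorded in the dataset's metadata. Below is a minimal sketch of that kind of check. It is not the library's implementation: `ChecksumMismatch`, `sha256_of_file`, and `verify_checksums` are hypothetical names used only for illustration, and SHA-256 is assumed as the hash.

```python
import hashlib
import os
import tempfile


class ChecksumMismatch(Exception):
    """Hypothetical stand-in for the library's NonMatchingChecksumError."""


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected, downloaded):
    """Compare recorded digests with the digests of downloaded files.

    expected:   dict mapping source URL -> recorded hex digest
    downloaded: dict mapping source URL -> local file path
    Raises ChecksumMismatch listing every URL whose digest differs
    (a URL with no recorded digest also counts as a mismatch).
    """
    bad_urls = [
        url
        for url, path in downloaded.items()
        if sha256_of_file(path) != expected.get(url)
    ]
    if bad_urls:
        raise ChecksumMismatch(
            "Checksums didn't match for dataset source files: " + str(bad_urls)
        )


if __name__ == "__main__":
    # Illustration with a throwaway file and a deliberately wrong digest.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.zip")
        with open(path, "wb") as f:
            f.write(b"example payload")
        try:
            verify_checksums({"https://example.com/data.zip": "0" * 64},
                             {"https://example.com/data.zip": path})
        except ChecksumMismatch as err:
            print(err)
```

A mismatch can mean a corrupted download, but it can also mean the hosted file changed after the checksum was recorded, in which case the recorded value is what needs updating.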

Environment info

albertvillanova commented 2 years ago

Hi @cathyxl, thanks for reporting.

Indeed, we have recently updated the loading script of that dataset (and fixed that bug as well):

That fix will be available in our next datasets library release. In the meantime, you can incorporate that fix by:

Feel free to re-open this issue if the problem persists.