Closed by SparkJiao 2 years ago
Hi, please provide more information:
Example code in a Colab to reproduce the error, details on what you are trying to do and what you expected, and details on your environment (OS, PyPI package versions).
I have updated the description; sorry for posting the incomplete issue by mistake.
Hi, I have manually downloaded the compressed dataset `openwebtext.tar.xz` and used the following command to preprocess the examples:
>>> dataset = load_dataset('/home/admin/workspace/datasets/datasets-master/datasets-master/datasets/openwebtext', data_dir='/home/admin/workspace/datasets')
Using custom data configuration default
Downloading and preparing dataset openwebtext/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/admin/.cache/huggingface/datasets/openwebtext/default/0.0.0/5c636399c7155da97c982d0d70ecdce30fbca66a4eb4fc768ad91f8331edac02...
Dataset openwebtext downloaded and prepared to /home/admin/.cache/huggingface/datasets/openwebtext/default/0.0.0/5c636399c7155da97c982d0d70ecdce30fbca66a4eb4fc768ad91f8331edac02. Subsequent calls will reuse this data.
>>> len(dataset['train'])
74571
>>>
The preprocessed example file is only 354 MB, while the processed bookcorpus dataset is 4.6 GB. Is something wrong here?
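As a quick sanity check, the on-disk size of a processed dataset can be measured by walking its cache directory and summing file sizes. This is a minimal sketch; the cache path in the commented usage is taken from the log above and may differ on your machine:

```python
import os

def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root`
    (e.g. a Hugging Face datasets cache directory)."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

# Hypothetical usage against the cache path printed in the log above:
# size = dir_size_bytes("/home/admin/.cache/huggingface/datasets/openwebtext")
# print(f"{size / 1024**2:.1f} MB")
```

Comparing this number across machines (or against a colleague's copy) makes it easy to tell whether the extraction was incomplete.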
NonMatchingChecksumError: Checksums didn't match for dataset source files:
I got this issue when I tried to work with my own dataset. Kindly tell me where I can get the checksums of the train and dev files in my GitHub repo.
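If it helps, the checksums the `datasets` library records for source files are SHA-256 digests, so the checksum of a local train or dev file can be computed with the standard library. This is a minimal sketch, not the library's internal code:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB
    chunks so large dataset files don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The digest returned here can then be compared against whatever checksum your dataset script or metadata file records for that URL.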
Hi, I got a similar issue for the xnli dataset while working on Colab with Python 3.7.
nlp.load_dataset(path = 'xnli')
The above command resulted in the following error:
NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://www.nyu.edu/projects/bowman/xnli/XNLI-1.0.zip']
Any idea how to fix this ?
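Conceptually, this error is raised when the checksum recorded for a source URL no longer matches the freshly downloaded file, which typically means the host changed or re-uploaded the file. A minimal sketch of that verification step (an illustration, not the library's actual implementation):

```python
def mismatched_urls(expected, downloaded):
    """Return the list of URLs whose downloaded checksum differs from
    the expected (recorded) checksum; an empty list means all match."""
    return [url for url, digest in downloaded.items()
            if expected.get(url) != digest]

# Example: a stale recorded checksum triggers a mismatch for the URL,
# mirroring the list printed in the NonMatchingChecksumError above.
url = "https://www.nyu.edu/projects/bowman/xnli/XNLI-1.0.zip"
expected = {url: "old-digest"}
downloaded = {url: "new-digest"}
```

When the source file has legitimately changed, re-fetching with `download_mode="force_redownload"` (as in the command below) is a common first step; older versions of `datasets` also accepted an `ignore_verifications=True` argument to skip the check entirely, though that should be used with care.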
Did anyone figure out how to fix this error?
Fixed by:
Says fixed but I'm still getting it.
command:
dataset = load_dataset("ted_talks_iwslt", language_pair=("en", "es"), year="2014", download_mode="force_redownload")
got:
Using custom data configuration en_es_2014-35a2d3350a0f9823
Downloading and preparing dataset ted_talks_iwslt/en_es_2014 (download: 2.15 KiB, generated: Unknown size, post-processed: Unknown size, total: 2.15 KiB) to /home/ken/.cache/huggingface/datasets/ted_talks_iwslt/en_es_2014-35a2d3350a0f9823/1.1.0/43935b3fe470c753a023642e1f54b068c590847f9928bd3f2ec99f15702ad6a6...
Downloading: 2.21k/? [00:00<00:00, 141kB/s]
NonMatchingChecksumError: Checksums didn't match for dataset source files: ['https://drive.google.com/u/0/uc?id=1Cz1Un9p8Xn9IpEMMrg2kXSDt0dnjxc4z&export=download']
Hi, I encountered this problem while loading the openwebtext dataset:
I think this problem is caused by a change in the released dataset. Or should I download the dataset manually?
Sorry for posting the unfinished issue by mistake.