Open torial opened 1 year ago
I ran into similar issues with prepare.py as well. I don't know if this applies to your case, but I solved it by setting num_proc to 1 on line 11. Hope this helps.
Unfortunately that didn't help, but I appreciate the suggestion.
I also got this error when running on Windows. I had to turn off Windows Defender and re-download the dataset to get past it.
I received the following error while running python prepare.py:
`Traceback (most recent call last):
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1570, in _prepare_split_single
for key, record in generator:
File "C:\Users\fresh\.cache\huggingface\modules\datasets_modules\datasets\openwebtext\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\openwebtext.py", line 85, in _generate_examples
with open(filepath, encoding="utf-8") as f:
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\streaming.py", line 69, in wrapper
return function(*args, use_auth_token=use_auth_token, **kwargs)
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\download\streaming_download_manager.py", line 445, in xopen
return open(main_hop, mode, *args, **kwargs)
OSError: [Errno 22] Invalid argument: 'C:\Users\fresh\.cache\huggingface\datasets\downloads\extracted\f03a89c11b1133c3973ac7aed71b6be5c62feb33c5ec06cffb06511974f7194e\001 5896-b1054262f7da52a0518521e29c8e352c.txt'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\fresh\Downloads\nanoGPT\data\openwebtext\prepare.py", line 14, in <module>
dataset = load_dataset("openwebtext")
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\load.py", line 1757, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 860, in download_and_prepare
self._download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1611, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 953, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1449, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1606, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
`
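For anyone trying to narrow this down: the inner OSError comes from a plain open() on one extracted file, so it may help to check which files in the extracted cache folder cannot be opened at all, independent of the datasets library. Below is a minimal sketch using only the standard library. The function name find_unreadable_files is made up for illustration, and the assumption is that the failing files sit under the "extracted" directory shown in the traceback.

```python
import os


def find_unreadable_files(root):
    """Walk `root` and return (path, exception) pairs for every file
    that cannot be opened as UTF-8 text and have its first character read."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Same kind of call that fails in the traceback:
                # open(filepath, encoding="utf-8")
                with open(path, encoding="utf-8") as f:
                    f.read(1)
            except (OSError, UnicodeDecodeError) as exc:
                failures.append((path, exc))
    return failures


# Hypothetical usage; replace the path with your own "extracted" folder:
# for path, exc in find_unreadable_files(r"C:\path\to\downloads\extracted"):
#     print(path, "->", repr(exc))
```

If this reports files, the problem is likely with the files on disk (or something blocking access to them) rather than with prepare.py itself, which would be consistent with the re-download workaround mentioned above.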