karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
36.83k stars 5.83k forks

Issue with running prepare.py #50

Open torial opened 1 year ago

torial commented 1 year ago

I received the following error while running python prepare.py:

`Traceback (most recent call last):
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1570, in _prepare_split_single
    for key, record in generator:
  File "C:\Users\fresh\.cache\huggingface\modules\datasets_modules\datasets\openwebtext\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\openwebtext.py", line 85, in _generate_examples
    with open(filepath, encoding="utf-8") as f:
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\streaming.py", line 69, in wrapper
    return function(*args, use_auth_token=use_auth_token, **kwargs)
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\download\streaming_download_manager.py", line 445, in xopen
    return open(main_hop, mode, *args, **kwargs)
OSError: [Errno 22] Invalid argument: 'C:\Users\fresh\.cache\huggingface\datasets\downloads\extracted\f03a89c11b1133c3973ac7aed71b6be5c62feb33c5ec06cffb06511974f7194e\001 5896-b1054262f7da52a0518521e29c8e352c.txt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\fresh\Downloads\nanoGPT\data\openwebtext\prepare.py", line 14, in <module>
    dataset = load_dataset("openwebtext")
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\load.py", line 1757, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 860, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1611, in _download_and_prepare
    super()._download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 953, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1449, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1606, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset`

lowkick commented 1 year ago

I ran into some similar issues with prepare.py as well. I don't know if this is your case, but I solved it by setting num_proc to 1 on line 11. Hope this helps.
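
For anyone reading along, here is a rough sketch of what a single-process tokenization step might look like. It is not the actual contents of prepare.py: `NUM_PROC` and `tokenize_split` are hypothetical names chosen for illustration, and `split` is assumed to be a Hugging Face `datasets` split (anything exposing a compatible `.map()` method will do).

```python
# Sketch only: NUM_PROC and tokenize_split are hypothetical names,
# not taken from prepare.py.
NUM_PROC = 1  # a single worker, so .map() does not spawn extra processes


def tokenize_split(split, process):
    """Apply `process` to every example of `split` in one process.

    `split` is assumed to be a Hugging Face `datasets.Dataset`;
    `process` maps one example (a dict with a "text" key) to a dict
    of new columns.
    """
    return split.map(
        process,
        remove_columns=["text"],  # drop the raw text once it is processed
        desc="tokenizing",
        num_proc=NUM_PROC,
    )
```

Running with one worker is slower, but it takes multiprocessing out of the picture, which can make it easier to tell whether the failure comes from the worker pool or from the dataset files themselves.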

torial commented 1 year ago

Unfortunately that didn't help, but I appreciate the suggestion.

PhillzMike commented 1 year ago

I also got this error when running on Windows. I had to turn off Windows Defender and re-download the dataset to get past it.
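
One possible way to force the re-download is to delete the cached `downloads` folder before running prepare.py again. The sketch below assumes the cache lives where the traceback above shows it (`.cache\huggingface\datasets` under the user's home directory); `clear_cached_downloads` is a hypothetical helper, not part of nanoGPT or the `datasets` library.

```python
import shutil
from pathlib import Path


def clear_cached_downloads(cache_root: Path) -> bool:
    """Delete the `downloads` folder under a datasets cache directory.

    Returns True if the folder existed and was removed, False if it
    was not there. `cache_root` is an assumption: pass the directory
    that contains `downloads` on your machine.
    """
    downloads = cache_root / "downloads"
    if not downloads.is_dir():
        return False
    shutil.rmtree(downloads)  # removes both archives and extracted files
    return True
```

For the path in the traceback, the call would be `clear_cached_downloads(Path.home() / ".cache" / "huggingface" / "datasets")`. Alternatively, `load_dataset` accepts `download_mode="force_redownload"`, which asks the library to fetch the files again instead of reusing the cache.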