karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin' #532

Open HarikrishnanK9 opened 5 months ago

HarikrishnanK9 commented 5 months ago

tokens per iteration will be: 491,520
Initializing a new model from scratch
defaulting to vocab_size of GPT-2 to 50304 (50257 rounded up for efficiency)
number of parameters: 123.59M
num decayed parameter tensors: 50, with 124,354,560 parameters
num non-decayed parameter tensors: 25, with 19,200 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
Traceback (most recent call last):
  File "/home/paperspace/clinsight/backend/ner_re/test/Finetuning/Trash/nanoGPT/train.py", line 250, in <module>
    X, Y = get_batch('train') # fetch the very first batch
  File "/home/paperspace/clinsight/backend/ner_re/test/Finetuning/Trash/nanoGPT/train.py", line 120, in get_batch
    data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
  File "/home/paperspace/anaconda3/envs/finetune_env/lib/python3.10/site-packages/numpy/core/memmap.py", line 229, in __new__
    f_ctx = open(os_fspath(filename), ('r' if mode == 'c' else mode)+'b')
FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin'
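The traceback shows that `get_batch` opens `train.bin` with `np.memmap(..., dtype=np.uint16, mode='r')`, so the file must already exist on disk and is read as a flat array of 16-bit token ids. The stdlib-only sketch below illustrates that file format and how a batch of input/target windows can be cut from it. It is not nanoGPT's code: it avoids numpy, the helper names `write_tokens`, `read_tokens` and `sample_batch` are hypothetical, and it assumes the file holds native-endian unsigned 16-bit integers with no header.

```python
import os
import random
import tempfile
from array import array


def write_tokens(path, token_ids):
    """Write token ids as a flat, headerless array of native-endian uint16 values."""
    buf = array("H", token_ids)  # 'H' = unsigned short, 2 bytes on common platforms
    with open(path, "wb") as f:
        buf.tofile(f)


def read_tokens(path):
    """Read every uint16 value back from the file.

    os.path.getsize raises FileNotFoundError if the file was never created,
    which is the same failure the np.memmap call reports in the traceback.
    """
    buf = array("H")
    n_items = os.path.getsize(path) // buf.itemsize
    with open(path, "rb") as f:
        buf.fromfile(f, n_items)
    return buf


def sample_batch(tokens, batch_size, block_size, rng):
    """Pick random windows: x is a window of tokens, y is the same window shifted by one."""
    xs, ys = [], []
    for _ in range(batch_size):
        # Upper bound keeps i + 1 + block_size <= len(tokens).
        i = rng.randrange(len(tokens) - block_size)
        xs.append(list(tokens[i : i + block_size]))
        ys.append(list(tokens[i + 1 : i + 1 + block_size]))
    return xs, ys


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "train.bin")
        write_tokens(path, range(100))  # stand-in for real tokenized text
        x, y = sample_batch(read_tokens(path), batch_size=2, block_size=8, rng=random.Random(0))
        print(x)
        print(y)
```

Because the file is only opened when the first batch is fetched, the error surfaces after model construction and compilation, which matches the order of the log lines above.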

kalgoritmi commented 4 months ago

Did you run this before training to download the data?

python data/openwebtext/prepare.py
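The path in the error, `data/openwebtext/train.bin`, is relative, so it is resolved against the current working directory. To fail fast with an actionable message instead of a traceback after compilation, one could check for the prepared files before training starts. This is a minimal sketch, not part of nanoGPT: the helper `check_prepared_data` and its message are hypothetical, and it assumes the prepare script writes `train.bin` and `val.bin` into `data/<dataset>/`.

```python
import os


def check_prepared_data(dataset, root="data", splits=("train", "val")):
    """Hypothetical pre-flight check.

    Returns the absolute paths of the expected .bin files, or raises
    FileNotFoundError naming every missing file and the prepare script
    that is assumed to create them.
    """
    data_dir = os.path.join(root, dataset)  # relative paths resolve against os.getcwd()
    paths = [os.path.abspath(os.path.join(data_dir, f"{s}.bin")) for s in splits]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        prepare = os.path.join(data_dir, "prepare.py")
        raise FileNotFoundError(
            "missing prepared data: " + ", ".join(missing)
            + f"\nrun `python {prepare}` from the repository root first"
        )
    return paths


if __name__ == "__main__":
    print(check_prepared_data("openwebtext"))
```

Reporting absolute paths makes it obvious when the script is being run from a different directory than the one the data was prepared in.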