pclucas14 / pixel-cnn-pp

PyTorch Implementation of OpenAI's PixelCNN++

FileNotFoundError bug? #17

NeroArcher closed this issue 4 years ago.

NeroArcher commented 4 years ago

After training had been running for a while, the process stopped partway through with the following error:

Traceback (most recent call last):
  File "main.py", line 170, in <module>
    torch.save(model.state_dict(), 'models/{}_{}.pth'.format(model_name, epoch))
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 369, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/pcnn_lr:0.00020_nr-resnet5_nr-filters160_9.pth'

I tried both the CIFAR and MNIST datasets, and the problem appears every time. I am running the code on a remote server with 2 GPUs.
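The last frame of the traceback shows torch.save ending in a plain open(name, 'wb') call, and open() raises FileNotFoundError whenever the parent directory of the target path does not exist. The minimal sketch below reproduces that with the standard library only. The helper open_for_checkpoint and the file name pcnn_9.pth are hypothetical, and the scratch directory stands in for the folder main.py is launched from.

```python
import os
import tempfile


def open_for_checkpoint(path: str) -> str:
    """Open `path` in 'wb' mode (the call torch.save ends in, per the traceback)
    and report the outcome. Hypothetical helper for illustration only."""
    try:
        with open(path, "wb") as f:
            f.write(b"placeholder")
    except FileNotFoundError as exc:
        return "FileNotFoundError: [Errno {}] {}".format(exc.errno, exc.strerror)
    return "ok"


with tempfile.TemporaryDirectory() as workdir:
    # Simulate running main.py from a directory that has no models/ subfolder.
    target = os.path.join(workdir, "models", "pcnn_9.pth")
    parent_exists = os.path.isdir(os.path.dirname(target))
    outcome = open_for_checkpoint(target)

print(parent_exists)  # False
print(outcome)        # reports the same FileNotFoundError as the traceback
```

Nothing about the model or the dataset is involved: the save fails purely because the relative models/ directory is missing, which is why switching between CIFAR and MNIST made no difference.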

pclucas14 commented 4 years ago

Is there a models/ directory in the folder you run main.py from? Try running mkdir models there and then run the script again.
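If you would rather not depend on creating the folder by hand, one option is to create it from the script right before saving. This is a sketch, not code from the repository: checkpoint_path is a hypothetical helper that rebuilds the 'models/{}_{}.pth' pattern seen in the traceback and calls os.makedirs(..., exist_ok=True) first.

```python
import os
import tempfile


def checkpoint_path(model_name: str, epoch: int, root: str = "models") -> str:
    """Return '<root>/<model_name>_<epoch>.pth', creating <root> if needed.
    Hypothetical helper; mirrors the 'models/{}_{}.pth' format from main.py."""
    os.makedirs(root, exist_ok=True)  # no-op when the directory already exists
    return os.path.join(root, "{}_{}.pth".format(model_name, epoch))


# Demo inside a scratch directory so nothing is created in the current folder.
with tempfile.TemporaryDirectory() as workdir:
    models_dir = os.path.join(workdir, "models")
    path = checkpoint_path("pcnn", 9, root=models_dir)
    created = os.path.isdir(models_dir)
    # A second call with the same root must not fail thanks to exist_ok=True.
    path_again = checkpoint_path("pcnn", 9, root=models_dir)

print(os.path.basename(path))  # pcnn_9.pth
```

In main.py the save line would then read torch.save(model.state_dict(), checkpoint_path(model_name, epoch)). Because of exist_ok=True the helper is safe to call on every save, not just the first.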

NeroArcher commented 4 years ago

Thank you for the fast reply! Lol, I didn't think of such a simple solution. It seems to work for now, but training still needs some time to reach the point where the original error occurred. This is one of the first ML projects I am using to learn. I am also curious how to train a smaller version. Right now I am only doing test runs, and I don't have enough GPUs available to train for over a week. Again, thank you for the great and enlightening work!

pclucas14 commented 4 years ago

If you change (epoch + 1) to (epoch) on line https://github.com/pclucas14/pixel-cnn-pp/blob/master/main.py#L157, the run will reach that point, and stop, on the first epoch. If you want to train a smaller version, try smaller values for --nr_resnet and --nr_filters. You can probably use a larger learning rate as well.

Have fun!
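Both suggestions can be made concrete with a small sketch. The thread does not quote main.py#L157, so the first function assumes it is a periodic check of the form (epoch + 1) % interval == 0 that guards the save; the interval of 10 is an assumption that happens to match the first checkpoint being named ..._9.pth. The second function is a rough heuristic, not a measurement: it assumes cost grows linearly with --nr_resnet (the number of residual blocks) and quadratically with --nr_filters (a convolution's weight count scales with input channels times output channels), and it ignores the input and output layers. Both function names are hypothetical.

```python
from typing import List


def save_epochs(num_epochs: int, interval: int, offset: int) -> List[int]:
    """0-indexed epochs where (epoch + offset) % interval == 0.
    offset=1 models the (epoch + 1) check, offset=0 models the (epoch) check."""
    return [e for e in range(num_epochs) if (e + offset) % interval == 0]


def relative_size(nr_resnet: int, nr_filters: int,
                  base_resnet: int = 5, base_filters: int = 160) -> float:
    """Rough cost ratio versus the nr_resnet=5, nr_filters=160 run from the
    traceback, assuming cost ~ nr_resnet * nr_filters**2 (heuristic only)."""
    return (nr_resnet * nr_filters ** 2) / (base_resnet * base_filters ** 2)


interval = 10  # assumption: consistent with the first save happening at epoch 9
print(save_epochs(30, interval, offset=1))  # [9, 19, 29]
print(save_epochs(30, interval, offset=0))  # [0, 10, 20]
print(relative_size(2, 80))                 # 0.1
```

Under these assumptions, dropping the offset moves the first save from epoch 9 to epoch 0, and halving --nr_filters alone cuts the estimated cost to about a quarter.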