Closed NeroArcher closed 4 years ago
Is there a models/ directory in the folder you are running main.py from? Try doing mkdir models and running again.
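The same fix can be done from Python so the save step never depends on the directory already existing. This is only a minimal sketch: the helper name checkpoint_path and its directory argument are hypothetical and not part of the repository, and the file name pattern simply mirrors the one in the traceback.

```python
from pathlib import Path


def checkpoint_path(model_name, epoch, directory="models"):
    """Return a checkpoint path, creating the target directory if it is missing.

    Hypothetical helper, not part of the pixel-cnn-pp code base.
    """
    out_dir = Path(directory)
    # parents=True creates intermediate folders; exist_ok=True makes this a
    # no-op when the directory is already there.
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / "{}_{}.pth".format(model_name, epoch))


# Assumed usage inside the training loop:
# torch.save(model.state_dict(), checkpoint_path(model_name, epoch))
```

Because the relative path models/ is resolved against the current working directory, the folder is created wherever the script is launched from.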
Thank you for the fast reply! Lol, I didn't think of that simple solution. It looks like it will work for now, but it still needs some time to reach the point where the original bug occurred. This is one of the first ML projects I am trying to learn from. In addition, I was also curious about how to train a smaller version. Currently I am trying to do a test training run, but I do not have many GPUs available to train for over a week. Again, thank you for the great and enlightening work!
If you change (epoch + 1) to (epoch) on line https://github.com/pclucas14/pixel-cnn-pp/blob/master/main.py#L157, it will stop on the first epoch.
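To see why that one-character change matters, here is a small sketch. It assumes the line in question is a modulo check against a save interval of 10; both the interval value and the exact form of the check are assumptions, since the thread only quotes the (epoch) and (epoch + 1) parts.

```python
SAVE_INTERVAL = 10  # assumed value, not confirmed by the thread
MAX_EPOCHS = 30     # only bounds the search in this sketch


def first_trigger(offset):
    """Return the first epoch where (epoch + offset) % SAVE_INTERVAL == 0."""
    for epoch in range(MAX_EPOCHS):
        if (epoch + offset) % SAVE_INTERVAL == 0:
            return epoch
    return None


print(first_trigger(1))  # original check, (epoch + 1): prints 9
print(first_trigger(0))  # modified check, (epoch): prints 0
```

Under these assumptions the original check first fires at epoch 9, which is consistent with the _9 suffix of the checkpoint name in the traceback, while the modified check fires immediately at epoch 0.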
If you want to train a smaller version, try smaller values for --nr_resnet and --nr_filters. You can probably use a bigger learning rate.
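As a rough illustration of passing smaller values, the sketch below rebuilds just these two flags with argparse. The defaults (5 and 160) are read off the checkpoint name in the traceback, and the smaller values (2 and 40) are arbitrary examples; the real main.py may define its arguments differently.

```python
import argparse

parser = argparse.ArgumentParser()
# Defaults mirror the checkpoint name in the traceback
# (nr-resnet5, nr-filters160); treat them as assumptions.
parser.add_argument("--nr_resnet", type=int, default=5)
parser.add_argument("--nr_filters", type=int, default=160)

# Equivalent to running: python main.py --nr_resnet 2 --nr_filters 40
# (2 and 40 are illustrative values, not recommendations from the thread).
args = parser.parse_args(["--nr_resnet", "2", "--nr_filters", "40"])
print(args.nr_resnet, args.nr_filters)
```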
Have fun!
After training had been running for a while, the process stopped partway through and showed the following:
Traceback (most recent call last):
  File "main.py", line 170, in <module>
    torch.save(model.state_dict(), 'models/{}_{}.pth'.format(model_name, epoch))
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 369, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/tangyeping/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/pcnn_lr:0.00020_nr-resnet5_nr-filters160_9.pth'
I tried both the CIFAR and MNIST datasets and the problem keeps appearing. I am running the code on a remote server with 2 GPUs.