Dear author,
Thank you for this interesting work!
During my implementation, the first issue was `RuntimeError: CUDA out of memory`, which I more or less solved by changing line 154 of `/MCMG-master/1_train_prior_Transformer.py` from `batch_size = 1024` to `batch_size = 256`.
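Halving the batch size by hand is what worked around the OOM here. A minimal sketch of automating that retry is shown below. It assumes the training loop can be wrapped in a hypothetical `train_step(batch_size)` callable (not a function that exists in MCMG) and relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory".

```python
def run_with_batch_fallback(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after every
    out-of-memory RuntimeError, until a call succeeds.

    Returns the batch size that worked. Any other RuntimeError is
    re-raised unchanged. train_step is a hypothetical callable."""
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Only treat OOM errors as retryable.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {min_batch_size} fits in memory")


# Demo with a stand-in step that "runs out of memory" above 256.
def fake_step(bs):
    if bs > 256:
        raise RuntimeError("CUDA out of memory (simulated)")


print(run_with_batch_fallback(fake_step, 1024))  # 256
```

In a real PyTorch script it may also help to call `torch.cuda.empty_cache()` before retrying. Note that a smaller batch changes the effective batch size, which can affect the loss curve compared with the original 1024 setting.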
By the way, the original setup uses PyTorch 1.6, which limits which GPUs can be used (it failed on an RTX 3090, so I used an RTX 2080 Ti instead). I will try a better GPU and see if it works 😄.
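The GPU restriction most likely comes from the CUDA version the PyTorch 1.6 wheels were built against: those builds use CUDA 10.x, while native support for compute capability 8.6 (the RTX 3090) arrived with CUDA 11.1. The sketch below is a simplified compatibility check, assuming a hypothetical helper `build_supports_gpu` and an illustrative arch list (not the actual list shipped in PyTorch 1.6). It only checks native `sm_XY` code and ignores PTX JIT.

```python
def build_supports_gpu(capability, arch_list):
    """Return True if arch_list has native GPU code (an "sm_XY" entry)
    that can run on a device with compute capability (major, minor).

    Simplified rule: SASS built for sm_XY runs on devices with the same
    major version X and minor version >= Y. PTX entries ("compute_XY")
    are ignored even though the driver can sometimes JIT them."""
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX-only entries
        # Drop suffixes such as the "a" in "sm_90a" before parsing.
        digits = arch[3:].rstrip("abcdefghijklmnopqrstuvwxyz")
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Hypothetical arch list for a CUDA 10.x build (illustrative only).
example_arch_list = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]
print(build_supports_gpu((7, 5), example_arch_list))  # RTX 2080 Ti: True
print(build_supports_gpu((8, 6), example_arch_list))  # RTX 3090: False
```

On a live system, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple, and recent PyTorch releases also expose `torch.cuda.get_arch_list()` for the compiled architectures.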
My second issue occurs when running 1_train_prior_Transformer.py:
# 2022-12-29: Error after the script had been running for a while
....
**************************************************
Epoch 0 step 3000 loss: 0.40
**************************************************
Epoch 0 step 3400 loss: 0.34
100%|█████████████████████████████████████████████████████████████████████████████████| 3456/3456 [16:06<00:00, 3.58it/s]
average epoch loss: 0.46137984852410024
100%|███████████████████████████████████████████████████████████████████████████████████| 384/384 [01:23<00:00, 4.63it/s]
now best_score: -0.34199808336173493
Traceback (most recent call last):
File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 170, in <module>
train_prior(**arg_dict)
File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 43, in train_prior
train_losses, val_losses = train(train_data, valid_data, Prior, optim, num_epochs,save_prior_path)
File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 108, in train
torch.save(model.decodertf.state_dict(), save_prior_path)
File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 361, in save
with _open_file_like(f, 'wb') as opened_file:
File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 229, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 210, in __init__
super(_open_file, self).__init__(open(name, mode))
IsADirectoryError: [Errno 21] Is a directory: 'autodl-nas/MCMG-master/save_piror_model'
(mcmg) root@autodl-container-7250118952-3a8b7cdb:~#
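The traceback shows `torch.save` receiving `save_prior_path`, which points at the directory `autodl-nas/MCMG-master/save_piror_model` rather than a file, so `open(name, 'wb')` fails with `IsADirectoryError`. A minimal sketch of a guard is below, assuming a hypothetical helper `resolve_save_path` and a hypothetical default file name `Prior.ckpt` (neither is part of MCMG).

```python
import os


def resolve_save_path(path, default_name="Prior.ckpt"):
    """Return a file path that torch.save can open for writing.

    If `path` is an existing directory, append the hypothetical
    `default_name`; otherwise return `path` unchanged. The parent
    directory is created if it is missing."""
    if os.path.isdir(path):
        path = os.path.join(path, default_name)
    parent = os.path.dirname(path)
    if parent:
        # Create the parent directory so open(path, "wb") can succeed.
        os.makedirs(parent, exist_ok=True)
    return path
```

With this helper, the failing call in `train()` could become `torch.save(model.decodertf.state_dict(), resolve_save_path(save_prior_path))`. Alternatively, set `save_prior_path` (passed in through `arg_dict`) to a file inside that directory instead of the directory itself.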
I searched several blogs but found no solution that applies to my case. Could you please offer a clue? Thank you, and happy new year~
Sincerely,
Sarai