jkwang93 / MCMG

MCMG_V1
MIT License

IsADirectoryError... #9

Closed SaraiQX closed 1 year ago

SaraiQX commented 1 year ago

Dear author, thank you for this interesting work! While running the code, the first issue I hit was "RuntimeError: CUDA out of memory", which I worked around by changing line 154 of "/MCMG-master/1_train_prior_Transformer.py" to batch_size = 256 (originally 1024).

By the way, the original setup specifies PyTorch 1.6, which limits the usable GPUs (it failed on a 3090, so I used a 2080 Ti). I will try a better GPU and see if it works 😄.

My second issue came up while running 1_train_prior_Transformer.py:

# 2022-12-29  Error occurred after a period of running... 
....
**************************************************                                                                        
Epoch   0   step 3000    loss:  0.40                                                                                      
**************************************************                                                                        
Epoch   0   step 3400    loss:  0.34                                                                                      

100%|█████████████████████████████████████████████████████████████████████████████████| 3456/3456 [16:06<00:00,  3.58it/s]
average epoch loss: 0.46137984852410024
100%|███████████████████████████████████████████████████████████████████████████████████| 384/384 [01:23<00:00,  4.63it/s]
now best_score: -0.34199808336173493
Traceback (most recent call last):
  File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 170, in <module>
    train_prior(**arg_dict)
  File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 43, in train_prior
    train_losses, val_losses = train(train_data, valid_data, Prior, optim, num_epochs,save_prior_path)
  File "autodl-nas/MCMG-master/1_train_prior_Transformer.py", line 108, in train
    torch.save(model.decodertf.state_dict(), save_prior_path)
  File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 361, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 229, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/root/miniconda3/envs/mcmg/lib/python3.8/site-packages/torch/serialization.py", line 210, in __init__
    super(_open_file, self).__init__(open(name, mode))
IsADirectoryError: [Errno 21] Is a directory: 'autodl-nas/MCMG-master/save_piror_model'
(mcmg) root@autodl-container-7250118952-3a8b7cdb:~#

I checked some blogs but found no solution that applies to my case. Could you please offer a clue? Thank you and happy new year~ Sincerely, Sarai

SaraiQX commented 1 year ago

Hi there, I think I've found the right direction. Thank you!
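
For anyone who lands on this thread with the same traceback: the error message says that the path passed to torch.save ('autodl-nas/MCMG-master/save_piror_model') is a directory, while torch.save opens its target as a file. Below is a minimal sketch of one possible workaround, assuming the checkpoint should be written to a file inside that directory. The helper name resolve_checkpoint_path and the file name prior.ckpt are hypothetical and not taken from the MCMG repo.

```python
from pathlib import Path


def resolve_checkpoint_path(save_path, default_name="prior.ckpt"):
    """Return a file path that can be opened for writing.

    If save_path is an existing directory, a file name is appended to it.
    default_name is a hypothetical placeholder, not a name used by MCMG.
    """
    path = Path(save_path)
    if path.is_dir():
        # Opening a directory with open(..., 'wb') raises IsADirectoryError,
        # so point at a file inside the directory instead.
        path = path / default_name
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

In the training script, the result could then be passed to the existing call, for example torch.save(model.decodertf.state_dict(), resolve_checkpoint_path(save_prior_path)). Whether the later scripts expect a specific checkpoint file name is not stated in this thread, so that would need to be checked against the repo.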