skgyu / CMOS-GAN

Code for paper "TIP2023 - CMOS-GAN: Semi-supervised Generative Adversarial Model for Cross-Modality Face Image Synthesis"
Apache License 2.0

when running no_recognition script, there's an error #2

Open PENGthu opened 1 year ago

PENGthu commented 1 year ago

Hi there, thanks a lot for your work! When I run the script with bash script/S2P_CUFS_CUFSF/S2P_CUFS_CUFSF_no_recognition.sh, I get the following error:

bash script/S2P_CUFS_CUFSF/S2P_CUFS_CUFSF_no_recognition.sh
0,1
####################
cuda:0
####################
[34]
serial_probility=0.25

[34]
serial_probility=0.25

[34]
serial_probility=0.25

9560
[31, 32, 33]
[34]
modelCrossModality_S2P
initialize_S2P
#########train from scratch#############
lr=0.0002
lr=0.0001
lr=0.0002
model [modelCrossModality] was created
lr=0.0002
lr=0.0001
lr=1e-05
lr=0.0002
enumerate 0
loss_G_X2Y_target 3.002969741821289
loss_L1_X2Y_source 61.25656509399414
loss_ffl 33.86601257324219
classify_loss_forG idx0 correct num= 0/32 
loss_cls_trainG_fake_Y_source 11.94918155670166
classify_loss_forG idx1 correct num= 0/32 
loss_cls_trainG_fake_Y_target 11.681120872497559
/home/usr/anaconda3/lib/python3.10/site-packages/torch/autograd/__init__.py:200: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
  File "/home/usr/anaconda3/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/home/usr/anaconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/usr/anaconda3/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/networks/generators.py", line 27, in forward
    out = self.dec(x=hidden)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/networks/generators.py", line 78, in forward
    x= self.model(x)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/networks/blocks.py", line 22, in forward
    return self.model(x)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/networks/blocks.py", line 35, in forward
    out = self.model(x)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/networks/blocks.py", line 101, in forward
    x = self.activation(x)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 103, in forward
    return F.relu(input, inplace=self.inplace)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/nn/functional.py", line 1457, in relu
    result = torch.relu(input)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1682343967769/work/torch/csrc/autograd/python_anomaly_mode.cpp:114.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/train_crossmodality.py", line 279, in <module>
    model.optimize_step()
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModalitys/modelCrossModality_main.py", line 304, in optimize_step
    self.train_G()
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModalitys/modelCrossModality_main.py", line 224, in train_G
    loss_G.backward()
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 256, 56, 56]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Then:

0,1
####################
cuda:0
####################
[34]
serial_probility=0.25

[34]
serial_probility=0.25

[34]
serial_probility=0.25

9560
[31, 32, 33]
[34]
modelCrossModality_S2P
initialize_S2P
./checkpoints/CrossModal/main_S2P_CUFS_CUFSF_step1/40_net_encoder_X_source.pth
encoder_X_source load network error, path does not exists
Traceback (most recent call last):
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/train_crossmodality.py", line 211, in <module>
    model = create_model(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/models.py", line 8, in create_model
    model.initialize(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModality.py", line 69, in initialize
    getattr(self,'initialize_'+step)(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModalitys/modelCrossModality_main.py", line 70, in initialize_main
    self.load_network(self.encoder_X_source, 'encoder_X_source', which_epoch)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/base_model.py", line 93, in load_network
    raise(RuntimeError('load network error'))
RuntimeError: load network error
0,1
####################
cuda:0
####################
[34]
serial_probility=0.25

[34]
serial_probility=0.25

[34]
serial_probility=0.25

9560
[31, 32, 33]
[34]
modelCrossModality_S2P
initialize_S2P
Traceback (most recent call last):
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/train_crossmodality.py", line 211, in <module>
    model = create_model(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/models.py", line 8, in create_model
    model.initialize(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModality.py", line 69, in initialize
    getattr(self,'initialize_'+step)(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModalitys/modelCrossModality_step2.py", line 375, in initialize_main
    self.load_feature_extraction_model(opt=opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/base_model.py", line 277, in load_feature_extraction_model
    state_dict=  torch.load(  opt.recog_state_dict.loc) 
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/CrossModal/main_S2P_CUFS_CUFSF_step1/50_net_feature_extraction_model.pth'
0,1
####################
cuda:0
####################
[34]
serial_probility=0.25

[34]
serial_probility=0.25

[34]
serial_probility=0.25

9560
[31, 32, 33]
[34]
modelCrossModality_S2P
initialize_S2P
Traceback (most recent call last):
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/train_crossmodality.py", line 211, in <module>
    model = create_model(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/models.py", line 8, in create_model
    model.initialize(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModality.py", line 69, in initialize
    getattr(self,'initialize_'+step)(opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/modelCrossModalitys/modelCrossModality_step2.py", line 375, in initialize_main
    self.load_feature_extraction_model(opt=opt)
  File "/home/usr/project1/src/CMOS-GAN/CMOS-GAN_code_refactor/train/models/base_model.py", line 277, in load_feature_extraction_model
    state_dict=  torch.load(  opt.recog_state_dict.loc) 
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/usr/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/CrossModal/main_S2P_CUFS_CUFSF_step1/50_net_feature_extraction_model.pth'

I have downloaded the datasets 'CMOS-GAN/dataset/Viewed/AUG_3_9_AR', 'CMOS-GAN/dataset/Viewed/AUG_3_9_CUFSF', 'CMOS-GAN/dataset/Viewed/AUG_3_9_CUHK' and 'CMOS-GAN/dataset/Viewed/AUG_3_9_XM2VTS'. Is there a problem with my setup? Thanks!

skgyu commented 1 year ago

I just ran the code that was released and the same script, and did not encounter this issue.

I believe the first error is most likely due to the PyTorch version. The errors that follow it are caused by the failure of the first command. If you open the file 'script/S2P_CUFS_CUFSF/S2P_CUFS_CUFSF_no_recognition.sh', you will find that the script executes four Python commands sequentially, and each later command relies on the results saved by the earlier ones. Therefore, if one of the earlier commands fails, the following ones fail as well.
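To make that dependency explicit, here is a minimal sketch of a runner that stops at the first failing step instead of carrying on to steps that need its checkpoints. The function name run_steps and the placeholder commands are assumptions for illustration only; they are not part of the released script.

```python
import subprocess
import sys


def run_steps(commands):
    """Run commands in order and stop at the first one that fails.

    Returns the index of the failing command, or None if all succeeded.
    """
    for index, command in enumerate(commands):
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step {index} failed with exit code {result.returncode}; "
                  "skipping the remaining steps")
            return index
    return None


if __name__ == "__main__":
    # Placeholder steps standing in for the four training commands.
    steps = [
        [sys.executable, "-c", "print('step 0 ok')"],
        [sys.executable, "-c", "raise SystemExit(1)"],
        [sys.executable, "-c", "print('never reached')"],
    ]
    run_steps(steps)
```

With a runner like this, only the first traceback would be printed, which makes it easier to see which step actually needs fixing.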

I used the following command to check my PyTorch version:

python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.8.2'
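If you prefer a script over the interactive session, the sketch below reads the installed version from package metadata without importing torch. It assumes Python 3.8 or newer (importlib.metadata is not available in 3.7), and the helper name installed_version is only illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch:", installed_version("torch"))
```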

The first error you encountered (one of the variables needed for gradient computation has been modified by an inplace operation) is likely caused by certain in-place operations that do not work correctly on your PyTorch version. Please tell me which PyTorch version you are using. I think that with a not-too-old version of PyTorch this problem should not occur. On your PyTorch version, I suspect the error is related to the forward() function of class Conv2dBlock in CMOS-GAN_code_refactor/networks/blocks.py.

Perhaps you can modify it like this:

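    def forward(self, x):
        x = self.conv(self.pad(x))
        if self.norm:
            x = self.norm(x)
        # Bind the activation output to a new name rather than reassigning x.
        # Fall back to x so that `out` is defined when there is no activation.
        out = self.activation(x) if self.activation else x
        return out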
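For reference, this class of error can be reproduced in isolation. The sketch below is not taken from the CMOS-GAN code; it only assumes that PyTorch is installed. It edits the output of a ReLU in place, which autograd has saved for the backward pass, and then shows the out-of-place variant that avoids the problem.

```python
import torch


def backward_after_inplace_edit():
    """Return the error message raised when a saved ReLU output is edited in place."""
    x = torch.randn(4, requires_grad=True)
    y = torch.relu(x)   # autograd keeps y to compute the ReLU gradient
    y.add_(1.0)         # in-place edit bumps y's version counter
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


def backward_out_of_place():
    """Same computation, but the addition creates a new tensor."""
    x = torch.randn(4, requires_grad=True)
    y = torch.relu(x)
    z = y + 1.0         # y itself stays at version 0
    z.sum().backward()
    return x.grad


if __name__ == "__main__":
    print(backward_after_inplace_edit())
    print(backward_out_of_place())
```

If the same pattern exists somewhere after the activation in your copy of the network code, replacing the in-place operation with an out-of-place one is the usual workaround.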

Below are the results of running the script 'bash script/S2P_CUFS_CUFSF/S2P_CUFS_CUFSF_no_recognition.sh' in my environment.

bash script/S2P_CUFS_CUFSF/S2P_CUFS_CUFSF_no_recognition.sh
/data/skyu/W002/CMOS-GAN_code_refactor/options/base_options.py:48: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config =yaml.load(stream)
1,3
####################
cuda:0
####################
/data/skyu/W002/CMOS-GAN_code_refactor/stools/sutil.py:215: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config =yaml.load(stream)
[34]
serial_probility=0.25

[34]
serial_probility=0.25

[34]
serial_probility=0.25

9560
[31, 32, 33]
[34]
modelCrossModality_S2P
initialize_S2P
#########train from scratch#############
lr=0.0002
lr=0.0001
lr=0.0002
model [modelCrossModality] was created
lr=0.0002
lr=0.0001
lr=1e-05
lr=0.0002
enumerate 0
loss_G_X2Y_target 3.0017037391662598
loss_L1_X2Y_source 60.93376159667969
loss_ffl 34.575008392333984
classify_loss_forG idx0 correct num= 0/32
loss_cls_trainG_fake_Y_source 12.338781356811523
classify_loss_forG idx1 correct num= 0/32
loss_cls_trainG_fake_Y_target 11.850801467895508
loss_D_Y= 3.006690263748169
classify_loss_forC idx0 correct num= 0/32
classify_loss_forC idx2 correct num= 0/32
classify_loss_forC idx3 correct num= 0/32
loss_cls_fake_Y_source=10.72421932220459
loss_cls_data_Y_source=13.480998039245605
loss_cls_fake_Y_target=11.661787986755371
enumerate 1
loss_G_X2Y_target 2.727379083633423
loss_L1_X2Y_source 48.93858337402344
loss_ffl 29.83761978149414
classify_loss_forG idx0 correct num= 0/32
loss_cls_trainG_fake_Y_source 14.859338760375977
classify_loss_forG idx1 correct num= 0/32
loss_cls_trainG_fake_Y_target 15.499336242675781
loss_D_Y= 2.6908960342407227
classify_loss_forC idx0 correct num= 1/32
classify_loss_forC idx2 correct num= 0/32
classify_loss_forC idx3 correct num= 0/32
loss_cls_fake_Y_source=12.222427368164062
loss_cls_data_Y_source=13.191264152526855
loss_cls_fake_Y_target=10.940000534057617
enumerate 2
loss_G_X2Y_target 2.381397247314453
loss_L1_X2Y_source 45.20701599121094
loss_ffl 28.456880569458008
classify_loss_forG idx0 correct num= 0/32
loss_cls_trainG_fake_Y_source 14.414983749389648
classify_loss_forG idx1 correct num= 0/32
loss_cls_trainG_fake_Y_target 13.29682445526123

...

As the output shows, the program starts and runs normally.