hao1635 / ASCON

Official implementation of "ASCON: Anatomy-aware Supervised Contrastive Learning Framework for Low-dose CT Denoising"

Looking for help while debugging #2

Open · hhhljf opened this issue 1 year ago

hhhljf commented 1 year ago

Thanks for your novel work. I am trying to train the model on the Moya dataset, following the training command you provided. When running models/networks.py I get the following error:

Traceback (most recent call last):
  File "/dg_hpc/CNG/lijf/ASCON-main/train.py", line 72, in <module>
    model.optimize_parameters()  # calculate loss functions, get gradients, update network weights
  File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 128, in optimize_parameters
    self.loss_D = self.compute_D_loss()
  File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 160, in compute_D_loss
    self.loss_D = self.MAC_Net(self.real_B, self.fake_B.detach())
  File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 189, in MAC_Net
    feat_k_pool_1, sample_ids, sample_local_ids, sample_top_idxs = self.netProjection_target(patch_size, feat_k_1, self.num_patches, None, None, None, pixweght=None)
  File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dg_hpc/CNG/lijf/ASCON-main/models/networks.py", line 130, in forward
    N_patches = num_patches[feat_id]
IndexError: list index out of range

I have checked the value of num_patches. It looks like num_patches should be a list, but I find it is the predefined int value 256. How can I eliminate this error? Thanks!

hao1635 commented 1 year ago


I'm sorry for giving an incomplete training instruction. Please try:

    python train.py --name ASCON(experiment_name) --model ASCON --netG ESAU --dataroot /data/zhchen/Mayo2016_2d(path to images) --nce_layers 1,4 --layer_weight 1,1 --num_patches 32,512 --k_size 3,7 --lr 0.0002 --gpu_ids 6,7 --print_freq 25 --batch_size 8 --lr_policy cosine
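
To illustrate why the original command failed: the per-layer options (--nce_layers, --num_patches, --k_size) are comma-separated and need one entry per NCE layer, so a single value such as 256 leaves num_patches[feat_id] out of range for the second layer. A minimal sketch of that mechanism (the string splitting here is a hypothetical simplification, not the repository's actual option parser):

```python
# Hypothetical illustration of the IndexError, not the repo's actual option parsing.
# Per-layer options are comma-separated strings parsed into lists.
nce_layers = [int(i) for i in "1,4".split(",")]         # two NCE layers

# A single value -> a one-element list.
num_patches_bad = [int(i) for i in "256".split(",")]    # [256]

# Corrected setting from the command above: one entry per layer.
num_patches_ok = [int(i) for i in "32,512".split(",")]  # [32, 512]

for feat_id, layer in enumerate(nce_layers):
    # networks.py line 130 does: N_patches = num_patches[feat_id]
    # With num_patches_bad, feat_id == 1 is out of range -> IndexError.
    n_patches = num_patches_ok[feat_id]
    print(f"layer {layer}: sampling {n_patches} patches")
```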

hhhljf commented 1 year ago

Thanks for your reply. Now I have run into another problem: during training, self.fake_B and self.real_B always seem to be identical. I subtracted the two tensors element-wise, all the elements of the result are the same, and the RMSE (root mean squared error) is zero. However, loss_D and loss_G decrease gradually. I wonder what the problem is. The training log is as follows:

(epoch: 1, iters: 525,loss_D: 1.751378, loss_G: 0.174635,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 524/2167 [08:33<24:31, 1.12it/s]

(epoch: 1, iters: 550,loss_D: 1.712759, loss_G: 0.171654,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 549/2167 [08:56<23:45, 1.13it/s]

(epoch: 1, iters: 575,loss_D: 1.753926, loss_G: 0.174744,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 574/2167 [09:20<24:23, 1.09it/s]

(epoch: 1, iters: 600,loss_D: 1.735380, loss_G: 0.173059,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 599/2167 [09:43<22:16, 1.17it/s]

(epoch: 1, iters: 625,loss_D: 1.736408, loss_G: 0.170298,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 624/2167 [10:05<21:03, 1.22it/s]

(epoch: 1, iters: 650,loss_D: 1.700382, loss_G: 0.181855,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 649/2167 [10:30<29:49, 1.18s/it]

(epoch: 1, iters: 675,loss_D: 1.726071, loss_G: 0.174785,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 674/2167 [10:52<21:37, 1.15it/s]

(epoch: 1, iters: 700,loss_D: 1.677032, loss_G: 0.169842,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 699/2167 [11:17<22:23, 1.09it/s]

(epoch: 1, iters: 725,loss_D: 1.752134, loss_G: 0.172651,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 724/2167 [11:41<25:46, 1.07s/it]

(epoch: 1, iters: 750,loss_D: 1.749419, loss_G: 0.169668,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 749/2167 [12:05<22:34, 1.05it/s]

(epoch: 1, iters: 775,loss_D: 1.707352, loss_G: 0.167989,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000)
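
One way to confirm the symptom (PSNR of inf, SSIM of 1.0 and RMSE of 0 all imply the two tensors are equal) is to compare the output and target directly before the metrics are computed. A minimal sketch; fake_B and real_B stand for the model output and the normal-dose target named above, and how they are produced is left to the caller:

```python
import torch

def check_pair(fake_B: torch.Tensor, real_B: torch.Tensor) -> None:
    """Print how far the model output is from the target (hypothetical debug helper)."""
    diff = (fake_B.detach() - real_B.detach()).abs()
    print("max |fake_B - real_B| :", diff.max().item())
    print("mean|fake_B - real_B| :", diff.mean().item())
    print("exactly equal?        :", torch.equal(fake_B.detach(), real_B.detach()))
```

If the maximum difference is exactly 0, it is worth checking whether the dataloader is returning the same image as both the low-dose input and the normal-dose target, or whether the metrics are being computed on the wrong pair of tensors.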

hhhljf commented 1 year ago

I do not use the same training dataset as yours. My training data are normalized to 0-1, and I recover (denormalize) the images when computing SSIM and RMSE.
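
For reference, this is the kind of recovery step meant above. A minimal sketch assuming simple min-max normalization with known intensity bounds; the bounds and the use of scikit-image are assumptions, not the repository's evaluation code:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Assumed normalization bounds (e.g. a CT intensity window); adjust to your data.
DATA_MIN, DATA_MAX = -1000.0, 2000.0

def denormalize(x01: np.ndarray) -> np.ndarray:
    """Map an image from [0, 1] back to the original intensity range."""
    return x01 * (DATA_MAX - DATA_MIN) + DATA_MIN

def evaluate(pred01: np.ndarray, target01: np.ndarray) -> tuple[float, float, float]:
    """Compute PSNR, SSIM and RMSE on the recovered (denormalized) images."""
    pred, target = denormalize(pred01), denormalize(target01)
    data_range = DATA_MAX - DATA_MIN
    psnr = peak_signal_noise_ratio(target, pred, data_range=data_range)
    ssim = structural_similarity(target, pred, data_range=data_range)
    rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
    return psnr, ssim, rmse
```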