lhoyer / DAFormer

[CVPR22] Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

loss = Nan #22

Closed by kaigelee 2 years ago

kaigelee commented 2 years ago

Thank you for your excellent work. When I run the code, every loss is reported as nan from the first logged iteration onward (see the log below). How can I fix it?

2022-05-07 17:02:57,696 - mmseg - INFO - Iter [50/40000] lr: 1.958e-06, eta: 1 day, 0:29:42, time: 2.207, data_time: 0.029, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 32.2611, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 54.3771
2022-05-07 17:04:45,225 - mmseg - INFO - Iter [100/40000] lr: 3.950e-06, eta: 1 day, 0:08:59, time: 2.151, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 41.2013, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 69.9605
2022-05-07 17:06:32,717 - mmseg - INFO - Iter [150/40000] lr: 5.938e-06, eta: 1 day, 0:00:44, time: 2.150, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 41.6272, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 69.3706
2022-05-07 17:08:19,966 - mmseg - INFO - Iter [200/40000] lr: 7.920e-06, eta: 23:54:54, time: 2.145, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 42.4247, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 71.1068

lhoyer commented 2 years ago

I haven't experienced nan losses for decode.loss_seg before. Most probably, something is wrong with your setup. Please double-check that you have followed the instructions in the README.md precisely. For debugging, you can also use the --debug command line flag to reduce the logging and image visualization interval (see https://github.com/lhoyer/DAFormer/blob/8d6e710700ff5e6a053c77bfe384ba44d4672cbe/run_experiments.py#L79). Please also look at the class_mix_debug folder of this run for intermediate visualizations. Sometimes they help to understand what is going wrong.
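As a complement to the debugging hints above, here is a minimal sketch for scanning a training log and reporting the iterations at which any loss is non-finite. It assumes the mmseg log format shown in this thread (`Iter [i/N]` followed by `name: value` pairs); the function name `find_nonfinite_losses` is hypothetical and not part of DAFormer or mmseg.

```python
import math
import re

# Matches the iteration counter, e.g. "Iter [50/40000]".
ITER_RE = re.compile(r"Iter \[(\d+)/\d+\]")
# Matches "name: value" pairs whose name contains "loss",
# e.g. "decode.loss_seg: nan" or "src.loss_imnet_feat_dist: 0.12".
LOSS_RE = re.compile(r"([\w.]*loss[\w.]*): ([^,\s]+)")


def find_nonfinite_losses(log_text):
    """Return a list of (iteration, [loss names]) for lines with nan/inf losses."""
    results = []
    for line in log_text.splitlines():
        iter_match = ITER_RE.search(line)
        if iter_match is None:
            continue  # not a training-progress line
        bad = []
        for name, value in LOSS_RE.findall(line):
            try:
                number = float(value)  # float() accepts "nan" and "inf"
            except ValueError:
                continue  # skip values that are not numbers
            if not math.isfinite(number):
                bad.append(name)
        if bad:
            results.append((int(iter_match.group(1)), bad))
    return results
```

Running this over a saved log file and looking at the first reported iteration shows whether the losses are non-finite from the very start (which points to the setup rather than to training instability) or only diverge later.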

kaigelee commented 2 years ago

Thank you very much. I fixed it by updating kornia.
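Since the cause here was a package version, a quick way to catch this kind of setup problem is to compare installed package versions against the pinned ones. The following is only a sketch: it assumes the pins are available as `name==version` lines (for example from a requirements file), and the helper names are hypothetical. The thread does not say which kornia version resolved the issue, so no specific version is assumed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_pins(requirements_text):
    """Collect 'name==version' pins; unpinned lines and comments are ignored."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

For example, `find_mismatches(parse_pins(open("requirements.txt").read()))` would list every pinned package whose installed version differs, assuming such a file exists in the working directory.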