irfanICMLL / structure_knowledge_distillation

The official code for the paper 'Structured Knowledge Distillation for Semantic Segmentation' (CVPR 2019 oral) and its extension to other tasks.
BSD 2-Clause "Simplified" License

NaN during training #21

Closed · jankyee closed 4 years ago

jankyee commented 4 years ago

Hi, I'm trying to reproduce the training process, but I get a NaN D_loss after several training steps. I think that's because torch.sqrt in CriterionAdditionalGP returns NaN. Following https://github.com/pytorch/pytorch/issues/2534, I added 1e-8 to the grad_norm,

[screenshot: the modified torch.sqrt call in CriterionAdditionalGP]

but it doesn't help. Any suggestions?
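For reference, a minimal sketch of where the epsilon has to go in a WGAN-GP style gradient penalty. The function and variable names below are illustrative, not the repo's exact CriterionAdditionalGP:

```python
import torch

def gradient_penalty(critic, real, fake, eps=1e-8):
    """WGAN-GP style penalty on samples interpolated between real and fake."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = critic(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True)[0]
    grad_l2_sum = grads.reshape(grads.size(0), -1).pow(2).sum(dim=1)
    # The eps must sit *inside* the sqrt: d/dx sqrt(x) -> inf as x -> 0,
    # so sqrt(x) + eps still yields NaN/inf gradients when x == 0.
    grad_norm = torch.sqrt(grad_l2_sum + eps)
    return ((grad_norm - 1.0) ** 2).mean()
```

Note the eps only guards the x -> 0 case; as the next comment shows, it does nothing when grad_l2_sum overflows in the other direction.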

jankyee commented 4 years ago

I found that grad_l2_sum may explode after several steps:

[screenshot: grad_l2_sum blowing up in the training log]

So adding 1e-8 can't help. But why is it exploding?
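One generic way to surface this before the loss turns NaN is to log and clip the discriminator's gradient norm every step. A minimal, self-contained sketch; the tiny model_D, the 19-channel input (Cityscapes-sized score maps), and max_norm=10.0 are placeholder choices, not the repo's settings:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in discriminator, only to make the snippet runnable.
model_D = nn.Sequential(nn.Conv2d(19, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                        nn.Conv2d(64, 1, 4, 2, 1))
optimizer_D = torch.optim.Adam(model_D.parameters(), lr=1e-4)

d_loss = model_D(torch.randn(2, 19, 64, 64)).mean()
d_loss.backward()

# Clip to a finite norm and log it, so an exploding gradient shows up
# in the console before D_loss ever reads NaN.
total_norm = torch.nn.utils.clip_grad_norm_(model_D.parameters(), max_norm=10.0)
if not torch.isfinite(torch.as_tensor(total_norm)):
    print(f'non-finite D grad norm: {total_norm}')
optimizer_D.step()
```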

irfanICMLL commented 4 years ago

Have you modified the training code or the config of the project?

jankyee commented 4 years ago

> Have you modified the training code or the config of the project?

I just modified the dataset and pretrained-model paths.

Now I'm trying adv-loss-type: hinge; it has run for 1300 steps and everything seems OK so far...
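For context, hinge here is the standard hinge adversarial loss, which bounds the discriminator objective instead of relying on a gradient penalty. A minimal sketch of that standard formulation (d_real / d_fake stand for the discriminator's raw scores on real and generated inputs):

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Discriminator: push real scores above +1 and fake scores below -1;
    # the relu margins cap the loss, unlike an unbounded Wasserstein term.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Generator (the student network here): raise the score on its outputs.
    return -d_fake.mean()
```

Because both terms saturate once the margin is met, hinge training tends to be less prone to the kind of gradient blow-up seen above.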

irfanICMLL commented 4 years ago

OK, the hinge loss can also achieve similar performance.

jankyee commented 4 years ago

> OK, the hinge loss can also achieve similar performance.

Hmm... the reproduced result seems worse. I've sent you my training log; would you kindly help review my settings, or share your logs?

irfanICMLL commented 4 years ago

I have already sent the training log to you by email. What data are you using? Have you obtained a good teacher?

sungsooo commented 4 years ago

@Jankyee Hi, I got the same error! Did you resolve this problem? If so, could you help me? Thanks.

songyang86 commented 1 year ago

Can you send your training log to me too? I also got a worse result.