irfanICMLL / structure_knowledge_distillation

The official code for the paper 'Structured Knowledge Distillation for Semantic Segmentation' (CVPR 2019 oral), with extensions to other tasks.
BSD 2-Clause "Simplified" License

the pixelwise loss never decreased during training #56

Open DHuiTnut opened 3 years ago

DHuiTnut commented 3 years ago

Hello! I only changed the batch size from 8 to 4 because of my GPU memory limit. Then I ran into NaN values like in issue #28. I have also tried git checkout d1ec858, but the pixelwise loss never decreased (see the attached training-log screenshot). Can you please tell me how to deal with this problem? Thank you so much.

irfanICMLL commented 3 years ago

If you use a smaller batch size, the performance will drop a lot.

Besides, we have further discovered a new kind of distillation loss, which is more effective. Please refer to this work: https://arxiv.org/abs/2011.13256. You can implement it with this project easily.
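For readers who want to try this, here is a minimal sketch of such a channel-wise distillation loss in PyTorch, following the general formulation in the linked paper (a per-channel softmax over spatial positions plus a KL term). The function name and the temperature default are illustrative choices, not code from this repository:

```python
import torch
import torch.nn.functional as F

def channel_wise_distillation(student_logits, teacher_logits, temperature=4.0):
    """Channel-wise distillation: for every channel, soften the spatial (H*W)
    activation map into a distribution and match student to teacher with KL.
    Both inputs have shape (N, C, H, W); returns a scalar loss."""
    n, c, h, w = student_logits.shape
    # Per channel, take a softmax over the flattened spatial positions.
    log_p_student = F.log_softmax(student_logits.view(n, c, -1) / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.view(n, c, -1) / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is standard for distillation,
    # averaged over batch and channels.
    loss = F.kl_div(log_p_student, p_teacher, reduction='sum') * (temperature ** 2) / (n * c)
    return loss
```

In a segmentation setting this would typically be applied to the class logits of teacher and student at the same resolution.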

DHuiTnut commented 3 years ago

> If you use a smaller batch size, the performance will drop a lot.
>
> Besides, we have further discovered a new kind of distillation loss, which is more effective. Please refer to this work: https://arxiv.org/abs/2006.01683. You can implement it with this project easily.

Thank you for your reply!

Do you actually mean this paper: Channel-wise Distillation for Semantic Segmentation? I'm not sure whether that link is the right one.

irfanICMLL commented 3 years ago

Sorry, it should be this one: https://arxiv.org/abs/2011.13256.

DHuiTnut commented 3 years ago

> Sorry, it should be this one: https://arxiv.org/abs/2011.13256.

Hello, that's amazing work! In the training phase, how did you implement the CWD module? Could you please release the training code? Thank you very much!
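Since the training code for the CWD module is not released in this thread, the following is only a rough sketch of how a loss like the one above could be dropped into a standard training loop. The function name, seg_criterion, and the lambda_cwd weight are placeholder names and an assumed loss weight, not the authors' actual implementation:

```python
import torch

def distillation_train_step(student, teacher, images, labels,
                            seg_criterion, optimizer, lambda_cwd=3.0):
    """One hypothetical training step: the usual segmentation loss on the
    student plus the channel-wise term computed against a frozen teacher."""
    with torch.no_grad():                      # the teacher is not updated
        teacher_logits = teacher(images)
    student_logits = student(images)

    task_loss = seg_criterion(student_logits, labels)   # e.g. cross-entropy
    kd_loss = channel_wise_distillation(student_logits, teacher_logits)

    loss = task_loss + lambda_cwd * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This reuses the channel_wise_distillation sketch shown earlier; the distillation weight would need to be tuned per task.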