Open DHuiTnut opened 3 years ago
If you use a smaller batch size, the performance will drop a lot.
Besides, we have since found a new kind of distillation loss that works better. Please refer to this work: https://arxiv.org/abs/2006.01683. You can easily implement it with this project.
Thank you for your reply!
Do you actually mean this paper: Channel-wise Distillation for Semantic Segmentation? I'm not sure if you got it wrong.
Sorry, it should be this one: https://arxiv.org/abs/2011.13256
Hello, that's amazing work! In the training phase, how did you implement the CWD module? Could you please release the training code? Thank you very much!
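For readers following the thread, here is a minimal sketch of a channel-wise distillation loss in the spirit of the paper linked above. It is not the authors' training code, which this thread does not show. The function name, the NumPy implementation, the (N, C, H, W) input layout, and the default temperature are all assumptions made for illustration. Each channel's activation map is turned into a probability distribution over spatial positions with a temperature-scaled softmax, and the KL divergence from the teacher's distribution to the student's is averaged over channels.

```python
import numpy as np


def channel_wise_distillation_loss(student, teacher, temperature=1.0):
    """Hypothetical helper: KL divergence between per-channel spatial distributions.

    student, teacher: arrays of shape (N, C, H, W) with raw activations or logits.
    temperature: softmax temperature; the default here is a placeholder, not a
    value taken from the paper or the repository.
    """
    n, c, h, w = student.shape
    # Flatten the spatial dimensions so each row holds one channel's H*W scores.
    s = student.reshape(n * c, h * w) / temperature
    t = teacher.reshape(n * c, h * w) / temperature

    def log_softmax(x):
        # Subtract the row maximum first for numerical stability.
        x = x - x.max(axis=1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

    log_p_student = log_softmax(s)
    log_p_teacher = log_softmax(t)
    p_teacher = np.exp(log_p_teacher)

    # KL(teacher || student), summed over positions and channels.
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum()
    # Scale by T^2 and average over the N*C channel maps.
    return (temperature ** 2) * kl / (n * c)
```

In a PyTorch training loop the same computation would normally be written with tensor operations so that gradients flow to the student, and the teacher's output would be detached from the graph.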
Hello! I only changed the batch size from 8 to 4 because of my GPU memory limit. Then I ran into NaN values, as in issue 28. I also tried git checkout d1ec858, but the pixelwise loss never decreased. Could you please tell me how to deal with this problem? Thank you so much.