Closed · ferric123 closed this issue 3 years ago
Sorry, the code in this line https://github.com/microsoft/ProDA/blob/9ba80c7dbbd23ba1a126e3f4003a72f27d121a1f/models/adaptation_modelv2.py#L309 should be changed to: student = F.log_softmax(target_out['out'], dim=1). It's a bug. Thanks for pointing it out!
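For reference, a minimal runnable sketch of the fixed distillation loss. The random tensors stand in for target_out['out'] (student) and teacher_out['out'] (teacher) from adaptation_modelv2.py; the shapes, class count, and 'batchmean' reduction are assumptions for illustration, not the repo's exact settings.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw (pre-softmax) logits of shape (N, C, H, W), standing in
# for target_out['out'] and teacher_out['out'] in adaptation_modelv2.py.
target_logits = torch.randn(2, 19, 64, 64)   # student logits
teacher_logits = torch.randn(2, 19, 64, 64)  # teacher logits

student = F.log_softmax(target_logits, dim=1)  # log-probabilities: kl_div input
teacher = F.softmax(teacher_logits, dim=1)     # probabilities: kl_div target
loss_kd = F.kl_div(student, teacher, reduction='batchmean')
```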
Then should the teacher output also be changed to teacher = F.log_softmax(teacher_out['out'], dim=1)? The current code is teacher = F.softmax(teacher_out['out'], dim=1).
@jis3613 No, the teacher should stay as a plain softmax. See the official PyTorch documentation: by default F.kl_div expects its input (the student) as log-probabilities and its target (the teacher) as plain probabilities, unless log_target=True is passed.
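A quick self-contained check of that convention, verifying F.kl_div against KL(q || p) computed by hand (random toy tensors, nothing from the repo):

```python
import torch
import torch.nn.functional as F

# F.kl_div takes log-probabilities as input and probabilities as target.
log_p = F.log_softmax(torch.randn(4, 10), dim=1)  # student log-probs
q = F.softmax(torch.randn(4, 10), dim=1)          # teacher probs

kd = F.kl_div(log_p, q, reduction='batchmean')
# The same KL(q || p), computed by hand and averaged over the batch.
manual = (q * (q.log() - log_p)).sum(dim=1).mean()
assert torch.allclose(kd, manual)
```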
Thanks!
I am wondering whether there is any motivation behind excluding the log() while computing the kl_div loss in the distillation stage (line 315 of adaptation_modelv2.py). Usually people use loss_kd = F.kl_div(student.log(), teacher), but here you have loss_kd = F.kl_div(student, teacher), right?
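To make the question concrete, a small sketch contrasting the two calls with hypothetical logits (this is an illustration, not the repo's code):

```python
import torch
import torch.nn.functional as F

logits_s = torch.randn(4, 10)
logits_t = torch.randn(4, 10)
student = F.softmax(logits_s, dim=1)
teacher = F.softmax(logits_t, dim=1)

# Pre-fix: student passed as probabilities; F.kl_div treats them as
# log-probabilities, so this is not actually a KL divergence.
loss_buggy = F.kl_div(student, teacher, reduction='batchmean')

# Post-fix: student in log-space, matching the F.kl_div convention.
# Equivalent to F.kl_div(student.log(), teacher, ...) up to numerical stability.
loss_fixed = F.kl_div(F.log_softmax(logits_s, dim=1), teacher,
                      reduction='batchmean')
```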