Closed · ferric123 closed this issue 3 years ago
Sorry, the code in this line https://github.com/microsoft/ProDA/blob/9ba80c7dbbd23ba1a126e3f4003a72f27d121a1f/models/adaptation_modelv2.py#L309 should be changed to: student = F.log_softmax(target_out['out'], dim=1). It's a bug. Thanks for pointing it out!
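For reference, a minimal runnable sketch of the fixed distillation loss. The random tensors stand in for target_out['out'] (student) and teacher_out['out'] (teacher) from adaptation_modelv2.py; the shapes, class count, and 'batchmean' reduction are assumptions for illustration, not the repo's exact settings.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw (pre-softmax) logits of shape (N, C, H, W), standing in
# for target_out['out'] and teacher_out['out'] in adaptation_modelv2.py.
target_logits = torch.randn(2, 19, 64, 64)   # student logits
teacher_logits = torch.randn(2, 19, 64, 64)  # teacher logits

student = F.log_softmax(target_logits, dim=1)  # log-probabilities: kl_div input
teacher = F.softmax(teacher_logits, dim=1)     # probabilities: kl_div target
loss_kd = F.kl_div(student, teacher, reduction='batchmean')
```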
Then should the teacher output also be changed to teacher = F.log_softmax(teacher_out['out'], dim=1)? The current code is teacher = F.softmax(teacher_out['out'], dim=1).
@jis3613 No, the teacher should stay as a plain softmax. See the official PyTorch documentation: by default F.kl_div expects its input (the student) as log-probabilities and its target (the teacher) as plain probabilities, unless log_target=True is passed.
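A quick self-contained check of that convention, verifying F.kl_div against KL(q || p) computed by hand (random toy tensors, nothing from the repo):

```python
import torch
import torch.nn.functional as F

# F.kl_div takes log-probabilities as input and probabilities as target.
log_p = F.log_softmax(torch.randn(4, 10), dim=1)  # student log-probs
q = F.softmax(torch.randn(4, 10), dim=1)          # teacher probs

kd = F.kl_div(log_p, q, reduction='batchmean')
# The same KL(q || p), computed by hand and averaged over the batch.
manual = (q * (q.log() - log_p)).sum(dim=1).mean()
assert torch.allclose(kd, manual)
```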
Thanks!
I am wondering whether there is any motivation behind excluding the log() while computing the kl_div loss in the distillation stage (line 315 of adaptation_modelv2.py). Usually people use loss_kd = F.kl_div(student.log(), teacher), but here you have loss_kd = F.kl_div(student, teacher), right?
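To make the question concrete, a small sketch contrasting the two calls with hypothetical logits (this is an illustration, not the repo's code):

```python
import torch
import torch.nn.functional as F

logits_s = torch.randn(4, 10)
logits_t = torch.randn(4, 10)
student = F.softmax(logits_s, dim=1)
teacher = F.softmax(logits_t, dim=1)

# Pre-fix: student passed as probabilities; F.kl_div treats them as
# log-probabilities, so this is not actually a KL divergence.
loss_buggy = F.kl_div(student, teacher, reduction='batchmean')

# Post-fix: student in log-space, matching the F.kl_div convention.
# Equivalent to F.kl_div(student.log(), teacher, ...) up to numerical stability.
loss_fixed = F.kl_div(F.log_softmax(logits_s, dim=1), teacher,
                      reduction='batchmean')
```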