megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

How to understand the NCKD loss? #27

Closed. Tongfengyu closed this issue 1 year ago.

Tongfengyu commented 2 years ago

I think the NCKD loss is the KL loss between the teacher's and the student's predicted probabilities over the non-target classes, which should have shape (n, c-1), as in Algorithm 1 (the pseudo-code of DKD) in your paper. But why do you compute it the way you do in the code? Is it equivalent? Could you give a further explanation? Thanks!

Zzzzz1 commented 2 years ago

See issue #1.
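
For readers landing here: a minimal sketch of the equivalence being asked about, assuming the masking trick used in the repo's `dkd_loss` (subtracting a large constant from the target-class logit before the softmax). The helper names below are hypothetical, not the repo's API.

```python
import torch
import torch.nn.functional as F

def nckd_probs_via_mask(logits, target, temperature=4.0):
    # Masking trick: push the target logit far down (here by 1000) before
    # the softmax, so the target class gets ~0 probability and the remaining
    # mass is effectively a softmax over the other c-1 classes.
    gt_mask = torch.zeros_like(logits).scatter_(1, target.unsqueeze(1), 1.0)
    return F.softmax(logits / temperature - 1000.0 * gt_mask, dim=1)

def nckd_probs_via_slice(logits, target, temperature=4.0):
    # Literal reading of Algorithm 1: drop the target column, then softmax
    # over the remaining (n, c-1) logits.
    n, c = logits.shape
    keep = torch.ones_like(logits, dtype=torch.bool)
    keep.scatter_(1, target.unsqueeze(1), False)
    return F.softmax(logits[keep].view(n, c - 1) / temperature, dim=1)

torch.manual_seed(0)
logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))

masked = nckd_probs_via_mask(logits, target)   # shape (4, 10), target column ~0
sliced = nckd_probs_via_slice(logits, target)  # shape (4, 9)

# The non-target entries of the masked softmax match the sliced softmax,
# so a KL divergence computed either way yields the same NCKD loss.
keep = torch.ones_like(logits, dtype=torch.bool)
keep.scatter_(1, target.unsqueeze(1), False)
print(torch.allclose(masked[keep].view(4, 9), sliced, atol=1e-6))  # True
```

One plausible reason for masking instead of slicing: the tensor keeps its (n, c) shape, so the loss avoids per-sample gather/reshape logic while producing the same distribution over the non-target classes.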