szagoruyko / attention-transfer

Improving Convolutional Networks via Attention Transfer (ICLR 2017)
https://arxiv.org/abs/1612.03928

Question about KL_loss average #18

Closed Lan1991Xu closed 6 years ago

Lan1991Xu commented 6 years ago

Hi, thanks for sharing your code. I have a question about the KL_loss implementation. PyTorch's KL loss averages over both the batch size and the class dimension, but the original knowledge distillation formulation averages only over the batch, not over the class dimension. So I assume there is a bug here?
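For context, a minimal sketch of the difference being asked about, using dummy tensors (not the repo's actual loss code) and assuming a PyTorch version where `F.kl_div` supports `reduction='batchmean'`; older versions expressed the same choice via `size_average` flags:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, classes = 4, 10

# Illustrative student log-probabilities and teacher probabilities
log_p = F.log_softmax(torch.randn(batch, classes), dim=1)
q = F.softmax(torch.randn(batch, classes), dim=1)

# 'mean' divides the summed loss by batch * classes
# (the element-wise averaging the question refers to)
loss_mean = F.kl_div(log_p, q, reduction='mean')

# 'batchmean' divides by the batch size only,
# matching the textbook KL divergence used in distillation
loss_batchmean = F.kl_div(log_p, q, reduction='batchmean')

# Equivalent manual fix: sum over everything, divide by batch size
loss_manual = F.kl_div(log_p, q, reduction='sum') / batch

print(torch.allclose(loss_batchmean, loss_manual))        # same value
print(torch.allclose(loss_mean * classes, loss_batchmean))  # differ by a factor of `classes`
```

So the two reductions differ only by a constant factor equal to the number of classes, which rescales the loss relative to any other loss terms it is combined with.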