szagoruyko / attention-transfer

Improving Convolutional Networks via Attention Transfer (ICLR 2017)
https://arxiv.org/abs/1612.03928
1.44k stars 276 forks

Question on KL loss #7

Closed wentianli closed 6 years ago

wentianli commented 7 years ago

Thank you for your code!

I don't know why the KL loss here is multiplied by 2: https://github.com/szagoruyko/attention-transfer/blob/master/utils.py#L60 Could you explain it?
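For context, the loss being asked about is a temperature-scaled knowledge-distillation KL loss. Below is a minimal NumPy sketch of that kind of loss (the function name `kd_kl_loss`, the temperature `T=4.0`, and the trailing `* 2.0` constant are illustrative assumptions, not a verbatim copy of the repo's code); the `2.0` stands in for the constant this issue questions:

```python
import numpy as np

def softmax(z, T=1.0):
    # Numerically stable temperature-scaled softmax.
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher_soft || student_soft). The T*T factor keeps gradient
    # magnitudes comparable across temperatures (as in Hinton-style KD);
    # the extra 2.0 is the unexplained constant discussed in this thread.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return kl * (T * T) * 2.0

logits = np.array([1.0, 2.0, 3.0])
print(kd_kl_loss(logits, logits))  # zero when student matches teacher
```

The KL term itself is always non-negative and vanishes only when the two softened distributions coincide, so any positive multiplicative constant leaves the minimizer unchanged and only rescales the gradient.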

szagoruyko commented 7 years ago

I think it was in one of the KD papers; I can't find exactly where now.