wentianli closed this issue 6 years ago
Thank you for your code!
I don't know why the KL loss here is multiplied by 2: https://github.com/szagoruyko/attention-transfer/blob/master/utils.py#L60 Could you explain it?
I think it was in one of the KD papers, but I can't find exactly where right now.