haitongli / knowledge-distillation-pytorch

A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
MIT License

About "reduction" built in KLDivLoss #48

Open junfish opened 2 years ago

junfish commented 2 years ago

The reason your temperature needs to be larger than in the original paper's setting (which used T = 2) may be the reduction used in KLDivLoss. You could try setting reduction="batchmean" in KLDivLoss. Just a guess; others are welcome to discuss.
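For context, PyTorch's KLDivLoss with the default reduction="mean" averages over every element (batch size times number of classes), whereas reduction="batchmean" sums the KL terms and divides only by the batch size, which matches the mathematical definition of KL divergence. With the default, the soft-target term is therefore smaller by a factor of the number of classes, which could be compensated by a larger temperature or alpha. The sketch below is not the repo's loss_fn_kd; kd_soft_loss, T, and alpha-free scaling are placeholder choices just to show the difference in magnitude.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical helper, not the repo's loss_fn_kd: computes only the
# soft-target (distillation) term so the effect of `reduction` is visible.
def kd_soft_loss(student_logits, teacher_logits, T=2.0, reduction="batchmean"):
    log_p = F.log_softmax(student_logits / T, dim=1)  # student log-probabilities
    q = F.softmax(teacher_logits / T, dim=1)          # teacher probabilities
    # reduction="batchmean": sum of KL terms divided by batch size
    #   (the mathematically correct KL divergence per sample).
    # reduction="mean" (PyTorch default): divided by batch_size * num_classes,
    #   so the loss is smaller by a factor of num_classes.
    return nn.KLDivLoss(reduction=reduction)(log_p, q) * (T * T)

student = torch.randn(8, 10)  # batch of 8 samples, 10 classes
teacher = torch.randn(8, 10)
print(kd_soft_loss(student, teacher, reduction="batchmean"))
print(kd_soft_loss(student, teacher, reduction="mean"))  # ~10x smaller here
```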