zhengli97 / CTKD

[AAAI 2023] Official PyTorch Code for "Curriculum Temperature for Knowledge Distillation"
https://zhengli97.github.io/CTKD/
Apache License 2.0

run_cifar_distill.sh #2

Closed JinYu1998 closed 1 year ago

JinYu1998 commented 1 year ago

Why is there no code for CTKD in the run_cifar_distill.sh file? Also, the DKD command around line 14 of that script seems to be wrong!

zhengli97 commented 1 year ago
  1. The CTKD method has been integrated into other distillation methods. When you set --have_mlp 1 and run the script run_cifar_distill.sh, it automatically learns the temperature during training. You can obtain the vanilla distillation results without CTKD by setting --have_mlp 0. (A minimal sketch of how a learnable temperature plugs into the KD loss follows this list.)
  2. Sorry about the mistake. Try this: python train_student.py --path-t ./save/models/resnet56_vanilla/ckpt_epoch_240.pth --distill dkd --model_s resnet20 -r 1 -a 0 -b 1 --dkd_alpha 1 --dkd_beta 2
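
For illustration only, here is a minimal PyTorch sketch of how a learnable global temperature could enter the KD objective. The names `GlobalTemp` and `kd_loss` are illustrative assumptions, not the repository's actual module or function names, and the real --have_mlp module may predict the temperature from features rather than use a single scalar.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the learnable temperature enabled by --have_mlp 1.
# A single learnable scalar is used here for simplicity.
class GlobalTemp(nn.Module):
    def __init__(self, init_temp=4.0):
        super().__init__()
        self.t = nn.Parameter(torch.tensor(float(init_temp)))

    def forward(self):
        # Softplus keeps the temperature strictly positive.
        return F.softplus(self.t) + 1e-6

def kd_loss(student_logits, teacher_logits, temp):
    # Standard KL-divergence distillation loss, scaled by T^2.
    log_p_s = F.log_softmax(student_logits / temp, dim=1)
    p_t = F.softmax(teacher_logits / temp, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temp * temp
```

With --have_mlp 0, the temperature would simply be a fixed constant instead of a learned value, which recovers vanilla distillation.
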
JinYu1998 commented 1 year ago

OK, thanks for your reply. I still have some questions:

  1. If I want to try vanilla CTKD, do I only need to set --distill kd?
  2. The paper mentions using the Gradient Reversal Layer (GRL) to learn a suitable temperature during training. Could the same idea be applied to other hyperparameters, for example the loss weights (e.g., the weight on the KL divergence term)?

Thanks again for your reply!

zhengli97 commented 1 year ago
  1. Yes.
  2. You can follow a similar idea by training a learnable module to generate appropriate hyperparameters for the network. But GRL is probably not a good fit, because suitable hyperparameters may not be learned by reversing the gradient to increase the distillation loss. Anyway, you can try it. (A minimal GRL sketch follows this list.)
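
As a concrete illustration of the gradient reversal idea discussed above, here is a minimal PyTorch sketch. `grad_reverse`, `lambd`, and the commented usage at the end are illustrative assumptions, not the repository's actual implementation.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The gradient is negated, so the module upstream of the GRL is updated
        # to *increase* the loss while the student is updated to decrease it.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Hypothetical usage: reverse the gradient flowing into a learned temperature
# before computing the distillation loss.
# temp = grad_reverse(temp_module(), lambd=1.0)
# loss = kd_loss(student_logits, teacher_logits, temp)
```
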
zhengli97 commented 1 year ago

Hi~ I will close this issue tomorrow if you have no more questions. Feel free to contact me via email: zhengli97 [at] mail.nankai.edu.cn

JinYu1998 commented 1 year ago

okk~~ thanks again for your reply. Have a great life~