megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

Train a single model on cifar 100 #20

Closed IfuChan closed 2 years ago

IfuChan commented 2 years ago

Hi, is there a way to train a single model like resnet32x4 on CIFAR-100 with this repo? I want to train the models from scratch without using the pretrained models.

Zzzzz1 commented 2 years ago

Please refer to this config: https://github.com/megvii-research/mdistiller/blob/master/configs/cifar100/vanilla.yaml (change the "student" field to "resnet32x4").
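For reference, the relevant part of the edited config might look like the sketch below; the field names follow the repo's general config style and should be checked against the actual vanilla.yaml before use:

```yaml
# Sketch only: key names may not match configs/cifar100/vanilla.yaml exactly.
DISTILLER:
  TYPE: "NONE"            # vanilla training, no distillation loss
  STUDENT: "resnet32x4"   # the single model to train from scratch
```

Training would then be launched through the repo's usual entry point, e.g. `python tools/train.py --cfg configs/cifar100/vanilla.yaml` (the script and flag are the ones referenced in the thread and the repo's README).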

IfuChan commented 2 years ago

Thanks for replying. I tried that; the issue is that after training, when using the DKD strategy to train the student, I get a "Missing keys" and "Unexpected keys" error in train.py at load_state_dict. The model I trained has keys like module.student.conv1.weight ..., while the missing ones are conv1.weight etc.

Zzzzz1 commented 2 years ago

Two types of checkpoints are saved: the full one and the one with 'student' in its name. The ones with 'student' in their name are suitable as pretrained models.
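For anyone who still wants to reuse the full checkpoint, the "Missing keys / Unexpected keys" error comes from the extra `module.student.` prefix that the distiller wrapper (and DataParallel) adds to parameter names. A minimal sketch of stripping that prefix; the helper below is hypothetical and not part of mdistiller:

```python
import torch

# Hypothetical helper (not part of mdistiller): keep only the student weights
# and drop the "module.student." prefix so the keys match a bare backbone.
def strip_student_prefix(state_dict, prefix="module.student."):
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}

# Toy demonstration of the key mismatch described above.
wrapped = {
    "module.student.conv1.weight": torch.zeros(1),
    "module.teacher.conv1.weight": torch.zeros(1),
}
print(strip_student_prefix(wrapped))  # {'conv1.weight': tensor([0.])}
```

In practice, loading the checkpoint that has 'student' in its name, as suggested above, avoids any renaming.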

IfuChan commented 2 years ago

That worked, thank you. Also, I wanted to ask why the models with 'student' in their names are smaller than the ones without it. Since we are using vanilla training here, no distillation is happening, so I do not understand why they would be smaller. (screenshot of checkpoint file sizes, 2022-07-24)

Zzzzz1 commented 2 years ago

We save all the parameters (of the student, the teacher, and the connector/contrastive memory, etc.) in the checkpoints. Vanilla models are therefore also saved with the teacher's parameters, which are useless in that case, and we will consider fixing that.
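A minimal sketch of why the full checkpoint is larger, assuming a wrapper module along these lines (class and attribute names are illustrative, not the repo's actual ones): the wrapper's state_dict carries both networks, while the 'student' checkpoint stores only the backbone.

```python
import torch.nn as nn

# Illustrative wrapper: registering both networks means state_dict() serializes
# the teacher's parameters too, even when no distillation loss is used.
class VanillaWrapper(nn.Module):
    def __init__(self, student, teacher):
        super().__init__()
        self.student = student
        self.teacher = teacher  # stored alongside the student in the full checkpoint

student = nn.Linear(16, 8)
teacher = nn.Linear(16, 128)
wrapper = VanillaWrapper(student, teacher)

print(len(wrapper.state_dict()))  # student + teacher tensors -> larger file
print(len(student.state_dict()))  # student tensors only -> the smaller 'student' file
```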

IfuChan commented 2 years ago

Understood, thank you very much.