megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf
808 stars 123 forks

ReviewKD with VGG architecture #23

Closed Mavrepis closed 2 years ago

Mavrepis commented 2 years ago

@Zzzzz1 I tried running the code for the ReviewKD method with vgg_13 as the teacher and vgg_8 as the student, but I get the following error:

RuntimeError: Given groups=1, weight of size [256, 256, 1, 1], expected input[64, 512, 1, 1] to have 256 channels, but got 512 channels instead

Is the VGG architecture set up correctly? Has anyone else faced this issue?

OS: Ubuntu 20.04 GPU: GTX 1060 6GB

Which SHAPES should be used for it to run properly? Could you upload them for all the different architectures?

Thank you in advance!

Zzzzz1 commented 2 years ago

The config for ReviewKD with vgg13 & vgg8 can be as follows:

EXPERIMENT:
  NAME: ""
  TAG: "reviewkd,vgg13,vgg8"
  PROJECT: "cifar100_baselines"
DISTILLER:
  TYPE: "REVIEWKD"
  TEACHER: "vgg13"
  STUDENT: "vgg8"
REVIEWKD:
  REVIEWKD_WEIGHT: 5.0
  SHAPES: [1, 4, 4, 8, 16]
  OUT_SHAPES: [1, 4, 4, 8, 16]
  IN_CHANNELS: [128, 256, 512, 512, 512]
  OUT_CHANNELS: [128, 256, 512, 512, 512]
SOLVER:
  BATCH_SIZE: 64
  EPOCHS: 240
  LR: 0.05
  LR_DECAY_STAGES: [150, 180, 210]
  LR_DECAY_RATE: 0.1
  WEIGHT_DECAY: 0.0005
  MOMENTUM: 0.9
  TYPE: "SGD"

We will consider uploading configs for all architectures in the future.
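For intuition on where SHAPES values like [1, 4, 4, 8, 16] come from, here is a minimal sketch in plain Python. It is an illustration, not the mdistiller code: it assumes CIFAR-100's 32x32 input, VGG-style stages where the first three stages halve the spatial size and the fourth keeps it, a final global pooling down to 1x1, and that the config lists the sizes smallest-first; all of these are assumptions to be checked against your own run.

```python
# Sketch (assumptions, not mdistiller's actual code): derive per-stage
# spatial sizes for a VGG-style network on a 32x32 CIFAR input.
input_size = 32
stage_pool_factors = [2, 2, 2, 1]  # assumed downsampling per conv stage

sizes = []
size = input_size
for factor in stage_pool_factors:
    size //= factor
    sizes.append(size)  # spatial size after each stage: [16, 8, 4, 4]
sizes.append(1)         # global average pooling -> 1x1

# The config above appears to list the sizes smallest-first (an assumption):
shapes = sizes[::-1]
print(shapes)  # [1, 4, 4, 8, 16]
```

If the spatial sizes you observe differ, the pool-factor list is the knob to adjust; the point is only that SHAPES mirrors the per-stage feature-map sizes.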

Mavrepis commented 2 years ago

I made it work with this setup; is it wrong?

REVIEWKD:
  IN_CHANNELS: [128, 256, 512, 512, 512]
  OUT_CHANNELS: [128, 256, 512, 512, 512]
  SHAPES: [1, 4, 4, 8, 16, 32]
  OUT_SHAPES: [1, 4, 4, 8, 16, 32]
Zzzzz1 commented 2 years ago

I made it work with this setup; is it wrong?

REVIEWKD:
  IN_CHANNELS: [128, 256, 512, 512, 512]
  OUT_CHANNELS: [128, 256, 512, 512, 512]
  SHAPES: [1, 4, 4, 8, 16, 32]
  OUT_SHAPES: [1, 4, 4, 8, 16, 32]

The shapes and channels are correct. However, I searched REVIEWKD_WEIGHT from 2.0 to 8.0 and could not reproduce the performance (74.68%) published in the ReviewKD paper, and their official code does not include a setting for vgg13 & vgg8. The best result I got is 74.10% with REVIEWKD_WEIGHT=8.0. Hope this helps.

Mavrepis commented 2 years ago

Yes, this is all very helpful! I will rerun with your config and REVIEWKD_WEIGHT: 8.0 to check how it performs!

For future reference (use with caution): I figured out that IN_CHANNELS and SHAPES come from the student features (student_feats), while OUT_CHANNELS (also called results in the code) come from the teacher features (teacher_feats).

So anyone else hitting this can use a debugger to see the shapes of student_feats and teacher_feats and extract the config values from there!
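The extraction step can be sketched without running any model: once you have read the shapes of student_feats in a debugger (and analogously teacher_feats for OUT_CHANNELS), the config values fall out directly. The shape tuples below are invented for illustration in (batch, channels, height, width) format, and the smallest-first ordering of SHAPES is an assumption; check both against your own run.

```python
# Hypothetical shapes one might read off student_feats in a debugger
# for a VGG student on CIFAR-100 (invented for illustration):
student_feats_shapes = [
    (64, 128, 16, 16),
    (64, 256, 8, 8),
    (64, 512, 4, 4),
    (64, 512, 4, 4),
    (64, 512, 1, 1),
]

# Channel counts per feature map -> IN_CHANNELS
IN_CHANNELS = [s[1] for s in student_feats_shapes]

# Spatial sizes per feature map, listed smallest-first -> SHAPES
# (the ordering convention is an assumption; compare with the config above)
SHAPES = [s[2] for s in student_feats_shapes][::-1]

print(IN_CHANNELS)  # [128, 256, 512, 512, 512]
print(SHAPES)       # [1, 4, 4, 8, 16]
```

The same two list comprehensions applied to teacher_feats give OUT_CHANNELS and OUT_SHAPES.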