megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

Cannot execute ReviewKD with wrn_40_2 & wrn_16_2 #22

Closed Philip-os closed 2 years ago

Philip-os commented 2 years ago

I tried running the code for the ReviewKD method with wrn_40_2 as the teacher and wrn_16_2 as the student, but I get the following error:

RuntimeError: Given groups=1, weight of size [256, 256, 1, 1], expected input[64, 128, 1, 1] to have 256 channels, but got 128 channels instead
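
For context, this traceback is a plain channel mismatch: a 1x1 convolution inside the distiller was built for 256-channel features (from a config written for a different, wider model pair), while wrn_16_2 emits 128-channel ones. A minimal sketch that reproduces the same error, with the shapes taken from the message above:

import torch
import torch.nn as nn

# 1x1 conv built for 256-channel features, as in the trace above.
conv = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1)
# Pooled wrn_16_2 feature: batch 64, only 128 channels.
feat = torch.randn(64, 128, 1, 1)
out = conv(feat)  # RuntimeError: expected input[64, 128, 1, 1] to have 256 channels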

Is the WRN architecture set up correctly? Has anyone else faced this issue?

OS: Ubuntu 20.04
GPU: GTX 1060 6GB

Zzzzz1 commented 2 years ago

The ReviewKD config has to be changed for different teacher-student pairs. For wrn_40_2 & wrn_16_2, the config should look like this:

EXPERIMENT:
  NAME: ""
  TAG: "reviewkd,wrn40_2,wrn_16_2"
  PROJECT: "cifar100_baselines"
DISTILLER:
  TYPE: "REVIEWKD"
  TEACHER: "wrn_40_2"
  STUDENT: "wrn_16_2"
REVIEWKD:
  REVIEWKD_WEIGHT: 5.0
  SHAPES: [1, 8, 16, 32]
  OUT_SHAPES: [1, 8, 16, 32]
  IN_CHANNELS: [32, 64, 128, 128]
  OUT_CHANNELS: [32, 64, 128, 128]
SOLVER:
  BATCH_SIZE: 64
  EPOCHS: 240
  LR: 0.05
  LR_DECAY_STAGES: [150, 180, 210]
  LR_DECAY_RATE: 0.1
  WEIGHT_DECAY: 0.0005
  MOMENTUM: 0.9
  TYPE: "SGD"
Philip-os commented 2 years ago

Thank you very much, this solved it.