Philip-os closed this issue 2 years ago.
The ReviewKD config has to be adjusted for each teacher-student pair. For wrn_40_2 and wrn_16_2, the config should look like this:
EXPERIMENT:
  NAME: ""
  TAG: "reviewkd,wrn40_2,wrn_16_2"
  PROJECT: "cifar100_baselines"
DISTILLER:
  TYPE: "REVIEWKD"
  TEACHER: "wrn_40_2"
  STUDENT: "wrn_16_2"
REVIEWKD:
  REVIEWKD_WEIGHT: 5.0
  SHAPES: [1, 8, 16, 32]
  OUT_SHAPES: [1, 8, 16, 32]
  IN_CHANNELS: [32, 64, 128, 128]
  OUT_CHANNELS: [32, 64, 128, 128]
SOLVER:
  BATCH_SIZE: 64
  EPOCHS: 240
  LR: 0.05
  LR_DECAY_STAGES: [150, 180, 210]
  LR_DECAY_RATE: 0.1
  WEIGHT_DECAY: 0.0005
  MOMENTUM: 0.9
  TYPE: "SGD"
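Where these numbers come from: in a CIFAR wide ResNet the feature widths depend only on the widen factor, not the depth, so wrn_40_2 and wrn_16_2 end up with identical channel lists. A minimal sketch, assuming the standard CIFAR WRN layout (a 16-channel stem, then three groups of widths 16k, 32k, 64k with strides 1, 2, 2), assuming ReviewKD consumes the three group outputs plus the globally pooled feature, and assuming IN_* describes the student and OUT_* the teacher as in the original ReviewKD code. `reviewkd_dims` is a hypothetical helper, not part of the repo.

```python
from typing import List, Tuple


def reviewkd_dims(depth: int, widen: int, input_size: int = 32) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: channel and spatial-size lists for a CIFAR WRN-depth-widen.

    Assumes the standard CIFAR WRN layout and that ReviewKD uses the three
    group outputs plus the globally pooled feature.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("WRN depth must satisfy (depth - 4) % 6 == 0")
    # Residual groups have widths 16k, 32k, 64k; depth only changes blocks per group.
    group_channels = [16 * widen, 32 * widen, 64 * widen]
    # Strides 1, 2, 2 give spatial sizes 32, 16, 8 for a 32x32 CIFAR input.
    group_sizes = [input_size, input_size // 2, input_size // 4]
    # The pooled feature keeps the last group's width at a 1x1 spatial size.
    channels = group_channels + [group_channels[-1]]
    # Channels are listed shallow-to-deep, shapes deepest-first, matching the config above.
    shapes = [1] + group_sizes[::-1]
    return channels, shapes


if __name__ == "__main__":
    teacher_ch, teacher_shapes = reviewkd_dims(40, 2)  # wrn_40_2
    student_ch, student_shapes = reviewkd_dims(16, 2)  # wrn_16_2
    # Assumption: IN_* describes the student, OUT_* the teacher.
    print("SHAPES:", student_shapes)
    print("OUT_SHAPES:", teacher_shapes)
    print("IN_CHANNELS:", student_ch)
    print("OUT_CHANNELS:", teacher_ch)
```

Running it prints the same four lists as the config above. Under the same assumptions, a pair with different widths (e.g. wrn_40_2 teacher with a wrn_40_1 student) would need different IN_CHANNELS and OUT_CHANNELS, since widen factor 1 gives [16, 32, 64, 64].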
Thank you very much, this solved it.
I tried running the code for the ReviewKD method with wrn_40_2 as the teacher and wrn_16_2 as the student, but I get the following.
Is the WRN architecture set up correctly? Has anyone else faced this issue?
OS: Ubuntu 20.04
GPU: GTX 1060 6GB