The config for ReviewKD with vgg13 (teacher) and vgg8 (student) could look like the following:
EXPERIMENT:
  NAME: ""
  TAG: "reviewkd,vgg13,vgg8"
  PROJECT: "cifar100_baselines"
DISTILLER:
  TYPE: "REVIEWKD"
  TEACHER: "vgg13"
  STUDENT: "vgg8"
REVIEWKD:
  REVIEWKD_WEIGHT: 5.0
  SHAPES: [1, 4, 4, 8, 16]
  OUT_SHAPES: [1, 4, 4, 8, 16]
  IN_CHANNELS: [128, 256, 512, 512, 512]
  OUT_CHANNELS: [128, 256, 512, 512, 512]
SOLVER:
  BATCH_SIZE: 64
  EPOCHS: 240
  LR: 0.05
  LR_DECAY_STAGES: [150, 180, 210]
  LR_DECAY_RATE: 0.1
  WEIGHT_DECAY: 0.0005
  MOMENTUM: 0.9
  TYPE: "SGD"
We will consider uploading configs for all architectures in the future.
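For reference, the SOLVER block in the config above describes a step learning-rate schedule. A minimal sketch of the rate it produces per epoch, using a hypothetical helper `lr_at` and assuming 1-indexed epochs with one decay applied for every stage already passed (`epoch > stage`); check your trainer for the exact boundary rule:

```python
# Sketch: step-decayed learning rate implied by the SOLVER block above.
# Assumption: epochs are counted from 1 and the rate is multiplied by
# LR_DECAY_RATE once for every stage the epoch has passed (epoch > stage).
from typing import Sequence


def lr_at(epoch: int,
          base_lr: float = 0.05,                      # SOLVER.LR
          stages: Sequence[int] = (150, 180, 210),    # SOLVER.LR_DECAY_STAGES
          decay_rate: float = 0.1) -> float:          # SOLVER.LR_DECAY_RATE
    """Return the learning rate for a 1-indexed epoch (hypothetical helper)."""
    passed = sum(1 for s in stages if epoch > s)  # number of decays applied so far
    return base_lr * decay_rate ** passed


if __name__ == "__main__":
    for epoch in (1, 150, 151, 181, 211, 240):
        print(epoch, lr_at(epoch))
```

Under this assumption the rate stays at 0.05 through epoch 150 and drops by a factor of 10 after each of the three stages.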
I made it work with this setup. Is it wrong?
REVIEWKD:
  IN_CHANNELS: [128, 256, 512, 512, 512]
  OUT_CHANNELS: [128, 256, 512, 512, 512]
  SHAPES: [1, 4, 4, 8, 16, 32]
  OUT_SHAPES: [1, 4, 4, 8, 16, 32]
The shapes and channels are correct. However, I searched REVIEWKD_WEIGHT from 2.0 to 8.0 and could not reproduce the performance (74.68%) reported in the ReviewKD paper, and the official code does not include a setting for vgg13 & vgg8. The best result I got is 74.10% with REVIEWKD_WEIGHT=8.0. Hope this helps.
Yes, this is all very helpful! I will rerun with your config and REVIEWKD_WEIGHT: 8.0 to check how it performs!
For future reference (use with caution): I figured out that IN_CHANNELS and SHAPES come from the student features (student_feats), while OUT_CHANNELS (called result in the code) come from the teacher features (teacher_feats).
So for anyone else: just use a debugger to see the shapes of student_feats and teacher_feats and extract the values from there!
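A minimal sketch of that last step: once you have the (channels, height, width) of each student and teacher feature from the debugger, the four REVIEWKD lists can be filled in mechanically. The helper `reviewkd_lists` is hypothetical (not part of mdistiller), and the ordering is an assumption read off the vgg13/vgg8 config in this thread: channels listed shallow to deep and ending with the pooled feature, spatial sizes listed in reverse (pooled feature first). Taking OUT_SHAPES from the teacher is also an assumption.

```python
# Sketch: turn feature-map shapes read off in a debugger into REVIEWKD lists.
# Hypothetical helper, not an official mdistiller function. Assumed convention
# (from the vgg13/vgg8 config in this thread): each input list is ordered
# shallow -> deep and ends with the pooled feature (H = W = 1); channel lists
# keep that order, spatial-size lists are reversed.
from typing import Dict, List, Tuple

FeatShape = Tuple[int, int, int]  # (channels, height, width), batch dim dropped


def reviewkd_lists(student: List[FeatShape],
                   teacher: List[FeatShape]) -> Dict[str, List[int]]:
    """Build IN/OUT_CHANNELS and SHAPES/OUT_SHAPES from matched feature shapes."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must contribute the same number of stages")
    return {
        "IN_CHANNELS": [c for c, _, _ in student],           # student channels
        "OUT_CHANNELS": [c for c, _, _ in teacher],          # teacher channels
        "SHAPES": [h for _, h, _ in reversed(student)],      # student sizes, deep -> shallow
        "OUT_SHAPES": [h for _, h, _ in reversed(teacher)],  # teacher sizes, deep -> shallow
    }


if __name__ == "__main__":
    # Per-stage (C, H, W) implied by the vgg13/vgg8 config above; teacher and
    # student coincide there because the IN_* and OUT_* lists are identical.
    vgg8 = [(128, 16, 16), (256, 8, 8), (512, 4, 4), (512, 4, 4), (512, 1, 1)]
    vgg13 = list(vgg8)
    for key, value in reviewkd_lists(student=vgg8, teacher=vgg13).items():
        print(f"{key}: {value}")
```

With these inputs the sketch reproduces the REVIEWKD block of the vgg13/vgg8 config; for other architectures, substitute the shapes you actually observe.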
@Zzzzz1 I tried running the code for the ReviewKD method with vgg_13 as the teacher and vgg_8 as the student, but I get the following error:
RuntimeError: Given groups=1, weight of size [256, 256, 1, 1], expected input[64, 512, 1, 1] to have 256 channels, but got 512 channels instead
Is the VGG architecture set up correctly? Has anyone else faced this issue?
OS: Ubuntu 20.04 GPU: GTX 1060 6GB
Which shapes should be used for it to run properly? Could you upload them for all the different architectures?
Thank you in advance!
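The RuntimeError above says that a 1x1 convolution whose weight is [256, 256, 1, 1], i.e. built for 256 input channels, received a 512-channel feature, so at least one configured channel count does not match the real features. A small pre-flight check along these lines can point at the offending entry before training starts. `channel_mismatches` is a hypothetical helper, and it assumes you have collected the observed channel counts (e.g. from student_feats in a debugger) in the same order as the config list:

```python
# Sketch: compare configured REVIEWKD channel counts with observed ones.
# Hypothetical helper; "observed" must be in the same order as the config list.
from typing import List, Tuple


def channel_mismatches(configured: List[int],
                       observed: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, configured, observed) for every stage whose channels differ."""
    if len(configured) != len(observed):
        raise ValueError(
            f"config lists {len(configured)} stages, features have {len(observed)}"
        )
    return [(i, c, o) for i, (c, o) in enumerate(zip(configured, observed)) if c != o]


if __name__ == "__main__":
    # Hypothetical mis-set IN_CHANNELS versus the vgg8 channels from the config above.
    print(channel_mismatches([128, 256, 256, 512, 256], [128, 256, 512, 512, 512]))
    # -> [(2, 256, 512), (4, 256, 512)]
```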