june6423 opened this issue 3 months ago.
Hi @june6423,
Our B3 experiments on Swin Transformer were run with the original Swin-Transformer training code, so there is no B3 config in this repo.
Alternatively, if you want to run B3 in this repo, the strategy is similar to deit_tiny. You can use the following config for KD (T=4):
aa: rand-m9-mstd0.5
batch_size: 128 # x 8 GPUs = 1024 total batch size
color_jitter: 0.4
decay_by_epoch: false
decay_epochs: 3
decay_rate: 0.967
# dropout
drop: 0.0
drop_path_rate: 0.2
epochs: 300
log_interval: 50
lr: 1.e-3
min_lr: 5.0e-06
model_ema: False
model_ema_decay: 0.999
momentum: 0.9
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
clip_grad_norm: true
clip_grad_max_norm: 5.0
interpolation: 'bicubic'
# random erase
remode: pixel
reprob: 0.25
# mixup
mixup: 0.8
cutmix: 1.0
mixup_prob: 1.0
mixup_switch_prob: 0.5
mixup_mode: 'batch'
sched: cosine
seed: 42
warmup_epochs: 20
warmup_lr: 5.e-7
weight_decay: 0.04
workers: 16
# kd
kd: 'kd'
ori_loss_weight: 1.
kd_loss_weight: 1.
teacher_model: 'timm_swin_large_patch4_window7_224'
teacher_pretrained: True
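The schedule-related keys above (`lr`, `warmup_lr`, `warmup_epochs`, `min_lr`, `epochs`, `sched: cosine`) describe a linear warmup followed by cosine decay. The sketch below is an assumption about how such a schedule typically behaves (the key names resemble timm's training arguments), not this repo's scheduler code. `lr_at_epoch` is a hypothetical helper, and it does not model `decay_epochs` or `decay_rate`.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-3, min_lr=5e-6, warmup_lr=5e-7,
                warmup_epochs=20, epochs=300):
    """Per-epoch learning rate for linear warmup + cosine decay.

    Hypothetical helper (assumption, not the repo's scheduler):
    ramps linearly from warmup_lr to base_lr over warmup_epochs,
    then cosine-anneals from base_lr down to min_lr by `epochs`.
    """
    if epoch < warmup_epochs:
        # Linear warmup phase
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    # Cosine annealing from base_lr to min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 10, 20, 160, 299):
        print(e, lr_at_epoch(e))
```

Under these assumptions the learning rate starts at 5e-7, peaks at 1e-3 at epoch 20, and decays toward 5e-6 by epoch 300.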
Thanks a lot!
Now I want to reproduce the results of other KD methods, including RKD and CRD (I am working on Table 5 of your DIST_KD paper, on CIFAR-100).
However, I could not find the training configs or code for training from scratch or for the other KD methods.
I am using image_classification_sota at commit d9662f7.
Is there already published code for experimenting with these settings, or should I implement them myself?
Thanks for your effort.
Greetings!
I read your paper with great interest and am trying to reproduce some of your experiments.
I want to reproduce your vanilla KD results with strategies B1, B2, and B3 from your DIST_KD paper.
I found the B1 and B2 strategies in your strategies folder, but I couldn't find the B3 setting.
configs/strategies/deit/deit_tiny.yaml
appears to be B3, but I'm not sure. Could you share the B3 setting for vanilla KD with temperature 4?
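For reference, vanilla KD with temperature T=4 usually means the Hinton-style loss: a cross-entropy term on the labels plus a KL divergence between temperature-softened teacher and student distributions, scaled by T². The sketch below assumes that formulation and reuses the `ori_loss_weight`/`kd_loss_weight` names from the config earlier in this thread. Whether the repo applies the T² factor this way is an assumption, and the hard integer label is a simplification (with mixup/cutmix enabled, targets are soft). `vanilla_kd_loss` is a hypothetical helper for a single sample.

```python
import math


def log_softmax(logits, T=1.0):
    """Numerically stable log-softmax of logits divided by temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def vanilla_kd_loss(student_logits, teacher_logits, label, T=4.0,
                    ori_loss_weight=1.0, kd_loss_weight=1.0):
    """Hypothetical single-sample Hinton-style KD loss (assumed formulation)."""
    # Standard cross-entropy with the hard label (T = 1)
    ce = -log_softmax(student_logits)[label]
    # Temperature-softened log-probabilities
    log_p_s = log_softmax(student_logits, T)
    log_p_t = log_softmax(teacher_logits, T)
    # KL(p_teacher || p_student) = sum p_t * (log p_t - log p_s)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # T**2 keeps the KD gradient magnitude comparable across temperatures
    return ori_loss_weight * ce + kd_loss_weight * (T ** 2) * kl


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [0.0, 3.0, 1.0]
    print(vanilla_kd_loss(student, teacher, label=0))
```

When the student's logits match the teacher's, the KL term vanishes and the loss reduces to the plain cross-entropy.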