sunshangquan / logit-standardization-KD

[CVPR 2024 Highlight] Logit Standardization in Knowledge Distillation
310 stars 13 forks

Question about kd_weight #13

Closed tamltlkdn closed 5 months ago

tamltlkdn commented 5 months ago

Hi authors, in the DKD+logit_stand variant of this implementation, I observe that alpha and beta are multiplied by kd_weight (=9). Why is kd_weight 9 rather than the default 1? Didn't you say in the paper that "We follow the same experimental settings as previous works"?

cfg.DKD.ALPHA = cfg.DKD.ALPHA * args.kd_weight
cfg.DKD.BETA = cfg.DKD.BETA * args.kd_weight
sunshangquan commented 5 months ago

Hi @tamltlkdn , we mention the default choice of KD_WEIGHT=9 in the supplementary material (Section 2, "Implementation Details", page 2). The phrase "the same experimental settings as previous works [5, 17, 50]" refers to the common settings, including the teacher/student pairs, training epochs, learning rate, optimizer, etc. The references [5, 17, 50] denote ReviewKD, MLKD, and DKD (which share those training settings); it does not mean that we specifically follow DKD's choice of KD_WEIGHT.
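For readers landing on this thread, here is a minimal numpy sketch of the two ideas under discussion: z-score standardization of logits before the distillation loss, and an overall kd_weight factor (9 by default here) scaling that loss. The function names, the generic KL-based loss, and the temperature value are illustrative assumptions, not the authors' exact implementation; in the repo the factor is applied to DKD's ALPHA/BETA rather than wrapping the loss.

```python
import numpy as np

def standardize(logits, eps=1e-7):
    # Z-score standardize logits along the class dimension
    # (the core idea of logit standardization; details may differ
    # from the authors' code).
    mean = logits.mean(axis=-1, keepdims=True)
    std = logits.std(axis=-1, keepdims=True)
    return (logits - mean) / (std + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=2.0, kd_weight=9.0):
    # KL(teacher || student) on standardized logits, scaled by kd_weight.
    # kd_weight=9.0 mirrors the default discussed in this thread;
    # temperature=2.0 is an illustrative choice.
    s = softmax(standardize(student_logits) / temperature)
    t = softmax(standardize(teacher_logits) / temperature)
    kl = np.sum(t * (np.log(t) - np.log(s)), axis=-1).mean()
    return kd_weight * (temperature ** 2) * kl
```

A side effect worth noting: because standardization removes per-sample shift and scale, `standardize(5 * z + 2)` equals `standardize(z)`, so the loss depends only on the shape of the logit distribution, not its magnitude.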