hunto / DiffKD

Official implementation for paper "Knowledge Diffusion for Distillation", NeurIPS 2023
Apache License 2.0

Noisy_Adapter: the curves of average γ #7

Open omglet1 opened 8 months ago

omglet1 commented 8 months ago

Hi @hunto, I have been following your work for a long time, and I am very excited that the code for the classification task has been made public. However, I found a problem: when I use ResNet-34 (teacher) to train ResNet-18 (student) with the B1 baseline setting on the CIFAR-10 dataset, the curves of average γ do not match your results. The curve stays close to 1 and does not descend.

[Images: feature, logit]

These two images show the γ of the noisy adapter for feature KD and logit KD, respectively.

hunto commented 7 months ago

Hi @omglet1,

I think the values of gamma depend on the task and model you use in KD. CIFAR-10 is widely regarded as a very easy dataset, and both the students and the teachers can reach ~100% training accuracy. So I think there might not be a significant gap between the teacher and student noises. When gamma equals 1, it means that no additional noise should be added.
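To illustrate why gamma = 1 corresponds to "no additional noise", here is a minimal sketch in plain Python. It assumes the noisy adapter blends the student feature with Gaussian noise as a convex combination, `gamma * feature + (1 - gamma) * noise`. That form, and the names `fuse_with_noise` and `average_gamma`, are assumptions made for illustration and are not taken from the DiffKD code.

```python
import random


def fuse_with_noise(student_feat, gamma, rng=None):
    """Blend a student feature vector with Gaussian noise.

    Assumption: the blend is a convex combination, so gamma = 1 keeps the
    feature unchanged and smaller gamma mixes in more noise.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    return [gamma * x + (1.0 - gamma) * rng.gauss(0.0, 1.0) for x in student_feat]


def average_gamma(gammas):
    """Mean of the per-sample gamma values (hypothetical helper for plotting)."""
    return sum(gammas) / len(gammas)


feat = [0.5, -1.2, 3.0]
unchanged = fuse_with_noise(feat, gamma=1.0)  # noise term is multiplied by 0
noisy = fuse_with_noise(feat, gamma=0.7)      # 30% of the blend is noise
mean_gamma = average_gamma([1.0, 0.9, 0.8])
```

Under this assumption, an average γ curve that stays near 1 means the adapter is adding almost no noise to the student features, which would be consistent with a small teacher-student gap on an easy dataset.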