omglet1 opened this issue 8 months ago
Hi @omglet1,
I think the values of gamma depend on the task and the models you use for KD. CIFAR-10 is widely regarded as a very easy dataset, and both the students and the teachers can reach ~100% training accuracy. So there may not be a significant gap between the teacher and student noise levels. A gamma of 1 means that no additional noise needs to be added.
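To make the "gamma = 1 means no additional noise" point concrete, here is a minimal sketch. It assumes the noise adapter blends the student feature with Gaussian noise as a convex combination, gamma * feature + (1 - gamma) * noise. The function name `add_adaptive_noise` and this exact blending form are assumptions for illustration, not code taken from the repository.

```python
import random
from typing import List, Optional


def add_adaptive_noise(student_feat: List[float], gamma: float,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical noise-adapter blend: gamma * feature + (1 - gamma) * N(0, 1).

    The convex-combination form is an assumption made for illustration.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    rng = rng or random.Random(0)
    # With gamma == 1 the noise term is scaled by 0, so the feature passes through unchanged.
    return [gamma * x + (1.0 - gamma) * rng.gauss(0.0, 1.0) for x in student_feat]


feat = [0.5, -1.2, 2.0]
print(add_adaptive_noise(feat, 1.0))  # [0.5, -1.2, 2.0]: no additional noise
print(add_adaptive_noise(feat, 0.5))  # half feature, half Gaussian noise
```

Under this assumed form, a gamma that stays near 1 simply means the adapter judges the student feature to be already close enough to the teacher's that little extra noise is needed.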
Hi @hunto, I have been following your work for a long time, and I am very excited that the code for the classification task has been made public. However, I found a problem: when I use ResNet-34 (teacher) to train ResNet-18 (student) with the B1 baseline setting on CIFAR-10, my curves of the average γ do not match your results. The γ curve stays close to 1 and never decreases. The two attached images show the γ of the noise adapter for feature KD and logit KD, respectively.
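When comparing average-γ curves like these, one simple way to produce a per-epoch value is to average the adapter's per-sample γ over every batch in the epoch. The sketch below assumes γ is available as a list of per-sample floats for each batch; the class name `GammaTracker` is hypothetical, and this is not necessarily how the curves in the paper or repository were computed.

```python
from typing import List


class GammaTracker:
    """Hypothetical helper: records the mean gamma of each epoch for plotting."""

    def __init__(self) -> None:
        self.epoch_means: List[float] = []
        self._sum = 0.0
        self._count = 0

    def update(self, batch_gammas: List[float]) -> None:
        # Accumulate the per-sample gamma values from one batch.
        self._sum += sum(batch_gammas)
        self._count += len(batch_gammas)

    def end_epoch(self) -> float:
        # Average over every sample seen this epoch, then reset the accumulators.
        if self._count == 0:
            raise RuntimeError("no gamma values recorded this epoch")
        mean = self._sum / self._count
        self.epoch_means.append(mean)
        self._sum, self._count = 0.0, 0
        return mean


tracker = GammaTracker()
tracker.update([1.0, 0.5])
tracker.update([0.75, 0.75])
print(tracker.end_epoch())  # 0.75
```

Weighting by sample count rather than averaging per-batch means keeps a smaller final batch from skewing the epoch value, which matters when comparing curves from runs with different batch sizes.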