cookingbear closed this issue 3 years ago
Sorry for the confusion. The 1x1 convolution should be applied to the student's features to prevent trivial solutions (such as the zero mapping). Our code is correct. We will fix this in arXiv v3 of our paper.
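For readers following along, here is a minimal sketch of the direction described above, written in NumPy rather than the repository's own framework. It treats a 1x1 convolution (without bias) as a per-pixel linear map over channels in [batch, height, width, channels] layout. The helper names `conv1x1` and `mse`, the random weights, and the reduced 4x4 spatial size are assumptions for illustration and are not taken from the repository. The last two losses show one plausible reading of the "zero mapping" remark: if the mapping is applied to the teacher, an all-zero student and all-zero weights give zero loss, whereas mapping the student onto a fixed teacher does not.

```python
import numpy as np


def conv1x1(features, weight):
    """Apply a bias-free 1x1 convolution to NHWC features.

    A 1x1 convolution mixes channels independently at every pixel, so it
    reduces to a matrix product over the last (channel) axis.
    """
    return features @ weight


def mse(a, b):
    """Mean squared error over all elements."""
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)

# Illustrative shapes: 16 student channels, 32 teacher channels.
# The spatial size is shrunk to 4x4 to keep the example fast.
student = rng.standard_normal((8, 4, 4, 16))
teacher = rng.standard_normal((8, 4, 4, 32))

# Student -> teacher: lift the student's 16 channels to 32, then compare
# against the fixed teacher activations.
w_student_to_teacher = rng.standard_normal((16, 32)) * 0.1
mapped = conv1x1(student, w_student_to_teacher)
loss_student_to_teacher = mse(mapped, teacher)

# Teacher -> student with a zero student and zero weights: both sides of
# the loss are zero, so the objective is minimized trivially.
zero_student = np.zeros_like(student)
loss_trivial = mse(zero_student, conv1x1(teacher, np.zeros((32, 16))))

# Student -> teacher with the same zeros: the teacher is fixed, so the
# loss stays at the mean squared teacher activation.
loss_nontrivial = mse(conv1x1(zero_student, np.zeros((16, 32))), teacher)
```

In a real training loop the weight matrix would be learned jointly with the student; here it is random only to make the shapes concrete.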
@lmxyy
So if a feature activation from the student has shape [8, 512, 512, 16] and its equivalent in the teacher has shape [8, 512, 512, 32], the MSE calculation would be:
# Tensor dimensions format: [batch, height, width, channels]
student_channels = student_activations.shape[-1]  # 16 for the shapes above
# Compare the student against the first 16 teacher channels only
mse_loss(student_activations,
         professor_activations[:, :, :, :student_channels])
Is that correct?
In your paper (https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_GAN_Compression_Efficient_Architectures_for_Interactive_Conditional_GANs_CVPR_2020_paper.pdf), the 1x1 convolution maps the teacher's features to the student's, but in the code here it maps the student's features to the teacher's. Does this make any difference?