cnn directions about calc_distill_loss

mit-han-lab / gan-compression

[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs

cnn directions about calc_distill_loss #57

Closed cookingbear closed 3 years ago

cookingbear commented 3 years ago

In your paper (https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_GAN_Compression_Efficient_Architectures_for_Interactive_Conditional_GANs_CVPR_2020_paper.pdf), the 1*1 CNN maps from the teacher to the student, but the code presented here maps from the student to the teacher. Does that make any difference?

lmxyy commented 3 years ago

Sorry for the confusion. The 1*1 CNN should be applied to the student's features to rule out trivial solutions (such as the zero mapping). Our code is correct. We will fix this in the arXiv v3 of our paper.

GustavoStahl commented 3 years ago

@lmxyy So if a feature activation from the student has shape [8, 512, 512, 16] and its counterpart in the teacher has shape [8, 512, 512, 32], would the MSE calculation be:

# Tensor dimension format: [batch, height, width, channels]
student_channels = student_activations.shape[-1]  # returns 16
mse_loss(student_activations,
         professor_activations[:, :, :, :student_channels])

Is that correct?
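
For comparison, lmxyy's reply above describes a learnable 1*1 convolution applied to the student's features, which is not the same operation as slicing the teacher's channels. The sketch below illustrates that idea in NumPy with the channels-last layout from the question. It is not the repository's code: the helper names (`conv1x1`, `mse`, `distill_loss`), the random weights, and the small spatial size are all assumptions made for illustration. In PyTorch, the mapping would typically be an `nn.Conv2d` with `kernel_size=1` trained together with the student.

```python
import numpy as np

def conv1x1(features, weight):
    """1x1 convolution on channels-last features.

    features: [batch, height, width, c_in]
    weight:   [c_in, c_out]  (hypothetical layout chosen for this sketch)
    A 1x1 convolution is a per-pixel linear map over the channel axis.
    """
    return features @ weight

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def distill_loss(student_feat, teacher_feat, weight):
    # Map the student's 16 channels up to the teacher's 32, then compare.
    return mse(conv1x1(student_feat, weight), teacher_feat)

rng = np.random.default_rng(0)
# Spatial size reduced from 512 to 4 to keep the example light.
student_feat = rng.standard_normal((8, 4, 4, 16))
teacher_feat = rng.standard_normal((8, 4, 4, 32))
weight = 0.1 * rng.standard_normal((16, 32))  # would be learned in practice

loss = distill_loss(student_feat, teacher_feat, weight)

# Reverse direction (mapping applied to the teacher): an all-zero mapping
# together with all-zero student features gives zero loss, so the student
# learns nothing. This is the trivial solution mentioned in the reply.
zero_map = np.zeros((32, 16))
trivial_loss = mse(conv1x1(teacher_feat, zero_map), np.zeros_like(student_feat))
```

With the mapping on the student side, the teacher's features are a fixed target, so the loss cannot be driven to zero by collapsing both sides to zero.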