Hi Zhendong,
In ViTKD, knowledge is distilled only from the unmasked area, whereas MGD distills from the full area.
My questions are:
1) Why does ViTKD distill knowledge only from the unmasked area?
2) What are the difference and the relationship between the unmasked and masked areas in distillation?