mzy97 opened 3 years ago
I'm also confused about the distillation losses for the other two tasks, especially the pixel-wise loss. In the paper, the pixel-wise loss is defined for the segmentation task as a KL divergence over per-pixel class probabilities, which is clearly not applicable to the depth task. I really wonder how the pixel-wise loss is implemented for depth, even though the author notes that it does not work well for that task.
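For reference, here is a minimal sketch of how the pixel-wise distillation loss is commonly implemented for segmentation (per-pixel KL between teacher and student class distributions), assuming raw logits of shape `(N, C, H, W)`. The depth variant below is just one possible regression analogue (per-pixel L1 to the teacher's prediction), not necessarily what the paper does:

```python
import torch
import torch.nn.functional as F

def pixel_wise_kl(student_logits, teacher_logits, temperature=1.0):
    """Per-pixel KL divergence between teacher and student class distributions.

    Both inputs are assumed to be raw logits of shape (N, C, H, W).
    """
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    # KL(teacher || student), summed over classes, averaged over all pixels.
    return F.kl_div(s, t, reduction="none").sum(dim=1).mean()

def pixel_wise_depth(student_depth, teacher_depth):
    """One possible substitute for depth regression: per-pixel L1 to the
    teacher's prediction (an assumption, not the paper's formulation)."""
    return F.l1_loss(student_depth, teacher_depth)
```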
Thank you for sharing this great work!
Q1: Where should the pair-wise distillation loss be applied: only at the end of the encoder (for example, the 1/16-resolution feature map of ResNet), or at every scale of the encoder (1/16, 1/8, 1/4, ...)?
Q2: Can pair-wise distillation work when the teacher's encoder and the student's encoder have different downsample rates (e.g., the student downsamples the input by 1/8 while the teacher downsamples by 1/16), or different decoder structures? (See the sketch below.)
Q3: Can this method be used to distill from VNL to an architecture like FastDepth (which differs from the VNL student in the decoder), given that the VNL student may have a heavy decoder?
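To make Q2 concrete, here is a minimal sketch of a pair-wise (affinity) distillation loss between one student and one teacher feature map. Pooling both maps to a shared `grid` size is my own assumption to handle mismatched downsample rates; where in the network it is applied (one scale vs. all scales) is exactly what Q1 asks, so this is illustration only, not the author's implementation:

```python
import torch
import torch.nn.functional as F

def pair_wise_loss(student_feat, teacher_feat, grid=(32, 32)):
    """Pair-wise (affinity) distillation on one pair of feature maps.

    student_feat: (N, Cs, Hs, Ws), teacher_feat: (N, Ct, Ht, Wt).
    Channel counts and spatial sizes may differ; both maps are pooled to a
    common `grid` (assumption) so mismatched downsample rates
    (e.g. 1/8 vs. 1/16) can still be compared.
    """
    s = F.adaptive_avg_pool2d(student_feat, grid)
    t = F.adaptive_avg_pool2d(teacher_feat, grid)

    def affinity(x):
        n, c, h, w = x.shape
        x = x.flatten(2).transpose(1, 2)   # (N, H*W, C)
        x = F.normalize(x, dim=2)          # unit-norm feature per location
        return x @ x.transpose(1, 2)       # (N, H*W, H*W) cosine similarities

    # Match the student's pairwise similarity structure to the teacher's.
    return F.mse_loss(affinity(s), affinity(t))
```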