irfanICMLL / structure_knowledge_distillation

The official code for the paper 'Structured Knowledge Distillation for Semantic Segmentation' (CVPR 2019 oral) and its extension to other tasks.
BSD 2-Clause "Simplified" License

How to apply pair-wise distillation on depth prediction task #57

Open mzy97 opened 3 years ago

mzy97 commented 3 years ago

Thank you for sharing this great work! I have three questions about applying it to depth prediction.
Q1: Where should the pair-wise distillation loss be applied: only at the end of the encoder (for example, on the 1/16-resolution feature map of a ResNet), or at every scale of the encoder (1/16, 1/8, 1/4, ...)?
Q2: Can pair-wise distillation work when the teacher's and student's encoders have different downsampling rates (e.g. the student downsamples the input to 1/8 while the teacher downsamples it to 1/16), or when their decoder structures differ?
Q3: Can this method be used to distill from VNL into a structure like FastDepth, whose decoder differs from that of VNL-student? I ask because VNL-student may have a heavy decoder.
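
For Q1 and Q2, here is a minimal NumPy sketch of the pair-wise loss as the paper describes it: pool a feature map into spatial nodes, compute the cosine similarity between every pair of nodes, and penalise the squared difference between the student's and the teacher's similarity matrices. This is not the repository's code. The helper names (`pool_to_grid`, `pairwise_similarity`, `pairwise_distillation_loss`) are hypothetical, and pooling both maps to a shared grid to handle different downsampling rates is an assumption of mine, not something the authors confirm.

```python
import numpy as np

def pool_to_grid(feat, grid_h, grid_w):
    """Average-pool a (C, H, W) feature map down to (C, grid_h, grid_w).

    Assumes H and W are integer multiples of the grid size.
    """
    c, h, w = feat.shape
    if h % grid_h or w % grid_w:
        raise ValueError("feature size must be a multiple of the grid size")
    blocks = feat.reshape(c, grid_h, h // grid_h, grid_w, w // grid_w)
    return blocks.mean(axis=(2, 4))

def pairwise_similarity(feat, eps=1e-8):
    """Cosine similarity between all pairs of spatial nodes: (C, H, W) -> (N, N)."""
    c = feat.shape[0]
    nodes = feat.reshape(c, -1)                       # (C, N), N = H * W
    norms = np.linalg.norm(nodes, axis=0, keepdims=True)
    nodes = nodes / (norms + eps)                     # unit-length node vectors
    return nodes.T @ nodes

def pairwise_distillation_loss(student_feat, teacher_feat, grid=(4, 4)):
    """Mean squared difference between student and teacher similarity matrices."""
    a_s = pairwise_similarity(pool_to_grid(student_feat, *grid))
    a_t = pairwise_similarity(pool_to_grid(teacher_feat, *grid))
    return float(np.mean((a_s - a_t) ** 2))

# Hypothetical shapes: student at 1/8 resolution, teacher at 1/16,
# with different channel counts.
rng = np.random.default_rng(0)
student = rng.standard_normal((32, 8, 8))
teacher = rng.standard_normal((64, 4, 4))
loss = pairwise_distillation_loss(student, teacher, grid=(4, 4))
```

The similarity matrix has shape N x N, where N is the number of spatial nodes, so the channel counts of the two networks need not match; only the node grids have to agree. Under that assumption the same function could be called at one encoder scale or summed over several, but which choice works best for depth is left open here.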

djmth commented 3 years ago

I'm also confused about the distillation losses for the other two tasks, especially the pixel-wise loss. The pixel-wise loss in the paper is defined for the segmentation task as a KL divergence, which is obviously not suitable for the depth task. I really wonder how the pixel-wise loss was implemented for depth, even though the author explains that it doesn't work for that task.
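
To make the mismatch concrete, here is a minimal NumPy sketch of a pixel-wise KL loss on per-pixel class logits. It is only an illustration: the function name `pixelwise_kl_loss` is hypothetical, and the KL direction (teacher relative to student) follows the common distillation convention, which is an assumption rather than a statement about the paper or the repository. The last lines feed it a single-channel "depth" map to show why it carries no signal there.

```python
import numpy as np

def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def pixelwise_kl_loss(student_logits, teacher_logits):
    """Mean over pixels of KL(teacher || student) for (C, H, W) class logits.

    The KL direction is an assumption (the usual distillation convention).
    """
    log_p_s = log_softmax(student_logits, axis=0)
    log_p_t = log_softmax(teacher_logits, axis=0)
    p_t = np.exp(log_p_t)
    kl_per_pixel = (p_t * (log_p_t - log_p_s)).sum(axis=0)   # (H, W)
    return float(kl_per_pixel.mean())

rng = np.random.default_rng(0)

# Segmentation: 19 classes per pixel (hypothetical shapes), so the loss is informative.
seg_student = rng.standard_normal((19, 4, 4))
seg_teacher = rng.standard_normal((19, 4, 4))
seg_loss = pixelwise_kl_loss(seg_student, seg_teacher)

# Depth: one channel per pixel. A softmax over a single channel is always 1,
# so the KL term vanishes regardless of the predicted values.
depth_student = rng.standard_normal((1, 4, 4))
depth_teacher = rng.standard_normal((1, 4, 4))
depth_loss = pixelwise_kl_loss(depth_student, depth_teacher)
```

With a one-channel output the loss is identically zero, so a depth network would need some other per-pixel target (or a discretised depth output) for a term like this to be meaningful.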