mzy97 opened 3 years ago
I'm also confused about the distillation losses for the other two tasks, especially the pixel-wise loss. In the paper, the pixel-wise loss is defined for the segmentation task as a KL divergence over per-pixel class probabilities, which is clearly not applicable to the depth task. I really wonder how the pixel-wise loss is implemented for depth, even though the author notes that it does not work well for that task.
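For reference, here is a minimal sketch of how the pixel-wise distillation loss is commonly implemented for segmentation (per-pixel KL between teacher and student class distributions), assuming raw logits of shape `(N, C, H, W)`. The depth variant below is just one possible regression analogue (per-pixel L1 to the teacher's prediction), not necessarily what the paper does:

```python
import torch
import torch.nn.functional as F

def pixel_wise_kl(student_logits, teacher_logits, temperature=1.0):
    """Per-pixel KL divergence between teacher and student class distributions.

    Both inputs are assumed to be raw logits of shape (N, C, H, W).
    """
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    # KL(teacher || student), summed over classes, averaged over all pixels.
    return F.kl_div(s, t, reduction="none").sum(dim=1).mean()

def pixel_wise_depth(student_depth, teacher_depth):
    """One possible substitute for depth regression: per-pixel L1 to the
    teacher's prediction (an assumption, not the paper's formulation)."""
    return F.l1_loss(student_depth, teacher_depth)
```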
Thank you for sharing this great work!
Q1: Where should the pair-wise distillation loss be applied: only at the end of the encoder (for example, the 1/16-resolution feature map of ResNet), or at every scale of the encoder (1/16, 1/8, 1/4, ...)?
Q2: Can pair-wise distillation work when the teacher's encoder and the student's encoder have different downsample rates (e.g., the student downsamples the input by 1/8 while the teacher downsamples by 1/16), or different decoder structures? (See the sketch below.)
Q3: Can this method be used to distill from VNL to an architecture like FastDepth (which differs from the VNL student in the decoder), given that the VNL student may have a heavy decoder?
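To make Q2 concrete, here is a minimal sketch of a pair-wise (affinity) distillation loss between one student and one teacher feature map. Pooling both maps to a shared `grid` size is my own assumption to handle mismatched downsample rates; where in the network it is applied (one scale vs. all scales) is exactly what Q1 asks, so this is illustration only, not the author's implementation:

```python
import torch
import torch.nn.functional as F

def pair_wise_loss(student_feat, teacher_feat, grid=(32, 32)):
    """Pair-wise (affinity) distillation on one pair of feature maps.

    student_feat: (N, Cs, Hs, Ws), teacher_feat: (N, Ct, Ht, Wt).
    Channel counts and spatial sizes may differ; both maps are pooled to a
    common `grid` (assumption) so mismatched downsample rates
    (e.g. 1/8 vs. 1/16) can still be compared.
    """
    s = F.adaptive_avg_pool2d(student_feat, grid)
    t = F.adaptive_avg_pool2d(teacher_feat, grid)

    def affinity(x):
        n, c, h, w = x.shape
        x = x.flatten(2).transpose(1, 2)   # (N, H*W, C)
        x = F.normalize(x, dim=2)          # unit-norm feature per location
        return x @ x.transpose(1, 2)       # (N, H*W, H*W) cosine similarities

    # Match the student's pairwise similarity structure to the teacher's.
    return F.mse_loss(affinity(s), affinity(t))
```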