jbwang1997 / CrossKD

CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection

Does this method require teacher head and student head to have the same number of input channels? #8

Open tomato18463 opened 1 year ago

tomato18463 commented 1 year ago

Hi,

Thanks for the paper and code. I get the idea of feeding the student's backbone features to the teacher's prediction head. My question is: does this require the student's backbone to have the same number of output channels as the teacher's (which rarely seems to be the case for networks of different sizes)? Also, how does the method perform if the student's and teacher's backbones have different numbers of output channels and the channels have to be aligned in some way, e.g. by adding a conv layer? Do you have any empirical results on this? Thank you for your help!

jbwang1997 commented 1 year ago

Sorry for my late reply. Our method reuses the blocks in the teacher's detection head, whose input features generally have the same number of channels thanks to the FPN. Adding a conv before reusing the teacher's blocks also works and showed no performance drop in my early experiments.
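For readers wondering what the channel-alignment conv mentioned above might look like: below is a minimal PyTorch sketch, not taken from the CrossKD codebase. The `ChannelAdapter` class, the channel counts (192 vs. 256), and the toy `teacher_head_branch` are all illustrative assumptions; the idea is simply a 1×1 conv that projects a student FPN feature to the channel width the teacher's head blocks expect before feeding it through them.

```python
import torch
import torch.nn as nn


class ChannelAdapter(nn.Module):
    """Hypothetical 1x1 conv mapping student FPN channels to the
    channel width the teacher's head blocks expect."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)


# Toy stand-in for one reused teacher head block: expects 256-channel input.
teacher_head_branch = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

# Student FPN level output with a (hypothetical) narrower width of 192.
student_feat = torch.randn(2, 192, 32, 32)
adapter = ChannelAdapter(student_channels=192, teacher_channels=256)

aligned = adapter(student_feat)            # shape: (2, 256, 32, 32)
cross_pred = teacher_head_branch(aligned)  # fed through the teacher's blocks
print(tuple(cross_pred.shape))
```

A 1×1 conv is the usual choice here because it changes only the channel dimension, leaving the spatial resolution of the FPN feature map untouched.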