Open tomato18463 opened 1 year ago
Sorry for my late reply. Our method reuses the blocks in the teacher's detection head, which generally have the same channel number as the student's thanks to the FPN. Adding a conv before reusing the teacher's blocks also works, and showed no performance drop in my early experiments.
Hi,
Thanks for the paper and code. I get the idea of feeding the student's backbone features to the teacher's prediction head. My question is: does this require the student's backbone to have the same number of output channels as the teacher's (which rarely seems to be the case for networks of different sizes)? Also, how does the method perform if the student's and teacher's backbones have different numbers of output channels, so that the channels have to be aligned in some way, e.g. by adding a conv layer? Do you have any empirical results on this? Thank you for your help!
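For what it's worth, here is a minimal NumPy sketch of the channel-alignment idea being discussed: a 1x1 convolution is just a per-pixel linear map over channels, so a student feature map can be projected to the teacher head's expected channel count before being fed in. The channel counts (96 student, 256 teacher) and the random adapter weights are purely hypothetical, not taken from the paper.

```python
import numpy as np

def conv1x1(feat, weight):
    """Apply a 1x1 convolution (no bias) to a feature map.

    feat:   (C_in, H, W) feature map
    weight: (C_out, C_in) projection matrix
    Returns (C_out, H, W).
    """
    c_in, h, w = feat.shape
    # A 1x1 conv is a matrix multiply over the channel dimension,
    # applied independently at every spatial location.
    return (weight @ feat.reshape(c_in, -1)).reshape(weight.shape[0], h, w)

# Hypothetical example: student FPN outputs 96 channels,
# teacher's head expects 256.
student_feat = np.random.randn(96, 32, 32)
adapter = np.random.randn(256, 96) * 0.01  # learnable in practice
aligned = conv1x1(student_feat, adapter)
print(aligned.shape)  # (256, 32, 32) -- now matches the teacher's head
```

In a real training setup this adapter would be a learnable layer optimized jointly with the student, which matches the "adding a conv before reusing teacher's blocks" variant mentioned above.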