microsoft / DynamicHead

MIT License

Wrong behavior of modulated convolutions line 71-72 #25

Open LozaKiN opened 2 years ago

LozaKiN commented 2 years ago

Hi,

First, thank you for your work! I tried to train a network on my side, and it seems some of the modulated convolutions are not behaving as intended. Consider the following code (ll. 71-72 of dyhead.py):

```python
temp_fea.append(F.upsample_bilinear(self.DyConv[0](x[feature_names[level + 1]], **conv_args),
                                    size=[feature.size(2), feature.size(3)]))
```

When this line runs, the modulated conv receives an input that is four times smaller than the offset and mask (half the size along both the H and W dimensions). Since there is no assert on the input shapes, the code runs without error, but what is computed is not what you expect: the offset and mask are flattened, and only the first quarter of each is actually used. This introduces a large shift in the output of the modulated convolution.

To fix the issue, I believe `upsample_bilinear()` should be applied to `x[feature_names[level + 1]]` itself, not to the output of the layer.

Hope it helps.