Closed danczs closed 2 years ago
Hi, this is just a predefined parameter to init the model, so we set it the same as the ImageNet setting (224) to load pretrained model wit the same size. In practice, the feature resolution is defined by the current input, as its size is dynamic. So we could ensure that the attention window is always H x sw or W x sw.
Thank for your reply. I have found the corresponding code.
Hello, this work is interesting but I have some questions about the 'patches_resolution' of the segmentation model. I notice that the long side of the cross-shaped windows is the 'patches_resolution' rather than the real feature resoulution. For example, in the stage-3, the long side is 224 / 16 = 14. Do I understand it correctly? Does that make it impossible to exchannge information outside the 'patches_resolution' ?