Question about image shapes

harsanyidani commented 2 months ago

Hello! I have some questions about image shapes:

I see that the images coming out of the dataloader after the pipelines have a resolution of 1600x672. But the backbone is a ResNet101 pretrained on ImageNet, which I think accepts 224x224 images. If that is true, the images are resized by the backbone while the ground truths stay at the original scale, and this confuses me. For example, in the code the center predictions in the FCOS head are based on strides, so they correspond to the 224x224 images, but the 2D gt centers come from the 1600x672 annotations. So they don't match. How does this work? My intuition is that the ResNet input isn't actually 224x224 here, but I couldn't find any evidence.
In several places the code makes it seem like the images in a batch can have different shapes (but that can't be the case, right?): https://github.com/tjiiv-cprg/EPro-PnP-v2/blob/85215de8002dbd8523ee8eaaf1bae85b47179ebe/EPro-PnP-Det_v2/epropnp_det/models/dense_heads/deform_pnp_head.py#L796-L797
I really don't understand this part of the code, because I think 'batch_input_shape' and 'img_shape' are always the same here, so this will be an all-zero mask: https://github.com/tjiiv-cprg/EPro-PnP-v2/blob/85215de8002dbd8523ee8eaaf1bae85b47179ebe/EPro-PnP-Det_v2/epropnp_det/models/dense_heads/deform_pnp_head.py#L383-L390
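A minimal sketch of the first point, assuming the head follows the mmdet 2.x FCOS convention for point generation (cell (y, x) at stride s maps to (x * s + s // 2, y * s + s // 2)) and that each level has ceil(size / s) cells; `level_points` is a hypothetical helper, not a function from the repository. Under those assumptions the stride-based centers are already expressed in input-image pixels, i.e. in the same 1600x672 frame as the 2D gt centers, with no 224x224 rescaling involved.

```python
import math

import numpy as np


def level_points(img_h, img_w, stride):
    """Centers of the feature-map cells at one pyramid level, in input-image pixels.

    Assumption (mmdet 2.x FCOS-style): cell (y, x) maps to
    (x * stride + stride // 2, y * stride + stride // 2), and the feature map
    has ceil(size / stride) cells along each axis.
    """
    feat_h = math.ceil(img_h / stride)
    feat_w = math.ceil(img_w / stride)
    # ys[i, j] = i (row index), xs[i, j] = j (column index)
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    points = np.stack([xs.ravel() * stride, ys.ravel() * stride], axis=-1) + stride // 2
    return points  # shape (feat_h * feat_w, 2), columns are (x, y)


pts = level_points(672, 1600, stride=8)
print(pts.shape)                           # (16800, 2)
print(pts[0].tolist(), pts[-1].tolist())   # [4, 4] [1596, 668]
```

At stride 128 the same helper gives a 13x6 grid whose last row of points sits at y = 704, beyond the 672-pixel image height, so the coarsest levels only approximately tile the image.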

Thanks in advance for the help!

Lakonik commented 2 months ago

Hi! Thanks for your interest in our work.

CNNs are not restricted to certain resolutions, so the actual image size is 1600x672
For compatibility reasons, the images may be resized and zero-padded, so img_shape, ori_shape, and batch_input_shape can differ (although for nuScenes they should be the same).
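A sketch of how such a padding mask is typically built, assuming the linked lines in deform_pnp_head.py follow the common mmdet DETR-style pattern (start from ones over the padded batch, then write zeros over each image's img_shape region). `padding_mask` is a hypothetical helper, and the shapes are simplified to (h, w). When every img_shape equals batch_input_shape, as for nuScenes, the mask comes out all zeros, which matches the observation in the question; it only becomes non-trivial when a batch mixes image sizes.

```python
import numpy as np


def padding_mask(batch_input_shape, img_shapes):
    """Build a (B, H, W) mask that is 1 on zero-padded pixels and 0 on valid pixels.

    batch_input_shape: (H, W) of the padded batch tensor.
    img_shapes: per-image (h, w) after resizing, before padding.
    """
    pad_h, pad_w = batch_input_shape
    masks = np.ones((len(img_shapes), pad_h, pad_w), dtype=np.uint8)
    for i, (h, w) in enumerate(img_shapes):
        masks[i, :h, :w] = 0  # the top-left h x w region holds real image content
    return masks


# nuScenes-like case: every image is already 1600x672, so nothing is padded
m = padding_mask((672, 1600), [(672, 1600), (672, 1600)])
print(m.sum())  # 0, i.e. an all-zero mask

# Hypothetical mixed batch: the second image is padded along the bottom
m = padding_mask((672, 1600), [(672, 1600), (600, 1600)])
print(m[1].sum())  # 115200 padded pixels (72 rows x 1600 columns)
```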

harsanyidani commented 2 months ago

Thanks for the quick reply @Lakonik !

CNNs are not restricted to certain resolutions, so the actual image size is 1600x672

Yes, for some reason I thought the pretrained weights restricted the input size, but now that I think about it, I was wrong. But if that is the case, what happens with the strides (for example, 1600 % 128 != 0)?

For compatibility reasons, the images may be resized and zero-padded, so img_shape, ori_shape, and batch_input_shape can differ (although for nuScenes they should be the same).

Understandable, thanks!

Lakonik commented 2 months ago

My understanding is that non-divisible shapes are generally OK for CNNs at deep layers, where a small spatial mismatch isn't that important.
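To make the divisibility question concrete, the sketch below applies the standard conv output-size formula, floor((n + 2p - k) / s) + 1, along each image axis. It assumes a torchvision-style ResNet (7x7 stride-2 stem conv, 3x3 stride-2 max-pool, stride-2 downsampling in the later stages) and FCOS-style P6/P7 levels built with extra 3x3 stride-2 convs; `conv_out` and `pyramid_sizes` are hypothetical helpers, not code from the repository. With these kernel/padding settings every stride-2 layer computes ceil(n / 2), so nothing breaks when a size is not divisible, the map just rounds up.

```python
def conv_out(size, kernel, stride, padding):
    # Output size of a conv/pool layer along one axis (no dilation, floor mode).
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(size):
    """Feature size along one axis at total strides 4..128 (assumed architecture)."""
    sizes = {}
    s = conv_out(size, 7, 2, 3)   # stem conv: 7x7, stride 2, padding 3
    s = conv_out(s, 3, 2, 1)      # max-pool: 3x3, stride 2, padding 1 -> total stride 4
    sizes[4] = s
    for total_stride in (8, 16, 32, 64, 128):
        # ResNet stages (8, 16, 32) or extra convs (64, 128): 3x3, stride 2, padding 1
        s = conv_out(s, 3, 2, 1)
        sizes[total_stride] = s
    return sizes


for name, size in (("W", 1600), ("H", 672)):
    print(name, pyramid_sizes(size))
# W {4: 400, 8: 200, 16: 100, 32: 50, 64: 25, 128: 13}
# H {4: 168, 8: 84, 16: 42, 32: 21, 64: 11, 128: 6}
```

For 1600x672 the strides are exact up to 32 (a 50x21 map), but at stride 128 the map is 13x6, which nominally covers 1664x768 pixels: the outermost cells overhang the image, so the effective stride at that level is only approximately 128.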

harsanyidani commented 2 months ago

I see, that seems logical. I was just concerned about the strides' role in the algorithm and about them not always being exact.

tjiiv-cprg / EPro-PnP-v2, issue #9