Closed by harsanyidani 2 months ago
Hi! Thanks for your interest in our work.
Thanks for the quick reply @Lakonik!
CNNs are not restricted to certain resolutions, so the actual image size is 1600x672.
Yes, for some reason I thought it was restricted because of the pretrained weights. Now that I think about it, I was wrong. But if this is the case, what happens with the strides (for example, 1600 % 128 != 0)?
For compatibility concerns, the images may be resized and zero-padded, so img_shapes, ori_shapes, batch_input_shape can be different (although for nuScenes they should be the same).
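To make the resize-and-pad point concrete, here is a minimal sketch of how a padded batch shape could be derived from per-image shapes. It assumes the usual mmdet-style meaning of the keys (img_shape is the size after resizing, batch_input_shape is the size of the zero-padded batch tensor); the helper names and the divisor of 32 are illustrative assumptions, not taken from the repository.

```python
def round_up(n, divisor):
    """Round n up to the nearest multiple of divisor."""
    return -(-n // divisor) * divisor


def batch_input_shape(img_shapes, divisor=32):
    """Hypothetical helper: shape (H, W) of the zero-padded batch tensor.

    img_shapes is a list of (height, width) tuples after resizing.
    Every image is padded at the bottom/right up to this common shape.
    """
    max_h = max(h for h, _ in img_shapes)
    max_w = max(w for _, w in img_shapes)
    return round_up(max_h, divisor), round_up(max_w, divisor)


# nuScenes-like case: every image is 672 x 1600 (height x width),
# both already multiples of 32, so no padding is added at all.
same = batch_input_shape([(672, 1600), (672, 1600)])

# Mixed sizes: the smaller image would be zero-padded up to the batch shape.
mixed = batch_input_shape([(672, 1600), (600, 1500)])

# A height that is not a multiple of 32 is rounded up.
padded = batch_input_shape([(900, 1600)])
```

In the first case img_shape and batch_input_shape coincide, which matches the remark that they should be the same for nuScenes; they only diverge when the inputs differ in size or are not multiples of the divisor.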
Understandable, thanks!
My understanding is that a non-divisible shape is generally OK for a CNN at deep layers, where spatial mismatch isn't that important.
I see, seems logical. I was just concerned because the strides are involved in the algorithm and they are not always exact.
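For reference, the 1600 % 128 != 0 case can be worked through by hand. The sketch below assumes every downsampling step is a stride-2 layer whose output size is ceil(n / 2) (true for a 3x3 convolution with padding 1, and for the 7x7 stem convolution with padding 3), and that dense-head points sit at the centre of each stride cell, as in FCOS-style heads. The function names are hypothetical and the exact strides used by this repository are not confirmed here.

```python
def halve(n):
    """Output size of a 3x3 conv with stride 2 and padding 1.

    floor((n + 2*1 - 3) / 2) + 1, which equals ceil(n / 2).
    """
    return (n + 2 * 1 - 3) // 2 + 1


def feature_size(n, stride):
    """Feature-map size along one axis after downsampling by `stride`.

    Assumes stride is a power of two reached by repeated stride-2 layers.
    """
    size, s = n, 1
    while s < stride:
        size = halve(size)
        s *= 2
    return size


def point_coords(n, stride):
    """Centre coordinate (in input pixels) of each feature-map cell."""
    return [j * stride + stride // 2 for j in range(feature_size(n, stride))]


# Width 1600 at stride 128: 1600 / 128 = 12.5, rounded up to 13 cells.
width_cells = feature_size(1600, 128)
# Height 672 at stride 128: 672 / 128 = 5.25, rounded up to 6 cells.
height_cells = feature_size(672, 128)
# The last cell centre along the width falls on the right image border.
last_x = point_coords(1600, 128)[-1]
```

Under these assumptions the coarsest level simply gets one extra, partially covered cell, so the point grid still follows the stride and only the border cell is inexact.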
Hello! I have some questions about image shapes:
I see that the images you get from the dataloader after the pipelines are of 1600x672 resolution. But the backbone is a ResNet101 pretrained on ImageNet, which I think accepts 224x224 images. If this is true, then the images are resized by the backbone, but the ground truths will be on the original scale. This is confusing to me. For example, I see in the code that the center predictions in the FCOS head are based on strides, so they correspond to the 224x224 images. But the GT 2D centers come from the 1600x672 resolution annotations, so they don't match. How does this work? My intuition is that ResNet isn't actually 224x224 here, but I couldn't find any evidence.
Multiple times the code makes it seem as if the images in the batches are not of the same shape (but that can't be the case, right?): https://github.com/tjiiv-cprg/EPro-PnP-v2/blob/85215de8002dbd8523ee8eaaf1bae85b47179ebe/EPro-PnP-Det_v2/epropnp_det/models/dense_heads/deform_pnp_head.py#L796-L797
This part of the code I really don't understand, because I think 'batch_input_shape' and 'img_shape' are always the same here, so this will be an all-zero mask: https://github.com/tjiiv-cprg/EPro-PnP-v2/blob/85215de8002dbd8523ee8eaaf1bae85b47179ebe/EPro-PnP-Det_v2/epropnp_det/models/dense_heads/deform_pnp_head.py#L383-L390
Thanks in advance for the help!
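The all-zero mask observation can be illustrated with a small sketch. It assumes, as the question describes, that the mask marks padded pixels by comparing 'batch_input_shape' with 'img_shape'; the function below is a hypothetical stand-in written with plain lists, not the code at the linked lines.

```python
def padding_mask(batch_input_shape, img_shape):
    """Hypothetical padding mask: 1 where a pixel is padding, 0 where it is image.

    Both arguments are (height, width); the image is assumed to sit in the
    top-left corner of the padded batch tensor.
    """
    batch_h, batch_w = batch_input_shape
    img_h, img_w = img_shape
    return [
        [0 if (y < img_h and x < img_w) else 1 for x in range(batch_w)]
        for y in range(batch_h)
    ]


def count_padded(mask):
    """Number of padded pixels in the mask."""
    return sum(sum(row) for row in mask)


# Same shapes (the nuScenes case): nothing is padded, so the mask is all zeros.
no_padding = padding_mask((4, 6), (4, 6))

# Smaller image inside a larger batch tensor: the bottom row and
# right column are marked as padding.
with_padding = padding_mask((4, 6), (3, 5))
```

So an all-zero mask is the expected result whenever the two shapes match; the masking logic only has an effect for datasets or pipelines where images end up padded.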