rwightman / efficientdet-pytorch

A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights
Apache License 2.0

How to use a custom dataset of non-square images? #108

Closed sarmientoj24 closed 3 years ago

sarmientoj24 commented 4 years ago

I saw that the config's image_size seems to be for squares only. Is it possible to have it non-square? Also, does it have to come in factors of 2?

rwightman commented 3 years ago

@sarmientoj24 see the more_datasets branch, I have support for non-square anchor layout sizes on that branch; it still has a dim % 128 == 0 limitation due to the way I've handled the scaling across feature maps.

Keep in mind that resolution is just the res at which you feed images to the model for train/eval. It works perfectly well with differing image sizes so long as the image is scaled and placed in the square canvas with appropriate padding (as is done by default here). It is best to match the anchor layout to the average image dims for the dataset though, so if it's always 16:9 you should, say, pick a layout that's close to that (within the % 128 constraint). Possible on the branch and in a subsequent release when I merge.
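
Rough illustration of both ideas in plain Python (hypothetical helper names, not code from this repo):

```python
def snap_layout(avg_w, avg_h, max_dim=768, div=128):
    # Pick an (H, W) anchor layout close to the dataset's average aspect,
    # with both dims a multiple of `div` (the dim % 128 == 0 constraint).
    if avg_w >= avg_h:
        w = max_dim
        h = max(div, round(max_dim * avg_h / avg_w / div) * div)
    else:
        h = max_dim
        w = max(div, round(max_dim * avg_w / avg_h / div) * div)
    return h, w

def letterbox(img_w, img_h, target_h, target_w):
    # Scale to fit the canvas while keeping aspect; the remainder is padding.
    scale = min(target_w / img_w, target_h / img_h)
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    return scale, (target_w - new_w, target_h - new_h)  # scale, (pad_w, pad_h)

snap_layout(1920, 1080)          # -> (384, 768) for 16:9 sources
letterbox(1920, 1080, 384, 768)  # -> (0.355..., (86, 0)): a little padding remains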

sarmientoj24 commented 3 years ago

> I have support for non-square anchor layout sizes on that branch; it still has a dim % 128 == 0 limitation due to the way I've handled the scaling across feature maps.

How do I input this in the config? Does this mean it could take non-square sizes with any combination of 128, 256, 384, 512, 640, 768 (e.g. 768 x 640)?

Do I still need to pad images for training and for evaluation?

rwightman commented 3 years ago

@sarmientoj24 it's defined in model_config.py for the model you're using; you can create a new config or modify an existing one. All of the image size defs have been changed to tuples, so how you'd do it should be self-evident from there... it's (H, W). You'd still want to letterbox (pad), as it's usually best to maintain aspect, especially for validation. If your original images have the same aspect as the model image size then you could just scale the image to fill.
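
E.g. something like this (a sketch; helper and field names per recent versions of this repo, double-check against your install):

```python
from effdet import get_efficientdet_config, EfficientDet

# Sketch: take an existing config and override image_size with a non-square
# (H, W) tuple -- both dims must satisfy dim % 128 == 0.
config = get_efficientdet_config('tf_efficientdet_d1')
config.image_size = (640, 768)  # (H, W)
model = EfficientDet(config, pretrained_backbone=True)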

sarmientoj24 commented 3 years ago

> You'd still want to letterbox (pad), as it's usually best to maintain aspect, especially for validation. If your original images have the same aspect as the model image size then you could just scale the image to fill.

So does that mean it's fine to have non-square (H, W) tuples, but it's preferred to pad images for training and evaluation to make them square?

rwightman commented 3 years ago

No, I meant that even if you get the aspect closer to your images with a non-equal H/W, you're still unlikely to have it perfectly matched, so you might still need some letterboxing in the rectangular image to maintain the aspect of your original images. I've tested the non-square H, W with COCO and it worked with pretrained weights and started off training fine. If you find a bug with it, reopen the issue, but otherwise I think everything should be good; this code is now on master.
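
To put numbers on it (made-up example):

```python
# Made-up numbers: even a rectangular (H, W) = (640, 768) canvas won't exactly
# match 800x600 sources, so a little letterbox padding remains.
target_h, target_w = 640, 768
img_w, img_h = 800, 600
scale = min(target_w / img_w, target_h / img_h)        # min(0.96, 1.0667) = 0.96
new_w, new_h = int(img_w * scale), int(img_h * scale)  # 768 x 576
pad_h = target_h - new_h                               # 64 rows of padding left over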

Ekta246 commented 3 years ago

> No, I meant that even if you get the aspect closer to your images with a non-equal H/W, you're still unlikely to have it perfectly matched, so you might still need some letterboxing in the rectangular image to maintain the aspect of your original images. I've tested the non-square H, W with COCO and it worked with pretrained weights and started off training fine. If you find a bug with it, reopen the issue, but otherwise I think everything should be good; this code is now on master.

Also, transforms.py takes care of padding. For example, my input image config is (600, 800), but for efficientdet-d0 the input in model_config is (512, 512), so the transform pads, resizes, and fills accordingly. I see there is no extra effort to be taken if you are running transforms.py.
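
For example (a sketch from memory; check transforms.py for the exact signatures and annotation field names):

```python
import numpy as np
from PIL import Image
from effdet.data.transforms import ResizePad

# Sketch from memory -- verify against transforms.py. ResizePad scales the
# image to fit the target canvas, pads the remainder, and rescales the
# 'bbox' annotations to match.
img = Image.new('RGB', (800, 600))                     # stand-in for a 600x800 input
anno = {'bbox': np.array([[100., 100., 300., 400.]])}  # yxyx boxes
tfm = ResizePad(target_size=(512, 512))
img, anno = tfm(img, anno)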

@sarmientoj24 did you try any alternative?

Ekta246 commented 3 years ago

@sarmientoj24 @MichaelMonashev

I hope you can clear up my doubt. The input image resolution required by, say, efficientdet-d0 is (512, 512). How can we change the existing model_config.py to other square/rectangular dimensions?

Also, if I use a custom dataset with (600, 800) input resolution, the ResizePad and RandomResizePad transforms take care of resizing and padding the image and scaling the ground-truth bounding boxes. So do I just need to specify target['img_size'] in the loader as my original input size, or do I need to change the input size in model_config.py? I was sceptical about changing the model_config input size since I assume efficientdet-d0 still needs a (512, 512) image.

Or does efficientdet-d0 not really care about input image sizes other than (512, 512)?

If I change target['img_size'] to 800x600, then I see a problem: even after I resize the image to (512, 512), it still gives bounding box predictions greater than (512, 512).