tlpss / keypoint-detection

2D keypoint detection with PyTorch Lightning and wandb
MIT License

Different image sizes in dataset #36

Open mareksubocz opened 11 months ago

mareksubocz commented 11 months ago

Hi, thank you for a great repo :D

Is there a way to deal with differing image sizes in the dataset? This is the error I get:

  File "/Users/mareksubocz/it/keypoint-detection/keypoint_detection/data/coco_dataset.py", line 235, in collate_fn
    images = torch.stack(images)
RuntimeError: stack expects each tensor to be equal size, but got [3, 1080, 1920] at entry 0 and [3, 1122, 1998] at entry 7

Each image's width and height are included in the COCO JSON file.

tlpss commented 11 months ago

Hi @mareksubocz

This is indeed not supported in the dataloader at the moment, and stacking these tensors will give the error you have encountered above. What you will need to do is make sure that all image tensors have the same shape before they are collated in the dataloader.

How to bring them to the same shape? There are many ways to achieve this. One would be to select the largest dimensions in the dataset and pad all other images to that size, but this makes training more expensive if the size differences are large. Another, more common, strategy is to resize all images to a specific size, e.g. 1920x1080. If you want to keep the aspect ratio, you can resize and then pad where needed; this is the approach YOLO takes, for example.

How to implement this? There are multiple options. Albumentations has a number of transforms you could use: resizing while keeping the aspect ratio and then padding, or resizing directly.
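For concreteness, a minimal sketch of both options (the 256x256 target size is just a placeholder):

```python
import albumentations as A

# Option 1: keep the aspect ratio, then pad to a fixed size.
# PadIfNeeded uses border reflection by default; pass border_mode
# explicitly if you want constant padding instead.
resize_and_pad = A.Compose(
    [
        A.LongestMaxSize(max_size=256),
        A.PadIfNeeded(min_height=256, min_width=256),
    ]
)

# Option 2: resize directly, distorting the aspect ratio if it differs.
direct_resize = A.Compose([A.Resize(height=256, width=256)])
```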

You can apply these transforms upfront on your dataset before training, or you can do it at runtime in the keypoint detector. The latter can be achieved by adding these augmentations to all dataloaders, similar to how it is now done for the training augmentations here; see the sketch below. For the former, I think you might find a part of my lab's coco-tooling codebase useful: it allows you to run a number of Albumentations transforms on a COCO dataset and create updated images/annotations.
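A rough sketch of the runtime approach, assuming the dataset returns a numpy image plus a flat list of (x, y) keypoints (the wrapper class and item format are illustrative, not the repo's actual interface):

```python
import albumentations as A
import torch

class ResizedDataset(torch.utils.data.Dataset):
    """Wraps a dataset so every dataloader yields equally sized samples."""

    def __init__(self, wrapped_dataset, transform: A.Compose):
        # transform should be built with keypoint_params so the
        # keypoints are rescaled together with the image.
        self.wrapped = wrapped_dataset
        self.transform = transform

    def __len__(self):
        return len(self.wrapped)

    def __getitem__(self, idx):
        image, keypoints = self.wrapped[idx]
        transformed = self.transform(image=image, keypoints=keypoints)
        return transformed["image"], transformed["keypoints"]
```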

Let me know if you encounter issues! I'm rather busy the next two weeks, but afterwards I could even help implement this, as it seems like a nice feature to have.

mareksubocz commented 11 months ago

I think I might try to do this on my own and submit a PR; we'll see. Thank you for the tips :D. I guess the keypoint values should also be updated to match the resized image, right?

All the best, Marek

mareksubocz commented 11 months ago

Do you think it would be a good idea to apply the resize in detector.py so that it works for inference as well?

tlpss commented 11 months ago

I think I might try to do this on my own and submit a PR; we'll see.

Looking forward to it!

I guess the keypoint values should also be updated to match the resized image, right?

Yes, Albumentations can do this. But a slight modification is needed to accommodate a possibly varying number of keypoints in each channel. You should use this class instead of the normal Compose.
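For a single flat list of keypoints, plain Albumentations handles the rescaling like this (a minimal sketch; the class linked above extends the same idea to one keypoint list per channel):

```python
import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.LongestMaxSize(max_size=256),
        A.PadIfNeeded(min_height=256, min_width=256),
    ],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=True),
)

# Dummy 1920x1080 image with two (x, y) keypoints.
image = np.zeros((1080, 1920, 3), dtype=np.uint8)
out = transform(image=image, keypoints=[(120.0, 45.5), (300.0, 200.0)])
resized_image = out["image"]          # 256x256 image (resized + padded)
resized_keypoints = out["keypoints"]  # keypoints rescaled and shifted to match
```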

Do you think it would be a good idea to apply the resize in detector.py so that it works for inference as well?

That would be an option as well. You could add a 'preprocess' step to the forward() function. My gut feeling says to do this in the dataset instead of in the detector, to keep predictions and preprocessing separate, but that indeed implies you also need to do it manually at inference.
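A hypothetical sketch of that option, wrapping the detector so forward() resizes its input first (the class name, the fixed 256x256 size, and the wrapped detector attribute are all placeholders, not the repo's current code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectorWithPreprocess(nn.Module):
    """Illustrative wrapper that resizes inputs before the detector runs."""

    def __init__(self, detector: nn.Module, size=(256, 256)):
        super().__init__()
        self.detector = detector
        self.size = size

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Bring the (B, 3, H, W) batch to a fixed size before inference.
        images = F.interpolate(images, size=self.size, mode="bilinear", align_corners=False)
        # Note: predictions now live in the resized coordinate frame and
        # must be scaled back to the original image size afterwards.
        return self.detector(images)
```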

mareksubocz commented 11 months ago

I kept the preprocessing inside the datamodule, adding the resize option to all dataloaders, including the validation and test ones. It's a very bare-bones version of this feature, but it worked in the basic tests I ran.

Feel free to change it up or even throw it out altogether. Let me know what you think :)