Open geezah opened 1 month ago
Thanks a lot for the super detailed feature request, @geezah!
This sounds reasonable but before we move towards a PR, can you help me understand why you think padding is preferable to resizing the input image here?
Side note: using query_size(flat_inputs) as suggested in the snippet above will enforce that all images in the input are of the same original shape. I don't think we can avoid such enforcement (at least not easily), but I just wanted to point that out in case that's not desirable for your own use-case.
Thanks for the feedback! The padding approach was suggested mainly for cases where preserving aspect ratios could be beneficial, such as:
You're right about the issue with the same-sized inputs. For handling variable input sizes, one could implement a custom collate_fn that performs random resizing at batch creation time instead of during the transform pipeline. This would allow for more flexibility while maintaining batch efficiency.
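As a rough illustration of the collate_fn idea, here is a minimal sketch using only torch. The function name random_resize_collate, the sizes parameter, and the (image, label) sample layout are assumptions made for this example, not anything proposed in the thread. It also assumes each image is a float tensor of shape (C, H, W); resizing straight to a square distorts non-square images, so in practice this would run after square padding.

```python
import random

import torch
import torch.nn.functional as F


def random_resize_collate(batch, sizes=(224, 256, 288)):
    """Resize every image in the batch to one randomly chosen square size.

    `batch` is a list of (image, label) pairs (hypothetical layout), where
    each image is a float tensor of shape (C, H, W) and each label is an int.
    """
    # One target size per batch, so all images can be stacked afterwards.
    target = random.choice(sizes)
    images = [
        F.interpolate(
            img.unsqueeze(0),  # interpolate expects a batch dimension
            size=(target, target),
            mode="bilinear",
            align_corners=False,
        ).squeeze(0)
        for img, _ in batch
    ]
    labels = torch.tensor([label for _, label in batch])
    return torch.stack(images), labels
```

It could then be passed to a loader as DataLoader(dataset, batch_size=8, collate_fn=random_resize_collate), which moves the per-batch size choice out of the transform pipeline.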
Thanks for coming back to me, @geezah. This sounds good, please feel free to submit a PR! Let's go with the simple approach of using query_size first; we can consider the collate_fn approach later if needed.
Alright! Thank you for coming back to it quickly!
The feature
A new transform class, PadToSquare, that pads non-square images to make them square by adding padding to the shorter side. Configuration is inspired by torchvision.transforms.v2.Pad. Note that the positional argument size is dropped since we calculate the target size based on the non-square image we want to square pad. This feature would be beneficial in situations where square inputs are required for downstream models or processes, and it simplifies the pipeline by embedding this transformation within torchvision.transforms.v2.
Case 1 (Width > Height):
Case 2 (Height > Width):
Case 3: Height == Width: Nothing changes :-)
Image Sources: VOC2012
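The three cases above reduce to a small calculation. The sketch below is only an illustration: the helper name square_padding is hypothetical, and sending the odd leftover pixel to the bottom or right side is an assumption. The [left, top, right, bottom] order matches the 4-element padding convention of torchvision.transforms.v2.Pad.

```python
def square_padding(height, width):
    """Return [left, top, right, bottom] padding that centers the image
    on a square canvas whose side equals the longer image side."""
    diff = abs(width - height)
    before = diff // 2       # padding on the top or left
    after = diff - before    # padding on the bottom or right (gets the odd pixel)
    if width > height:
        # Case 1: wide image, pad top and bottom.
        return [0, before, 0, after]
    # Case 2: tall image, pad left and right.
    # Case 3: square image, diff is 0, so all four values are 0.
    return [before, 0, after, 0]


print(square_padding(375, 500))  # [0, 62, 0, 63]
print(square_padding(500, 375))  # [62, 0, 63, 0]
print(square_padding(300, 300))  # [0, 0, 0, 0]
```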
Motivation, pitch
I'm working on a multi-label classification project that requires images to be square, but the input dataset has a variety of shapes and aspect ratios. PadSquare would streamline the preprocessing pipeline by automatically padding these images to square while allowing flexible padding modes. This avoids distortion in any subsequent resize and simplifies handling various image shapes. This feature request is based on the need to make square inputs straightforward and robust with consistent padding.
Alternatives
I have considered using existing padding methods within torchvision, but they require additional logic to conditionally apply padding only to the shorter side, making the code less modular, e.g. as demonstrated in this discussion. Current alternatives involve manually calculating padding and applying it to achieve square shapes. By having a dedicated PadSquare transform, it would streamline this common operation into a more reusable and convenient utility.
Additional context
The PadSquare class uses the _get_params method to calculate the necessary padding values, ensuring the padded image is centered. It also supports multiple padding modes and allows for a specified fill value when using 'constant' mode. It would enhance the versatility of torchvision.transforms.v2 by providing a reusable utility for data preprocessing. Let me know what you think of it! :-)
Initial Implementation
My initial implementation of PadSquare is inspired by the implementation of Pad.