pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
16.21k stars 6.95k forks

Fill arg and _apply_grid_transform improvements #6517

Open vfdev-5 opened 2 years ago

vfdev-5 commented 2 years ago

A few years ago we introduced non-constant fill value handling in _apply_grid_transform using a mask approach:

https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L550-L568

There are a few minor problems with this approach:

1) If we pass fill = [0.0, ], we would expect the same result as with fill=None. This is not exactly true for bilinear interpolation mode, where img and fill_img are blended linearly: https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568

Most probably, we would like to skip fill_img creation for all fill values that have sum(fill) == 0, since grid_sample already pads with zeros.

- if fill is not None:
+ if fill is not None and sum(fill) > 0:
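
To make point 1 concrete, here is a minimal per-pixel sketch in plain Python, with floats standing in for tensors. It assumes the bilinear branch computes `img * mask + (1 - mask) * fill`, and that a border pixel sampled with zero padding already equals `m * v` for mask value `m` and source value `v`. The names `blend` and `needs_fill_img` are hypothetical helpers, not torchvision functions. The check below tests for any non-zero component instead of `sum(fill) > 0`, so that negative or mixed-sign fills are not skipped; this is a variation on the diff above, not what the diff itself does.

```python
def blend(sampled, mask, fill):
    """Assumed bilinear-branch blending for one pixel."""
    return sampled * mask + (1.0 - mask) * fill


def needs_fill_img(fill):
    """Hypothetical check: build fill_img only if some component is non-zero.

    An all-zero fill is already what zero padding produces, so the
    fill image could be skipped in that case.
    """
    if fill is None:
        return False
    return any(f != 0 for f in fill)


v = 0.8          # source pixel value
m = 0.5          # mask value at a half-covered border pixel
sampled = m * v  # zero padding has already attenuated the sampled pixel

no_fill = sampled                   # fill=None keeps the sampled value
zero_fill = blend(sampled, m, 0.0)  # fill=[0.0] attenuates it a second time

print(no_fill, zero_fill)
print(needs_fill_img([0.0]), needs_fill_img([-1.0, 1.0]))
```

Under these assumptions the zero-fill path yields `m * m * v` instead of `m * v`, which is why fill = [0.0, ] and fill=None differ along the border.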

2) The linear interpolation between fill_img and img may be replaced by directly applying a mask:

         mask = mask < 0.9999
         img[mask] = fill_img[mask] 

That would better match PIL Image behaviour.

https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568
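
The difference between blending and direct masking can be sketched per pixel in plain Python. This is a simplified model, not the torchvision code: it assumes a sampled border pixel equals `m * v` because of zero padding, and the function names `blend` and `replace` are made up for the illustration.

```python
def blend(sampled, mask, fill):
    """Assumed current bilinear behaviour: linear mix of image and fill."""
    return sampled * mask + (1.0 - mask) * fill


def replace(sampled, mask, fill, threshold=0.9999):
    """Proposed behaviour: take the fill wherever the mask is below the threshold."""
    return fill if mask < threshold else sampled


v = 1.0     # white source pixel
fill = 1.0  # white fill, so an ideal result is white everywhere
coverage = (0.0, 0.25, 0.5, 0.75, 1.0)  # mask values across a border

blended = [blend(m * v, m, fill) for m in coverage]
replaced = [replace(m * v, m, fill) for m in coverage]

print(blended)   # dips below 1.0 in the partially covered pixels
print(replaced)  # stays at the fill or source value
```

In this model the blended values drop below both the source and the fill value in the partially covered pixels, while direct masking never produces a value outside the two.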

[image]

cc @datumbox

pmeier commented 1 year ago

Since we have another report in #8083, do we want to tackle this? IMO, we should just align the two branches

https://github.com/pytorch/vision/blob/f69eee6108cd047ac8b62a2992244e9ab3c105e1/torchvision/transforms/v2/functional/_geometry.py#L588-L594

with something like

bool_mask = mask < 1
float_img[bool_mask] = fill_img.expand_as(float_img)[bool_mask] 

This removes the blending and in turn the "shadow" for bilinear interpolation. Plus, this is equivalent for nearest interpolation, since the mask in that case only contains 0.0 and 1.0 entries.
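
A plain-Python analogue of the snippet above, using nested lists instead of tensors, shows the per-channel fill and the equivalence claim for nearest interpolation. The function `apply_fill` and its `threshold` parameter are hypothetical and only mirror the boolean-mask assignment.

```python
def apply_fill(img, mask, fill, threshold=1.0):
    """Replace pixels whose mask value is below `threshold` with the channel's fill.

    img:  list of channels, each a list of pixel values
    mask: list of mask values, one per pixel, shared by all channels
    fill: one fill value per channel (broadcast over the pixels)
    """
    return [
        [f if m < threshold else p for p, m in zip(channel, mask)]
        for channel, f in zip(img, fill)
    ]


img = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]  # 2 channels, 3 pixels
nearest_mask = [0.0, 1.0, 1.0]            # nearest mode: only 0.0 and 1.0
fill = [0.9, 0.7]

strict = apply_fill(img, nearest_mask, fill, threshold=1.0)
half = apply_fill(img, nearest_mask, fill, threshold=0.5)

print(strict)
print(strict == half)
```

Because the nearest-mode mask holds only 0.0 and 1.0, any threshold strictly between those values and a threshold of exactly 1 select the same pixels.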

vfdev-5 commented 1 year ago

@pmeier the value 0.9999 for the mask threshold was chosen more or less on purpose. In the example from the description, an affine rotation by 50 degrees in bilinear mode produces a rotated mask with the following unique values:

tensor([0.00000000, 0.02883029, 0.02883148, 0.10955429, 0.10955477, 0.11125469,
         0.11125565, 0.19197845, 0.19197917, 0.19367909, 0.19367981, 0.27440262,
         0.27440357, 0.35512805, 0.35512924, 0.35682678, 0.35682797, 0.43755341,
         0.43755519, 0.43925095, 0.43925512, 0.51997960, 0.51998138, 0.60240537,
         0.60240555, 0.68312985, 0.68313217, 0.68482971, 0.68482977, 0.76555562,
         0.76555634, 0.76725388, 0.76725554, 0.84798002, 0.84798050, 0.92870331,
         0.92870587, 0.93040466, 0.93040580, 0.99999994, 1.00000000])

and 0.99999994 can appear inside the mask:

plt.imshow(((mask > 0.999) & (mask < 1.0))[0, 0, ...], interpolation="none")

[image]

So, using mask < 1 gives: [image]
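
The effect of the two thresholds on a value like 0.99999994 can be checked with the standard library alone. The sketch below rounds through float32 with `struct`; the list `mask_values` is a hand-picked sample for illustration, not the full mask from the example.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# The largest float32 below 1.0 is 1 - 2**-24, which prints as 0.99999994.
largest_below_one = to_float32(1.0 - 2.0 ** -24)

mask_values = [0.0, 0.92870587, largest_below_one, 1.0]

strict = [m < 1.0 for m in mask_values]       # proposed `mask < 1`
tolerant = [m < 0.9999 for m in mask_values]  # original `mask < 0.9999`

print(largest_below_one)
print(strict)
print(tolerant)
```

With the strict comparison, a pixel that is off by a single float32 rounding step is treated as outside the image and gets the fill value; the 0.9999 threshold leaves it untouched.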

amanikiruga commented 2 months ago

this hasn't been fixed