Open datumbox opened 1 year ago
Trying to understand this more, is the problem that:
fill
has different defaults across APIs?fill=0
is different from fill=None
?If we were to have fill
with the same default across APIs, would we still have a problem?
Trying to understand this more, is the problem that:
-> answer is both
3rd alternative is: to try to put fill=0 in functional ops and make fill=0 behave as fill=None.
Checking the codebase history, I have an impression is that default fill 0 or None has its root in PIL API:
Fill argument exists for pad, affine, rotate and others affine-like functional ops.
1) For PIL input pad invokes ImageOps.expand with fill which can't have None value -> thus fill is 0 by default in torchvision for pad
2) affine invokes PIL Image.transform with fill which can take None or 0 and produces same result with 0 pixels. PR with affine in torchvision had fill=None by default
3) rotate invokes PIL Image.rotate with fill which can take None or 0 and produces same result with 0 pixels. rotate was with fill=0 for some time and then was fixed to None (https://github.com/pytorch/vision/pull/1760)
So, making output result for all the ops the same if fill=None and fill=0, would be a move forward. Then we can pick None or 0 as default value and make API uniform on that...
@vfdev-5 Thanks for the comprehensive reply. Do you think I should add the alternative as a 3rd option? Or perhaps I could just edit the "proposal" to suggest that we should be using 0
as defaults and just handle None
similarly? My understanding is that the 2 proposals are similar and it's a matter of which default value we choose to expose (I agree that 0
is probably better).
Thanks for the details @vfdev-5 .
passing None on fill for some transforms isn't the same as passing 0
I'm looking at https://github.com/pytorch/vision/issues/6517 but I can't pin-point exactly what the difference is. Would you have more details? Is it that the "filling" simply doesn't happen when fill=None
, or is it something more subtle than that?
Also instead of a BC breaking change, is deprecation an option here?
I'm looking at https://github.com/pytorch/vision/issues/6517 but I can't pin-point exactly what the difference is. Would you have more details? Is it that the "filling" simply doesn't happen when fill=None, or is it something more subtle than that?
Yeah, this issue description may be unclear. The problem arises when we call _apply_grid_transform
with mode="bilinear" + 1) fill=None and 2) fill=0. Two outputs wont match. I assume that outputs should match because for me fill=None and fill=0 is the same configuration. Also, by the way outputs are matching if mode="nearest" + similar configs for fill.
Why results are not matching for bilinear ?
Because when fill is None we do only grid sampling (which does padding_mode="zeros"). If fill=0 we create additional mask channel, concat it to the input image, then grid sample, extract transformed mask and apply it with fill values (which are 0 again) to the output image. The way how we blend mask with fill value and the image is probably incorrect for bilinear mode (see images in the issue, there is a black shadow around rotated image for current torchvision. This shadow is absent for PIL). So, the problem comes from blending zero fill values to already transformed image with zero fill.
Before fixing blending, we may ask why we bother doing all this mask fancy stuff for fill=0 if grid sample already does that for us. So, I would expect that fill=0 should be a no-op like fill=None in terms of computing mask and manually applying fill value.
Thus, point 1 in the issue is about making fill=0 option as like fill=None without computing any fill mask. Point 2 is about doing something with fill value mask blending for mode=bilinear.
Just had a chat with @vfdev-5 and @pmeier , I'll try to briefly summarize the situation on V1 transforms for my own understanding (possibly re-hashing what has already been said above). These are general statements that do not necessarily apply to all transforms in all cases (sometimes, only a subset of them):
fill
has inconsistent default values across transforms (None for AA, 0 for the rest). Functionals mostly have None as default except for Pad (0).fill
may have a different default value between the transforms and their corresponding functionals. E.g. Perspective (0 in transforms, None in functional).fill
value to a transform and to a functional may lead to different results (because some transforms convert None
to 0, which is an undocumented behaviour)fill=0
or fill=any_color
may result in different results between PIL and Tensors (passing fill=None
should result in consistent results). Whether those differences are within the range of acceptable differences or not is unclear (to me).All of these are issues we'll need to fix eventually. We don't know how yet.
As discussed in https://github.com/pytorch/vision/pull/7258, the V2 transform introduce a minor BC-breaking change by not doing this anymore:
some transforms convert
None
to 0, which is an undocumented behaviour
I.e. if users pass None to a transform, the functional will now get None as well in V2. This allows users who explicitly pass None
to get consistent results between PIL and Tensor backends. It's slightly BC because this may lead to different results (compared to V1) for users who a) were passing None
explicitly which was never documented and b) were using the Tensor backend. The PIL backend is unchanged because the pil functionals always do the None->0 conversion anyway.
- on the functionals, passing
fill=0
orfill=any_color
may result in different results between PIL and Tensors (passingfill=None
should result in consistent results). Whether those differences are within the range of acceptable differences or not is unclear (to me).
To narrow it down a little, this only happens for InterpolationMode.BILINEAR
. In general it applies to all interpolation modes that can create intermediate values, e.g. bicubic, but the affine functions are limited to nearest and bilinear interpolation:
grid_sample
does not support filling by any color and is equivalent to fill=0
. If the user requests any fill color, we apply the same affine transformation to a boolean mask and use that to apply the filling later. However for bilinear interpolation this creates values besides 0
and 1
and so we have to decide how we want to deal with this. Current implementation accepts these extra values and uses them to blend between the fill color and the original image. However this creates shadow artifacts.
🚀 The feature
As discussed at https://github.com/pytorch/vision/issues/6517, passing
None
onfill
for some transforms isn't the same as passing0
. The difference applies on some cases but not others, it was potentially done to avoid JIT issues (or is it a bug?) and has unforeseen effects on the behaviour of theTransforms
.Currently on the prototype area:
fill=None
as default value.fill=0
as default.fill=None
.The above inconsistent behaviour maintains BC with stable but leads to a weird API.
Motivation, pitch
Break BC on the
functional
level and makefill=None
behave similar tofill=0
.This unfortunately will have accuracy effects on models trained with the Tensor API. Worth noting that such a change would align closer with PIL which already sets
fill=None
to 0: https://github.com/pytorch/vision/blob/56e707bfccb62ada836d21e431d6db0d10dd73a1/torchvision/transforms/functional_pil.py#L264-L265This proposal could be adopted if we decide to break BC and align the
Tensor
andPIL
backends on other aspects (such asantialias
etc).Alternatives
There are 2 other potential alternatives:
Additional context
No response
cc @vfdev-5