pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Parameter for transforms.ToTensor #4210

Open pelinsuciftcioglu opened 3 years ago

pelinsuciftcioglu commented 3 years ago

🚀 Feature

transforms.ToTensor converts a PIL Image or numpy.ndarray to a tensor and scales the pixel values to the range [0, 1]. The proposed parameter would let the user choose whether the pixel values are scaled or not.

Motivation

I am working on Variational Encoders for images, which requires image preprocessing. I model each pixel with a categorical distribution, so every pixel value must stay in the range [0, 255]. I use datasets from torchvision.datasets. However, the data is loaded as PIL Images, and the only transform that converts an image to a tensor is ToTensor(), which also scales the pixel values. I find this impractical: there should be an option for the user so that the method generalizes to every use case. As a result, I ended up copying the method definitions from the library and modifying them instead of using the transforms package itself.

Pitch

I would suggest adding a parameter "scale" to the transform, e.g. ToTensor(scale=None) or ToTensor(scale=[0, 1]), which scales the input based on the given value and otherwise does what it already does by default. I have seen a few methods with this kind of scaling option, so it seemed appropriate, and the scaling in the original method happens in only a line of code, so it should be easy to make it optional. Otherwise, I do not understand the motivation for not providing an option, even though I understand that much of the time the input is normalized by the user in the application.

Alternatives

Alternatively, it could be a boolean parameter that lets the user choose whether to scale or not, in case the math could cause any unwanted results.

Additional context

NA

cc @vfdev-5

datumbox commented 3 years ago

@pelinsuciftcioglu Thanks for raising.

Indeed the issue with ToTensor is it does too many things at the same time. We have discussed decoupling the conversion to tensor from rescaling but this would be a BC-breaking change. Your proposal is BC compatible but further increases the complexity of the method.

At the moment, it's not clear to us which direction we should follow, but I will leave this issue open so we can factor in your input during future discussions.

cc @vfdev-5 @fmassa

vfdev-5 commented 3 years ago

For scaling purposes there is also https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Normalize

EDIT: There is also the PILToTensor transform (I do not know why it is undocumented): https://github.com/pytorch/vision/blob/de96b977112e092510207950615103292e618b89/torchvision/transforms/transforms.py#L103-L120

which does not scale: https://github.com/pytorch/vision/blob/de96b977112e092510207950615103292e618b89/torchvision/transforms/functional.py#L143-L169

Edit 2: opened issue for the docs : https://github.com/pytorch/vision/issues/4225