Open pelinsuciftcioglu opened 3 years ago
@pelinsuciftcioglu Thanks for raising.
Indeed, the issue with ToTensor is that it does too many things at once. We have discussed decoupling the conversion to tensor from the rescaling, but this would be a BC-breaking change. Your proposal is backward compatible but further increases the complexity of the method.
At the moment, it's not clear to us which direction we should follow, but I will leave this issue open so we can factor in your input during future discussions.
cc @vfdev-5 @fmassa
For scaling purposes there is also https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Normalize
EDIT: There is also PILToTensor (I do not know why this transform is undocumented): https://github.com/pytorch/vision/blob/de96b977112e092510207950615103292e618b89/torchvision/transforms/transforms.py#L103-L120
which does not scale: https://github.com/pytorch/vision/blob/de96b977112e092510207950615103292e618b89/torchvision/transforms/functional.py#L143-L169
Edit 2: opened an issue for the docs: https://github.com/pytorch/vision/issues/4225
🚀 Feature
The ToTensor transform converts a PIL Image or numpy.ndarray to a tensor and scales the pixel values to the range [0, 1]. The new parameter would let the user choose whether the pixel values are scaled or not.
Motivation
I am working on Variational Encoders on images, which requires processing images. I use a categorical distribution to predict every pixel value, which requires each pixel value to stay in the range [0, 255]. I use datasets from torchvision.datasets. However, when I load the data, the images are PIL Images, and the only transform that converts an image to a tensor is ToTensor(), which also scales the pixel values. I think this is really impractical; there should be an option for the user so that the method generalizes to every use case. Thus, I ended up copying the method definitions from the library and changing them instead of using the transforms package itself.
Pitch
I would suggest adding a "scale" parameter to the transform, e.g. ToTensor(scale=None) or ToTensor(scale=[0, 1]), which scales the input according to the given value, or otherwise does what it already does by default. I have seen a few methods with this kind of scaling option, so I found it appropriate, and the scaling in the original method happens in only one line of code, so it should be easy to make it optional. Otherwise, I do not understand the motivation for not providing an option, even though I understand that the user often normalizes the input in the application anyway.
Alternatives
Alternatively, it could be a boolean parameter that lets the user choose whether to scale or not, in case the math could cause any unwanted results.
Additional context
NA
cc @vfdev-5