pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Grayscale transform does not work correctly for 3 channels to 1. #8632

Closed: manavkulshrestha closed this issue 3 weeks ago

manavkulshrestha commented 3 weeks ago

šŸ› Describe the bug

Grayscale transform returns an image that is all zeros. In the case below, I would expect the 1-channel output gray to be exactly equal to channel.

from torchvision import transforms as T
import torch

channel = torch.randint(2, (224, 224))
assert (channel == 1).any()
image = torch.stack((channel, channel, channel))
assert (image[0,:,:] == image[1,:,:]).all() and (image[1,:,:] == image[2,:,:]).all()

gray = T.Grayscale(num_output_channels=1)(image)
print((gray == 1).any())  # prints tensor(False)

Versions

PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Amazon Linux release 2 (Karoo) (x86_64)
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17)
Clang version: Could not collect
CMake version: version 2.8.12.2
Libc version: glibc-2.26

Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.10.223-190.873.amzn2int.x86_64-x86_64-with-glibc2.26
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB

Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2700.134
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.01
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida

Versions of relevant libraries:
[pip3] numpy==2.0.1
[pip3] pytorch-lightning==2.4.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.0
[pip3] torchmetrics==1.4.1
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] blas 1.0 mkl conda-forge
[conda] libblas 3.9.0 16_linux64_mkl conda-forge
[conda] libcblas 3.9.0 16_linux64_mkl conda-forge
[conda] liblapack 3.9.0 16_linux64_mkl conda-forge
[conda] libopenvino-pytorch-frontend 2024.3.0 he02047a_0 conda-forge
[conda] mkl 2022.1.0 hc2b9512_224
[conda] numpy 2.0.1 py311hed25524_0 conda-forge
[conda] pytorch 2.4.0 py3.11_cuda12.1_cudnn9.1.0_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-lightning 2.4.0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.4.0 py311_cu121 pytorch
[conda] torchmetrics 1.4.1 pypi_0 pypi
[conda] torchtriton 3.0.0 py311 pytorch
[conda] torchvision 0.19.0 py311_cu121 pytorch

venkatram-dev commented 3 weeks ago

A few things to note:

  1. channel should be a float tensor: channel = torch.randint(2, (224, 224)).float()

  2. This is the formula for grayscale conversion:

(0.2989 * r + 0.587 * g + 0.114 * b)

https://github.com/pytorch/vision/blob/main/torchvision/transforms/_functional_tensor.py#L160

The max value of r, g, and b in this example is 1. Using those max values, the maximum possible value after grayscale conversion is 0.2989 + 0.587 + 0.114 = 0.9998999999999999, so (gray == 1).any() is False even once channel is a float tensor.

from torchvision import transforms as T
import torch

# Creating a channel with random values 0 or 1
channel = torch.randint(2, (224, 224)).float()  # Ensure it's floating point

image = torch.stack([channel, channel, channel], dim=0)
print('image.shape', image.shape)  # should be [3, 224, 224]

# Assert that all channels are identical
assert (image[0,:,:] == image[1,:,:]).all() and (image[1,:,:] == image[2,:,:]).all(), "Channels are not identical."

# Apply the Grayscale transformation
gray = T.Grayscale(num_output_channels=1)(image)

print('gray min max', gray.min(), gray.max())  # check the range of grayscale values
print('gray first few values', gray[0, 0, :10])  # inspect the first few values

# Example using direct computation
r, g, b = 1.0, 1.0, 1.0
gray_manual = 0.2989 * r + 0.5870 * g + 0.1140 * b
print("Manual grayscale calculation:", gray_manual)

NicolasHug commented 3 weeks ago

Hi @manavkulshrestha, please refer here: https://pytorch.org/vision/main/transforms.html#dtype-and-expected-value-range

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1]. Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE] where MAX_DTYPE is the largest value that can be represented in that dtype. Typically, images of dtype torch.uint8 are expected to have values in [0, 255].
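To satisfy that contract with the original repro, either keep the tensor float with values in [0, 1], or convert it to uint8 with values in [0, 255]. Below is a minimal sketch of both paths (assuming the 0.2989/0.587/0.114 weights linked above; the printed values follow from those weights):

from torchvision import transforms as T
import torch

# Float path: values {0.0, 1.0} already lie in the expected [0, 1] range.
channel = torch.randint(2, (224, 224)).float()
image = torch.stack((channel, channel, channel))
gray = T.Grayscale(num_output_channels=1)(image)

# The weights sum to 0.9999, so a pixel that is 1.0 in all three channels
# maps to ~0.9999 and never exactly 1.0.
print((gray == 1).any())                                # tensor(False), as reported
print(torch.isclose(gray, image[:1], atol=1e-3).all())  # tensor(True): gray tracks channel to ~1e-4

# uint8 path: values are expected in [0, 255].
image_u8 = (image * 255).to(torch.uint8)
gray_u8 = T.Grayscale(num_output_channels=1)(image_u8)
print(gray_u8.max())  # tensor(254, dtype=torch.uint8): 0.9999 * 255 ~ 254.97, truncated back to uint8

The same cast explains the all-zeros result in the original int64 repro: every weighted sum lands in [0, 0.9999], and casting back to the integer dtype truncates it to 0.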