pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

ConvertImageDtype not converting properly from uint8 #8604

Closed. manavkulshrestha closed this issue 1 week ago

manavkulshrestha commented 3 weeks ago

ConvertImageDtype doesn't preserve the values of the image tensor when converting from uint8 to long

import torch
import torchvision.transforms as T

img = torch.randint(2, size=(224, 224), dtype=torch.uint8)
print(T.ConvertImageDtype(torch.long)(img))

gives

tensor([[36028797018963968,                 0, 36028797018963968,
          ..., 36028797018963968, 36028797018963968,
         36028797018963968],
        [                0,                 0, 36028797018963968,
          ...,                 0,                 0,
                         0],
        [36028797018963968,                 0, 36028797018963968,
          ...,                 0, 36028797018963968,
         36028797018963968],
        ...,
        [                0,                 0,                 0,
          ...,                 0,                 0,
         36028797018963968],
        [36028797018963968, 36028797018963968, 36028797018963968,
          ...,                 0, 36028797018963968,
         36028797018963968],
        [                0,                 0,                 0,
          ...,                 0, 36028797018963968,
         36028797018963968]])

whereas I would expect a tensor containing only 0s and 1s. It looks like each 1 is converted to 36028797018963968. Maybe some kind of overflow?

Versions

PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Amazon Linux release 2 (Karoo) (x86_64)
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17)
Clang version: Could not collect
CMake version: version 2.8.12.2
Libc version: glibc-2.26

Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.10.223-190.873.amzn2int.x86_64-x86_64-with-glibc2.26
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB

Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2700.134
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.01
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida

Versions of relevant libraries:
[pip3] numpy==2.0.1
[pip3] pytorch-lightning==2.4.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.0
[pip3] torchmetrics==1.4.1
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] blas 1.0 mkl conda-forge
[conda] libblas 3.9.0 16_linux64_mkl conda-forge
[conda] libcblas 3.9.0 16_linux64_mkl conda-forge
[conda] liblapack 3.9.0 16_linux64_mkl conda-forge
[conda] libopenvino-pytorch-frontend 2024.3.0 he02047a_0 conda-forge
[conda] mkl 2022.1.0 hc2b9512_224
[conda] numpy 2.0.1 py311hed25524_0 conda-forge
[conda] pytorch 2.4.0 py3.11_cuda12.1_cudnn9.1.0_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-lightning 2.4.0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.4.0 py311_cu121 pytorch
[conda] torchmetrics 1.4.1 pypi_0 pypi
[conda] torchtriton 3.0.0 py311 pytorch
[conda] torchvision 0.19.0 py311_cu121 pytorch

NicolasHug commented 2 weeks ago

Hi @manavkulshrestha ,

uint8 tensors are assumed to hold image values in [0, 255], so ConvertImageDtype rescales the values rather than simply casting them. When converting the uint8 value 1 to torch.long, the operation is essentially 1 / 256 * (torch.iinfo(torch.long).max + 1), which gives 36028797018963968.
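
As a quick sanity check of that arithmetic (a minimal sketch using only the numbers above, not torchvision itself):

import torch

# uint8 has 256 levels; per the explanation above, they get mapped onto the
# full long range, so one uint8 step corresponds to 2**63 / 256 = 2**55.
scale = (torch.iinfo(torch.long).max + 1) // 256
print(scale)           # 36028797018963968
print(scale == 2**55)  # True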

manavkulshrestha commented 1 week ago

Ok, what is the intended way to accomplish what I need?

NicolasHug commented 1 week ago

img.to(torch.long) should do it.
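
For completeness, a minimal sketch of that cast on the repro tensor above (Tensor.to is a plain dtype cast, so it keeps the 0/1 values instead of rescaling them):

import torch

img = torch.randint(2, size=(224, 224), dtype=torch.uint8)
out = img.to(torch.long)  # value-preserving cast, no rescaling
print(out.unique())       # tensor([0, 1])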