decode_jpeg doesn't give the same result on different machine

tchaton commented 1 year ago

🐛 Describe the bug

I am really confused. I am running this code on a Linux-based machine and on my M1 Mac, and I am getting different results.

I verified that the result of torch.frombuffer is the same on both machines. However, the output of decode_jpeg isn't.

import numpy as np
import torch
from torchvision.io import decode_jpeg
from time import time
from PIL import Image
from lightning import seed_everything

seed_everything(42)

np_data = np.random.randint(255, size=(28, 28, 3), dtype=np.uint8)
img = Image.fromarray(np_data)

# from the JPEG image directly
path = "random_image.JPEG"
img.save(path, format="jpeg", quality=100)
img = Image.open(path)

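t0 = time()
# Read the encoded JPEG bytes and decode them with torchvision.
with open(path, "rb") as f:
    data = f.read()
array = torch.frombuffer(data, dtype=torch.uint8)  # encoded bytes: identical on both machines
array_torchvision = decode_jpeg(array)  # decoded pixels: these differ between machines
print(time() - t0)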

Versions

Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.5.2 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.15 (main, Dec  5 2022, 15:51:18)  [Clang 14.0.0 (clang-1400.0.29.202)] (64-bit runtime)
Python platform: macOS-13.5.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] mypy==1.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.1
[pip3] pytorch-lightning==1.9.0
[pip3] torch==2.0.1
[pip3] torchdata==0.6.1
[pip3] torchmetrics==0.11.1
[pip3] torchvision==0.15.1
[conda] Could not collect

NicolasHug commented 1 year ago

How big are the differences? I would assume this comes from the two machines using different versions of libjpeg (or libjpeg-turbo). This is somewhat expected: the JPEG spec is loose enough that two valid decoders can produce slightly different decoded outputs.
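One way to answer the "how big" question is to save the decoded output on each machine (for example with `numpy.save`) and compare the two arrays offline. The following is a minimal sketch, assuming both outputs are available as uint8 NumPy arrays of the same shape; `diff_stats` is a hypothetical helper and not part of torchvision, and the two small arrays are synthetic stand-ins rather than real decoder output.

```python
import numpy as np


def diff_stats(a: np.ndarray, b: np.ndarray) -> dict:
    """Summarise per-pixel differences between two uint8 images of equal shape."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Widen before subtracting: uint8 arithmetic would wrap around (0 - 1 == 255).
    delta = np.abs(a.astype(np.int16) - b.astype(np.int16))
    return {
        "max_abs_diff": int(delta.max()),
        "mean_abs_diff": float(delta.mean()),
        "fraction_differing": float((delta > 0).mean()),
    }


# Synthetic stand-ins for the outputs decoded on the two machines.
linux_out = np.array([[0, 10], [200, 255]], dtype=np.uint8)
mac_out = np.array([[1, 10], [198, 255]], dtype=np.uint8)
stats = diff_stats(linux_out, mac_out)
print(stats)
```

A small maximum difference spread over many pixels would be consistent with decoder-level rounding differences, whereas large or localised differences would point to something else.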

Perhaps you can check what torchvision is linked against with ldd (Linux) / otool -L (macOS)?
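The linkage check can be scripted so that the same file runs on both machines. This is only a sketch: `linkage_command` and `find_image_extension` are hypothetical helpers, and the assumption that the compiled extension sits in the torchvision package directory under a name matching `image*.so` may not hold for every build.

```python
import importlib.util
import platform
import subprocess
from pathlib import Path
from typing import List, Optional


def linkage_command(lib_path: str, system: Optional[str] = None) -> List[str]:
    """Return the command that lists the shared libraries lib_path links against."""
    system = system or platform.system()
    if system == "Linux":
        return ["ldd", lib_path]
    if system == "Darwin":
        return ["otool", "-L", lib_path]
    raise NotImplementedError(f"no linkage tool configured for {system}")


def find_image_extension() -> Optional[Path]:
    """Look for torchvision's compiled image extension (assumed name: image*.so)."""
    spec = importlib.util.find_spec("torchvision")  # locates the package without importing it
    if spec is None or spec.origin is None:
        return None
    matches = sorted(Path(spec.origin).parent.glob("image*.so"))
    return matches[0] if matches else None


if __name__ == "__main__":
    ext = find_image_extension()
    if ext is None:
        print("torchvision image extension not found")
    else:
        result = subprocess.run(linkage_command(str(ext)), capture_output=True, text=True)
        # Keep only the lines that mention a JPEG library.
        print("\n".join(line for line in result.stdout.splitlines() if "jpeg" in line.lower()))
```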

tchaton commented 1 year ago

Hey @NicolasHug. Thanks for answering me :)

If this is expected, then all good! I was just using this logic to assert that the tensors are the same before and after serialisation. This worked on Linux but not on Mac.

I will check it and come back to you.
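For a serialisation check like the one described above, one option that does not depend on the decoder is to compare the encoded JPEG bytes rather than the decoded pixels, since the bytes were reported to be identical on both machines. A minimal sketch using only the standard library; `payload_digest` and `roundtrip_ok` are hypothetical names, and the payload is a stand-in rather than a real JPEG file.

```python
import hashlib


def payload_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the encoded image bytes."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_ok(original: bytes, restored: bytes) -> bool:
    """True if the bytes survived serialisation unchanged."""
    return payload_digest(original) == payload_digest(restored)


# Stand-in payload; in practice this would be the content of the JPEG file.
original = b"\xff\xd8" + bytes(range(16)) + b"\xff\xd9"
restored = bytes(bytearray(original))  # simulates a serialise/deserialise round trip
corrupted = original[:-1] + b"\x00"

print(roundtrip_ok(original, restored))
print(roundtrip_ok(original, corrupted))
```

Within a single process, comparing the bytes directly is enough. A digest is convenient when the reference value has to be stored or compared across machines.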

NicolasHug commented 1 year ago

It reminds me that we've been observing similar failures on our CI for a while... https://github.com/pytorch/vision/actions/runs/6418750616/job/17429710345

What is confusing is that, according to the logs, both the macOS and Linux jobs are linked against libjpeg-turbo 8.0... Perhaps the differences are on libjpeg-turbo's side?

pytorch / vision

decode_jpeg doesn't give the same result on different machine #8027
