pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Conv2d is not deterministic when input tensor has different strides #88147

Open zplizzi opened 2 years ago

zplizzi commented 2 years ago

🐛 Describe the bug

I would expect that if I pass two identical tensors through a Conv2d in deterministic mode, they would produce identical outputs. However, this is not the case if the tensors are identical in every way except their stride - in that case the outputs differ. A difference in stride alone doesn't make two tensors fail torch.equal, which is especially confusing: two "equal" tensors can produce different outputs.

import os
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
import torch
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False

conv = torch.nn.Conv2d(3, 3, kernel_size=2).cuda()
in_a = torch.randn(4, 3, 64, 64).cuda()
out_a = conv(in_a)
in_b = torch.clone(in_a.permute(0, 2, 3, 1), memory_format=torch.contiguous_format).permute(0, 3, 1, 2)  # same values as in_a, but stored channels-last
out_b = conv(in_b)
print(in_a.stride())  # (12288, 4096, 64, 1)
print(in_b.stride())  # (12288, 1, 192, 3)
print(torch.equal(in_a, in_b)) # True
print(torch.equal(out_a, out_b))  # False

Ideally this should be fixed so that differences in stride don't affect the output of otherwise-deterministic operations, but at minimum the page on reproducibility should mention this.
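For completeness, a minimal workaround sketch (not part of the original report, and assuming the difference really does come only from the stride mismatch): normalizing the memory layout before the convolution should make both calls hit the same kernel.

# Copy in_b into the default NCHW-contiguous layout so its strides match in_a's
in_b_fixed = in_b.contiguous()
print(in_a.stride() == in_b_fixed.stride())  # True
out_b_fixed = conv(in_b_fixed)
print(torch.equal(out_a, out_b_fixed))  # expected True if the strides were the only difference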

Versions

Collecting environment information...
PyTorch version: 1.13.0+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.5 (default, Nov 23 2021, 15:27:38) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-30-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 470.103.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] geotorch==0.2.0
[pip3] mypy==0.971
[pip3] mypy-boto3-ec2==1.17.41.0
[pip3] mypy-boto3-s3==1.17.41.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.2
[pip3] pytorch-lightning==1.7.3
[pip3] torch==1.13.0+cu116
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.13.0+cu113
[conda] Could not collect

ngimel commented 2 years ago

Don't use equal to compare floating point tensors. This is expected behavior, see https://pytorch.org/docs/stable/notes/numerical_accuracy.html. Channels-last and non-channels-last convolutions use different kernels and produce slightly different results (so allclose would work, but equal would not).
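A minimal sketch of that comparison, continuing the reproduction above (the tolerances are illustrative, not values recommended in this thread):

# Floating-point comparison instead of bitwise equality; rtol/atol here are illustrative
print(torch.allclose(out_a, out_b, rtol=1e-4, atol=1e-6))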

zplizzi commented 2 years ago

equal is appropriate for comparing results that you expect to be deterministic, is it not? That is the whole point of running code in deterministic mode - ensuring exactly identical results. My point in raising this issue is to note that Conv2d is not deterministic in a case where many people would expect it to be - and that this behavior could be documented (or fixed, when running in deterministic mode) to save others the confusion I had when discovering this.

Also, perhaps equal could have a flag that additionally checks whether the memory format of the two tensors is identical? Or at least a note in the docs that it doesn't compare memory formats. It is very confusing to get different results from two tensors that compare as equal.
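For illustration, the layout difference can already be made visible by hand (a sketch of manual checks, not the proposed flag):

# The two tensors compare equal on values but differ in layout
print(in_a.stride() == in_b.stride())  # False
print(in_a.is_contiguous(), in_b.is_contiguous())  # True False
print(in_b.is_contiguous(memory_format=torch.channels_last))  # True: in_b is channels-last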

And for what it's worth, the code example I gave above also fails allclose with default tolerances; the numerical difference is slightly larger than allclose's default tolerance.
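A quick way to make that concrete, continuing the reproduction code above (illustrative sketch):

# Worst-case difference vs. allclose's default tolerances (rtol=1e-5, atol=1e-8)
print((out_a - out_b).abs().max())
print(torch.allclose(out_a, out_b))  # False with default tolerances, per the observation above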

gchanan commented 2 years ago

I agree the docs could be clearer; we don't clearly define what "input" means:

Sets whether PyTorch operations must use “deterministic” algorithms. That is, algorithms which, given the same input, and when run on the same software and hardware, always produce the same output. When enabled, operations will use deterministic algorithms when available, and if only nondeterministic algorithms are available they will throw a RuntimeError when called.