🐛 Describe the bug
This was discovered while annotating failures in https://github.com/pytorch/pytorch/pull/134184. Original test case is in test/test_nn.py. There's a diff between the results from MPS and CPU.
The algorithm for correctly computing the expected output can be seen in https://github.com/pytorch/pytorch/blob/7af38eb98bdceb8fc6f8635ed7dd664ef44e4b10/test/test_nn.py#L9330-L9336 for 1d and in https://github.com/pytorch/pytorch/blob/7af38eb98bdceb8fc6f8635ed7dd664ef44e4b10/test/test_nn.py#L9438-L9447 for 2d.
See also https://github.com/pytorch/pytorch/issues/34808.
minimal repro:
Versions
PyTorch version: 2.5.0a0+gitc19005d
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.30.1
Libc version: N/A
Python version: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU: Apple M3 Max
Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] flake8-bugbear==23.3.23
[pip3] flake8-comprehensions==3.15.0
[pip3] flake8-executable==2.1.3
[pip3] flake8-logging-format==0.9.0
[pip3] flake8-pyi==23.3.1
[pip3] flake8-simplify==0.19.3
[pip3] mypy==1.10.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.0.1
[pip3] optree==0.12.1
[pip3] torch==2.5.0a0+gitc19005d
[pip3] torch-tb-profiler==0.4.3
[pip3] torchvision==0.20.0a0+0d80848
[pip3] triton==3.0.0
[conda] numpy 2.0.1 pypi_0 pypi
[conda] optree 0.12.1 pypi_0 pypi
[conda] torch 2.5.0a0+gitc19005d dev_0
[conda] torch-tb-profiler 0.4.3 pypi_0 pypi
[conda] torchfix 0.4.0 pypi_0 pypi
[conda] torchvision 0.20.0a0+0d80848 dev_0
[conda] triton 3.0.0 pypi_0 pypi
cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen