Using the MPS backend, `torch.multinomial` can sample indices outside the domain of the distribution. See the code snippet below:
```python
import torch
import torch.distributions

device = torch.device("mps")

# 10 dimensional distribution, expected max output is 9
violating_dist = torch.tensor([4.3330236804e-04, 1.6706718498e-07, 5.6105983504e-07, 2.5240040486e-05,
                               5.4649823142e-05, 5.5108112283e-03, 9.9348586798e-01, 4.5977579077e-08,
                               4.8896443332e-04, 3.4132514770e-07], device=device)
sample = torch.multinomial(violating_dist, 100000000, True)
# >> 11, outside domain!
print(torch.max(sample))

# This distribution is the one above with default printing precision
almost_similar_non_violating_dist = torch.tensor([4.3330e-04, 1.6707e-07, 5.6106e-07, 2.5240e-05, 5.4650e-05, 5.5108e-03,
                                                  9.9349e-01, 4.5978e-08, 4.8896e-04, 3.4133e-07], device=device)
sample = torch.multinomial(almost_similar_non_violating_dist, 100000000, True)
# >> 9
print(torch.max(sample))

# Violating distribution but on cpu
sample = torch.multinomial(violating_dist.cpu(), 100000000, True)
# >> 9
print(torch.max(sample))
```
So, for some reason, on MPS this particular probability tensor sometimes yields an index of 11, even though there are only 10 categories to sample from (so the maximum valid index is 9). Furthermore, the problem does not occur with the same distribution written with fewer significant digits (the values as shown at default print precision), nor does it occur on the CPU backend.
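Until this is fixed, one possible workaround is to validate the MPS samples and re-draw on the CPU, where the report does not reproduce. The sketch below is a minimal illustration built around a hypothetical helper, `safe_multinomial` (not a PyTorch API). It uses only `torch.multinomial` and ordinary tensor operations, and it has not been checked against the MPS failure itself.

```python
import torch


def safe_multinomial(probs: torch.Tensor, num_samples: int, replacement: bool = False) -> torch.Tensor:
    """Hypothetical workaround helper (not part of PyTorch).

    Draws samples with torch.multinomial on probs' own device, checks that
    every index lies in [0, num_categories), and re-draws the whole batch on
    the CPU if any index is out of range.
    """
    num_categories = probs.shape[-1]
    sample = torch.multinomial(probs, num_samples, replacement)
    out_of_range = (sample < 0) | (sample >= num_categories)
    if bool(out_of_range.any()):
        # Re-draw everything on CPU, then move the result back to the original device.
        sample = torch.multinomial(probs.cpu(), num_samples, replacement).to(probs.device)
    return sample


if __name__ == "__main__":
    # Use MPS when available so the check is exercised there; otherwise run on CPU.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    probs = torch.tensor([0.1, 0.2, 0.7], device=device)
    print(int(safe_multinomial(probs, 1000, True).max()))
```

The helper re-draws the whole batch instead of patching only the invalid indices, because replacing individual entries would skew the sampled counts. It cannot, however, guarantee that MPS draws that pass the range check are otherwise correctly distributed; sampling directly with `probs.cpu()` remains the most conservative option.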
Versions
Collecting environment information...
PyTorch version: 2.4.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.7 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.6 (default, Feb 3 2024, 15:58:27) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-14.7-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M3 Max
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.0.2
[pip3] storchastic==0.3.7
[pip3] torch==2.4.1
[pip3] torchvision==0.19.1
[conda] No relevant packages
cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen
Hi @HEmile, thanks for reporting this issue. I'm able to repro it with the latest nightly PyTorch, and I'll post additional updates as soon as I get a chance to debug further.