Describe the bug
Segfault in the backward pass when running on GPU with PyTorch and torch.backends.cudnn.deterministic set to True.

To Reproduce
Steps to reproduce the behavior:
1. conda env create -f environment.yaml
2. conda activate sigseg
3. python segfault.py

segfault.py:
import sigpy as sp
#from cupy import cudnn
import torch
torch.backends.cudnn.deterministic = True
import torch.nn.functional as F
net_input = torch.randn(1, 10, 220, 220, dtype=torch.float32, device='cuda:0')
weight = torch.randn(10, 10, 1, 1, device='cuda:0', requires_grad=True)  # create on-device so weight stays a leaf tensor
net_output = F.conv2d(net_input, weight, padding='same')
loss = torch.sum(torch.abs(net_output))
loss.backward() # Segfault occurs here
Expected behavior
I expect the code to finish without segfaulting.
Desktop (please complete the following information):
- OS: Ubuntu 22.04
- GPU: NVIDIA RTX 3090, CUDA 12.1
Additional context
- The problem only occurs in the backward pass of a 1x1 conv2d (a 3x3 conv2d, for example, is fine).
- The problem is GPU-only.
- The problem disappears when torch is imported BEFORE sigpy (probably related to sigpy/config.py).
- I suspect a cuDNN version mismatch between torch and sigpy.
- I was able to resolve the problem by installing a PyTorch-compatible cuDNN directly from apt rather than via conda, but this took some digging.
- I can try to write a pull request modifying config.py to warn users more specifically about the cuDNN version, e.g. by comparing cudnn.getVersion() and torch.backends.cudnn.version().
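The proposed config.py warning could look something like the sketch below. To be clear, `check_cudnn_versions` and its placement are my assumptions, not sigpy's actual API; it presumes cupy (sigpy's GPU backend) exposes `cupy.cuda.cudnn.getVersion()` and does nothing when either library is missing.

```python
import warnings

def check_cudnn_versions():
    """Warn if torch and cupy were built against different cuDNN releases.

    Hypothetical helper for sigpy/config.py; silently returns when either
    library (or its cuDNN binding) is unavailable, so the import stays safe
    on CPU-only installs.
    """
    try:
        import torch
        from cupy.cuda import cudnn as cupy_cudnn
    except ImportError:
        return  # nothing to compare
    torch_ver = torch.backends.cudnn.version()  # e.g. 8902 for cuDNN 8.9.2
    cupy_ver = cupy_cudnn.getVersion()
    if torch_ver is not None and torch_ver != cupy_ver:
        warnings.warn(
            f"cuDNN version mismatch: torch was built against {torch_ver} "
            f"but cupy loads {cupy_ver}; importing sigpy before torch may "
            "crash GPU code."
        )
```

Calling this once at sigpy import time would surface the mismatch as an explicit warning instead of a segfault deep in the backward pass.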
Here's the full frozen environment: