I tried to run inference.py and encountered RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR.
I'm using Python 3.7 per #93 on a Linux machine. My GPU has 8 GB of VRAM, so I don't know whether this is a memory error.
Full error log:
loading model... done
loading wave source... done
stft of wave source... done
0%| | 0/19 [00:00<?, ?it/s]
Traceback (most recent call last):
File "inference.py", line 184, in <module>
main()
File "inference.py", line 156, in main
y_spec, v_spec = sp.separate(X_spec)
File "inference.py", line 77, in separate
mask = self._separate(X_mag_pad, roi_size)
File "inference.py", line 44, in _separate
pred = self.model.predict_mask(X_batch)
File "/home/user/vocal-remover/lib/nets.py", line 115, in predict_mask
mask = self.forward(x)
File "/home/user/vocal-remover/lib/nets.py", line 89, in forward
h2 = self.stg2_high_band_net(h2_in)
File "/home/user/vocal-remover/lib/nets.py", line 33, in __call__
h = self.aspp(e5)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/user/vocal-remover/lib/layers.py", line 121, in forward
feat1 = F.interpolate(self.conv1(x), size=(h, w), mode='bilinear', align_corners=True)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/user/vocal-remover/lib/layers.py", line 26, in __call__
return self.conv(x)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/scratch/user/conda/envs/vocal-remover/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([4, 128, 1, 16], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(128, 128, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [0, 0, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x641b5d0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 4, 128, 1, 16,
strideA = 2048, 16, 16, 1,
output: TensorDescriptor 0x641b560
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 4, 128, 1, 16,
strideA = 2048, 16, 16, 1,
weight: FilterDescriptor 0x597c900
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 128, 128, 1, 1,
Pointer addresses:
input: 0x7f00593e8e00
output: 0x7f00593f0e00
weight: 0x7f0058d18000
Forward algorithm: 5
Following the suggestion in the error log, I ran this snippet in Python and it completed without any error, so my torch installation appears to be fine.
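To help rule memory in or out, here is a minimal sketch that reports how much GPU memory PyTorch is using. It is not part of vocal-remover: `format_gib` and `report_gpu_memory` are hypothetical helper names, and device index 0 is an assumption. It only uses `torch.cuda` calls that should exist in the torch version shown in the traceback.

```python
def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


def report_gpu_memory(device_index=0):
    """Return memory figures for one GPU, or None if torch/CUDA is unavailable.

    Hypothetical helper for diagnosis only; device_index=0 is an assumption.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    props = torch.cuda.get_device_properties(device_index)
    return {
        "name": props.name,
        # Total memory on the card, as seen by the driver.
        "total": format_gib(props.total_memory),
        # Memory currently held by live tensors in this process.
        "allocated": format_gib(torch.cuda.memory_allocated(device_index)),
        # Memory held by PyTorch's caching allocator (>= allocated).
        "reserved": format_gib(torch.cuda.memory_reserved(device_index)),
        # Highest "allocated" value seen so far in this process.
        "peak_allocated": format_gib(torch.cuda.max_memory_allocated(device_index)),
    }


if __name__ == "__main__":
    print(report_gpu_memory())
```

Calling `report_gpu_memory()` just before the `sp.separate(X_spec)` call in inference.py, and again in an `except RuntimeError` block around it, would show whether the peak is close to the card's total. These figures only cover this process's PyTorch allocator, so memory held by other processes on the same GPU would not appear here.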
Is there anything I'm missing? Thanks!