lebedov / scikit-cuda

Python interface to GPU-powered libraries
http://scikit-cuda.readthedocs.org/

CUDA 11 error (invalid resource handle) after destroying FFT plan & using a new one #308

Open vincefn opened 3 years ago

vincefn commented 3 years ago

Problem

I have found an issue with CUDA 11.1: creating an FFT plan, using it and running another operation (a simple sum reduction), then deleting the plan, creating a new one and repeating the same steps ends with the error cuFuncSetBlockShape failed: invalid resource handle

The following minimal example reproduces the issue (it must be run in a fresh session to be reproducible):

import numpy as np
import pycuda.gpuarray as cua
import pycuda.autoinit
import skcuda.fft as cu_fft

fft_shape = (128, 128)

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
a = cua.to_gpu(np.random.uniform(0,1, fft_shape).astype(np.complex64))
cu_fft.fft(a, a, plan)
tmp = cua.sum(a)

del plan

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
cu_fft.fft(a, a, plan)
tmp = cua.sum(a)

Running the above code in a fresh Python session always ends with the following error:

---> 17 tmp = cua.sum(a)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/gpuarray.py in sum(a, dtype, stream, allocator)
   1639     from pycuda.reduction import get_sum_kernel
   1640     krnl = get_sum_kernel(dtype, a.dtype)
-> 1641     return krnl(a, stream=stream, allocator=allocator)
   1642
   1643
~/dev/py38-env/lib/python3.8/site-packages/pycuda/reduction.py in __call__(self, *args, **kwargs)
    283
    284             # print block_count, seq_count, self.block_size, sz
--> 285             f((block_count, 1), (self.block_size, 1, 1), stream,
    286                     *([result.gpudata]+invocation_args+[seq_count, sz]),
    287                     **kwargs)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/driver.py in function_prepared_async_call(func, grid, block, stream, *args, **kwargs)
    547     def function_prepared_async_call(func, grid, block, stream, *args, **kwargs):
    548         if isinstance(block, tuple):
--> 549             func._set_block_shape(*block)
    550         else:
    551             from warnings import warn

LogicError: cuFuncSetBlockShape failed: invalid resource handle
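Since the failure is only reliable in a fresh session, it can help to launch the snippet above in a new interpreter on every run. The following is a minimal sketch using only the standard library; the helper name `run_in_fresh_interpreter` and the file name `repro_fft_plan.py` are hypothetical and not part of scikit-cuda or PyCUDA.

```python
import subprocess
import sys


def run_in_fresh_interpreter(source, timeout=120):
    """Run `source` in a brand-new Python process.

    Returns (exit_code, stderr_text). A new process means a new CUDA
    context, so state left over from earlier runs cannot mask the error.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Example use (hypothetical file holding the reproduction snippet):
#
#     with open("repro_fft_plan.py") as fh:
#         code, err = run_in_fresh_interpreter(fh.read())
#     print(code, "invalid resource handle" in err)
```

A non-zero exit code together with "invalid resource handle" in the captured stderr would indicate that the run hit the same error as the traceback above.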

The error occurs during the PyCUDA sum reduction, but it seems to be triggered by deleting the plan and creating a new one, so it may be due to cuFFT. I noted that the CUDA 11.1 release notes state: "After successfully creating a plan, cuFFT now enforces a lock on the cufftHandle. Subsequent calls to any planning function with the same cufftHandle will fail", but I have no idea whether that is related.
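Because the error only appears after a plan is deleted and a new one is created, one possible way to sidestep that sequence is to keep plans alive and reuse them. The sketch below is an untested assumption, not a confirmed fix: `PlanCache` is a hypothetical helper, and the plan constructor is passed in (for example `skcuda.fft.Plan`) so the class itself does not depend on a GPU.

```python
class PlanCache:
    """Keep FFT plans alive and return the same plan for identical parameters."""

    def __init__(self, make_plan):
        # make_plan: callable (shape, in_dtype, out_dtype, batch=...) -> plan.
        # Assumption: skcuda.fft.Plan can be passed here directly.
        self._make_plan = make_plan
        self._plans = {}

    def get(self, shape, in_dtype, out_dtype, batch=1):
        key = (tuple(shape), in_dtype, out_dtype, batch)
        if key not in self._plans:
            # Create the plan once; it is never deleted, so the
            # delete-then-recreate sequence from the example never occurs.
            self._plans[key] = self._make_plan(
                shape, in_dtype, out_dtype, batch=batch
            )
        return self._plans[key]


# Example use with scikit-cuda (requires a GPU, not run here):
#
#     plans = PlanCache(cu_fft.Plan)
#     plan = plans.get(fft_shape, np.complex64, np.complex64)
#     cu_fft.fft(a, a, plan)
```

The trade-off is that cached plans hold their GPU workspace for the lifetime of the cache, which may matter when many different shapes are used.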

Environment

List the following info:

vincefn commented 3 years ago

I also tested this under Windows 10 with CUDA 11.2, and the issue is reproduced with the above code snippet.

The CUDA 11.2 release notes list the following among the known issues: "cuFFT planning and plan estimation functions may not restore correct context affecting CUDA driver API applications"

vincefn commented 2 years ago

Under Linux with CUDA toolkit 11.5 installed in a conda environment (cufftGetVersion() reports 106000; driver 460.91.03), the issue is still present, even though the CUDA release notes no longer mention it (?).

dimitsev commented 2 years ago

Related? https://github.com/lebedov/scikit-cuda/issues/330