kunzmi / managedCuda

ManagedCUDA aims at an easy integration of NVIDIA's CUDA in .NET applications written in C#, Visual Basic or any other .NET language.

Asynchronous kernel run failure #89

Open serjl opened 4 years ago

serjl commented 4 years ago

Hello, and thank you for the great wrapper! I have the following issue: I want to run a kernel asynchronously, using streams. Assume S streams and a kernel that uses a d_input array of size S*N and a d_output array of size S*N. In stream k (0 <= k <= S-1), thread tid (0 <= tid <= N-1) reads from d_input at index k*N+tid, processes the value and writes the result to d_output at the same index k*N+tid. N is a constant, and the length of d_input is varied by changing S. Starting from a certain S (which depends on N), I get the following crash:

ManagedCuda.CudaException: 'ErrorIllegalAddress: While executing a kernel, the device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.'

I am sure the crash is not caused by running out of memory: the array is about 2 GB, and about 20 GB are free on my NVIDIA Titan RTX.

My question: what limitation in CUDA or the GPU causes this, and which device properties can I query to determine the maximum array length I can safely allocate?

kunzmi commented 4 years ago

ErrorIllegalAddress indicates that you are trying to access memory that is not allocated. The most common cause is a wrong index computation for arrays, or wrong array offsets/pitches etc., inside the kernel. You would get different errors if you ran out of resources, e.g. too many streams.
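For the indexing scheme described in the question (index k*N+tid with 0 <= k < S and 0 <= tid < N), the largest index touched is S*N - 1. A minimal host-side sketch of a check that could be run before launching is shown below. The function name `launch_stays_in_bounds` and its parameters are hypothetical and not part of managedCuda or CUDA; the sketch assumes `length` is the number of elements actually allocated for the buffer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a managedCuda or CUDA API).
// Returns true if every index k*n + tid, with 0 <= k < streams and
// 0 <= tid < n, lies inside a buffer holding `length` elements.
bool launch_stays_in_bounds(std::uint64_t streams, std::uint64_t n,
                            std::uint64_t length) {
    if (streams == 0 || n == 0) {
        return true;  // no element is accessed at all
    }
    // Reject products that would not even fit into 64 bits.
    if (streams > std::numeric_limits<std::uint64_t>::max() / n) {
        return false;
    }
    // Largest index is (streams - 1) * n + (n - 1) = streams * n - 1,
    // so the access is in bounds exactly when streams * n <= length.
    return streams * n <= length;
}
```

If such a check passes and the kernel still faults, the index computation inside the kernel itself is the next place to look.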

serjl commented 4 years ago

Thank you very much for the reply. I thought the same, but I found experimentally that I get this error whenever the size S*N exceeds the maximum memory pitch (2147483647 bytes on my GPU), which is also the maximum size of the X dimension of a grid. Otherwise the code runs smoothly.
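The threshold 2147483647 equals 2^31 - 1, the largest value a signed 32-bit integer can hold. One possible explanation, which nothing in this thread confirms, is that the index k*N+tid (or a byte offset derived from it) is computed in 32-bit arithmetic somewhere along the way. The host-side sketch below only models that arithmetic. All function names are hypothetical, and N = 1024 in the usage note is an assumed value chosen for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the flat index k*n + tid in 64-bit arithmetic.
std::int64_t flat_index64(std::int64_t k, std::int64_t n, std::int64_t tid) {
    return k * n + tid;
}

// True when the flat index no longer fits into a signed 32-bit int.
bool exceeds_int32(std::int64_t k, std::int64_t n, std::int64_t tid) {
    return flat_index64(k, n, tid) >
           std::numeric_limits<std::int32_t>::max();
}

// The same index in unsigned 32-bit arithmetic, which wraps modulo 2^32.
// A wrapped index silently points at the wrong element.
std::uint32_t flat_index_u32(std::uint32_t k, std::uint32_t n,
                             std::uint32_t tid) {
    return k * n + tid;
}
```

With N = 1024, the index for k = 2097151, tid = 1023 is exactly 2147483647 and still fits, while k = 2097152, tid = 0 already exceeds the signed 32-bit range. If this is the cause, computing the index in a 64-bit type inside the kernel would be the usual remedy.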

chenw11 commented 3 years ago

I got the same error when entering the second iteration to perform an async copy. Does CudaPageLockedHostMemory require cleanup after each iteration?

chenw11 commented 3 years ago

Apparently, in my case, the problem was caused by a bug in the CUDA functions. Since I can't hit breakpoints in the CUDA files, the error was passed along until I reached the line with the async copy from host to device. Once the CUDA function was fixed by passing the correct parameters, the ErrorIllegalAddress was gone. By the way, can somebody help me get breakpoints working in CUDA files? I have opened an issue about this problem, but it is not resolved yet. Thanks a lot.