kunzmi / managedCuda

ManagedCUDA aims at an easy integration of NVIDIA's CUDA in .NET applications written in C#, Visual Basic or any other .NET language.

template<typename T> causes a failure in CudaRuntimeCompiler.Compile() (nvrtcCompileProgram dll) #111

Open serjl opened 1 year ago

serjl commented 1 year ago

Hello Michael, I am using the latest version of the wrapper (CUDA 11.4). I am hitting a strange failure when compiling a kernel that contains a function template, e.g.

template <typename T>
__device__ __forceinline__ unsigned char HasBit(T bitmap, unsigned char pos)
{
    return (bitmap >> pos) & 1;
}

ManagedCuda.NVRTC.NVRTCException HResult=0x80131500 Message=ErrorCompilation: Compilation error. Source=NVRTC

Without the template, it compiles fine, e.g.

__device__ __forceinline__ unsigned char HasBit(unsigned char bitmap, unsigned char pos)
{
    return (bitmap >> pos) & 1;
}

I use the following compilation code:

        string fileToCompile = File.ReadAllText(filename);
        string[] headers = new string[0];
        string[] includeNames = new string[0];
        CudaRuntimeCompiler rtc = new CudaRuntimeCompiler(fileToCompile, name, headers, includeNames);
        string[] options = new string[] { "--gpu-architecture=compute_75"};
        rtc.Compile(options);

Do you have any idea what is wrong here?

Your help, as well as your amazing wrapper, is highly appreciated and valuable.

kunzmi commented 1 year ago

Hi, I tried your example with and without templates on CUDA 11.4, and both compile just fine. Do you have a full example, including the calling kernel code, that you can post? (A lone __device__ function will just be optimized away anyhow.) What is the output of rtc.GetLogAsString()?

serjl commented 1 year ago

Hi, thanks a lot for the reply. rtc.GetLogAsString() reports: "template declaration may not have extern "C" linkage". Indeed, I had put the function template HasBit (template <typename T> __device__ __forceinline__ unsigned char HasBit(T bitmap, unsigned char pos)) inside an extern "C" { } block, which was my fault. After moving it out of that block, everything works perfectly.

Many thanks again!
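
For readers who land here with the same error, the fix described above can be sketched roughly as follows. This is only an illustration, not the original poster's file: the wrapper name HasBitU32 is hypothetical, and the two macros are host-side stand-ins (an assumption for this sketch) so that the snippet also builds with a plain C++ compiler. In a real NVRTC module, the function inside extern "C" would typically be the __global__ kernel that is later looked up by name.

```cpp
// Host-side stand-ins (assumption for this sketch) so the file also builds
// with a plain C++ compiler; under nvcc/NVRTC the real qualifiers are used.
#ifndef __CUDACC__
#define __device__
#define __forceinline__ inline
#endif

// Templates require C++ linkage, so the template stays OUTSIDE extern "C".
template <typename T>
__device__ __forceinline__ unsigned char HasBit(T bitmap, unsigned char pos)
{
    return (bitmap >> pos) & 1;
}

// Only non-template entry points go inside extern "C" (unmangled names).
// HasBitU32 is a hypothetical wrapper that instantiates the template.
extern "C" {
__device__ unsigned char HasBitU32(unsigned int bitmap, unsigned char pos)
{
    return HasBit<unsigned int>(bitmap, pos);
}
}
```

Moving the template back inside the extern "C" block should reproduce the diagnostic quoted above, since a template declaration cannot have C linkage.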