kunzmi / managedCuda

ManagedCUDA aims at easy integration of NVIDIA's CUDA into .NET applications written in C#, Visual Basic or any other .NET language.

Conversion from "long" #43

Closed lorenzoviola closed 7 years ago

lorenzoviola commented 7 years ago

Hello Michael, I'm having quite some headaches with the "long" datatype...

I have an application that compiles and runs 64-bit kernels. Inside a kernel I have a function like this:

__device__ __forceinline__ long shorts_to_int64(short u1, short u2, short u3, short u4) {
    long check = (short)u1 | (short)(u2 << 16) | (short)(u3 << 32) | (short)(u4 << 48);

    return check;
}

The purpose was to avoid using too much memory by placing 0..65536 numbers inside an array of "long" values. Inside other kernels this shift-pack-unpack works correctly, and the array is initialized like this:

CudaDeviceVariable<long> dev_longvalues = new CudaDeviceVariable<long>(nMax);

and dev_longvalues is passed to the kernel like this:

__global__ void myKernelFunc(long* arLongValues)

So on the CUDA device, the "shorts_to_int64" function is used to fill "arLongValues". While on the device, the array looks correct.
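
For reference, this is the layout I expect each 64-bit value to have; in plain C# the pack/unpack I have in mind would look roughly like this (the helper names and the use of ulong are only illustrative, not code from my application):

    // Illustrative only: pack four 16-bit values (0..65535) into one 64-bit value
    // and unpack them again; each value occupies its own 16-bit lane.
    static ulong PackShorts(ushort u1, ushort u2, ushort u3, ushort u4)
    {
        return (ulong)u1
             | ((ulong)u2 << 16)
             | ((ulong)u3 << 32)
             | ((ulong)u4 << 48);
    }

    static ushort[] UnpackShorts(ulong packed)
    {
        return new[]
        {
            (ushort)(packed & 0xFFFF),
            (ushort)((packed >> 16) & 0xFFFF),
            (ushort)((packed >> 32) & 0xFFFF),
            (ushort)((packed >> 48) & 0xFFFF),
        };
    }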

The problem appears when I copy back to the host:

long[] hostLongValues = dev_longvalues;

At this point it looks like the C# long[] array contains something completely different from the C++/CUDA long values; it's as if 128 bits had been packed into each 64-bit entry.

Looking at VectorTypes.cs I read this:

/// long1. long stands here for the long .NET type, i.e. long long or a 64bit long in C++/CUDA 

I've tried using the long1 type, but with no better result.

In C#, I've tried to write something like this (similar to the CUDA counterpart):

long idxValue = hostLongValues[i];
var lShortValues = LongToShortArray(idxValue).Select(s => (int)s);

    private short[] LongToShortArray(long input)
    {       
        return new[]
        {
            (short)(input & 0x0000ffff),
            (short)((input >> 16) & 0x0000ffff),
            (short)((input >> 32) & 0x0000ffff),
            (short)((input >> 48) & 0x0000ffff),
            (short)((input >> 64) & 0x0000ffff),
            (short)((input >> 80) & 0x0000ffff),
            (short)((input >> 96) & 0x0000ffff),
            (short)((input >> 112) & 0x0000ffff)
        };
    }

or even (fighting with casts....)

    private short[] LongToShortArray(long input)
    {
        return new[]
        {
            (short)((short)input & (long)0x0000ffff),
            (short)((short)(input >> 16) & (long)0x0000ffff),
            (short)((short)(input >> 32) & (long)0x0000ffff),
            (short)((short)(input >> 48) & (long)0x0000ffff),
            (short)((short)(input >> 64) & (long)0x0000ffff),
            (short)((short)(input >> 80) & (long)0x0000ffff),
            (short)((short)(input >> 96) & (long)0x0000ffff),
            (short)((short)(input >> 112) & (long)0x0000ffff)
        };
    }

but had no luck......

So I'm asking: what's the best practice to "download" a C++/CUDA long from the device into the equivalent C# type? I've compiled the C# application for x64, and nothing changed.

Can you give me some advice?

kunzmi commented 7 years ago

This is likely to be wrong: long check = (short)u1 | (short)(u2 << 16) | (short)(u3 << 32) | (short)(u4 << 48);

The long datatype in C++/CUDA is usually only 32 bit, so your kernel can't compute the numbers correctly. Further, if you want to store four 16-bit numbers, why don't you use the short4 datatype provided by CUDA/ManagedCuda? I'd also guess that you want the unsigned datatypes if your range is 0..65535.
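
A rough sketch of how that could look with ManagedCuda (ushort4 lives in ManagedCuda.VectorTypes; the method, kernel and variable names here are just placeholders, not your actual code):

    using ManagedCuda;
    using ManagedCuda.VectorTypes;

    // Sketch only: store one ushort4 per element instead of packing into a long.
    // On the device side the kernel parameter would then be a ushort4*:
    //     __global__ void myKernelFunc(ushort4* arValues)
    static ushort4[] RunAndDownload(CudaKernel kernel, int nMax)
    {
        var dev_values = new CudaDeviceVariable<ushort4>(nMax);

        kernel.Run(dev_values.DevicePointer, nMax);

        // copy back to the host; each element carries the four 16-bit fields x, y, z, w
        ushort4[] hostValues = dev_values;
        return hostValues;
    }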

lorenzoviola commented 7 years ago

Well... of course I will try ushort4!!

Of course, a much more elegant solution... In the beginning I was dubious about speed (some people say that shifting is faster than adding), but now I will try it.

Thanks a lot!

lorenzoviola commented 7 years ago

Hello, just to report that using the short4 type was the best solution!

Thanks again!