lorenzoviola closed this issue 7 years ago
This is likely to be wrong:
long check = (short)u1 | (short)(u2 << 16) | (short)(u3 << 32) | (short)(u4 << 48);
The long datatype in C++/CUDA is often only 32 bits (for example with MSVC on Windows), so your kernel can't compute the numbers correctly. Furthermore, if you want to store four 16-bit numbers, why don't you use the short4 datatype provided by CUDA/ManagedCuda? I'd also guess that you want the unsigned datatypes if your range is 0..65535.
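To illustrate the point about widths, here is a minimal sketch in plain C++ of packing four 16-bit values into one 64-bit word. It assumes the fixed-width types from `<cstdint>` instead of `long`, and the helper names `pack4` and `unpack_slot` are hypothetical, not taken from the original kernel. Each value is widened to 64 bits before it is shifted, which the quoted line does not do.

```cpp
#include <cstdint>

// Hypothetical helper: pack four 16-bit values into one 64-bit word.
// Every operand is converted to 64 bits BEFORE the shift, so the shifted
// bits are not discarded by a narrower intermediate type.
constexpr std::uint64_t pack4(std::uint16_t u1, std::uint16_t u2,
                              std::uint16_t u3, std::uint16_t u4)
{
    return  static_cast<std::uint64_t>(u1)
         | (static_cast<std::uint64_t>(u2) << 16)
         | (static_cast<std::uint64_t>(u3) << 32)
         | (static_cast<std::uint64_t>(u4) << 48);
}

// Hypothetical helper: read back the 16-bit value stored in slot 0..3.
constexpr std::uint16_t unpack_slot(std::uint64_t packed, int slot)
{
    return static_cast<std::uint16_t>((packed >> (16 * slot)) & 0xFFFFu);
}
```

The same expressions should also work inside a device function, since they only rely on 64-bit integer arithmetic; the point of the sketch is the explicit 64-bit width, not the exact function signatures.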
Well... of course I will try ushort4!
It is of course a much more elegant solution. In the beginning I was worried about speed (some people say that shifting is faster than adding), but now I will try it.
Thanks a lot !
Hello, just to report that using the short4 type was the best solution!
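For readers wondering why the vector type sidesteps the problem, a rough sketch: CUDA's `ushort4` is a struct of four unsigned 16-bit fields `x`, `y`, `z`, `w`, so it occupies 8 bytes no matter how wide `long` is on a given compiler. The struct below is a hypothetical plain C++ stand-in (`UShort4` and `make_ushort4_like` are illustrative names), not the CUDA or ManagedCuda definition.

```cpp
#include <cstdint>

// Hypothetical stand-in for a ushort4-style vector type:
// four unsigned 16-bit fields, 8 bytes in total, 8-byte aligned.
struct alignas(8) UShort4 {
    std::uint16_t x, y, z, w;
};

// The layout does not depend on the platform's definition of long.
static_assert(sizeof(UShort4) == 8, "four 16-bit fields occupy 8 bytes");

// Hypothetical constructor helper; each field keeps its full 0..65535 range.
constexpr UShort4 make_ushort4_like(std::uint16_t x, std::uint16_t y,
                                    std::uint16_t z, std::uint16_t w)
{
    return UShort4{x, y, z, w};
}
```

With named fields there is no shifting or masking to get wrong, which is presumably why it turned out simpler here.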
Thanks again !
Hello Michael, I'm having quite a few headaches with the "long" datatype...
I have an application that compiles and runs 64-bit kernels. Inside a kernel I have a function like this:
The purpose was to avoid using too much memory by packing numbers in the range 0..65535 into an array of "long" values. Inside other kernels this shift-pack-unpack works correctly. The array is initialized like this:
CudaDeviceVariable<long> dev_longvalues = new CudaDeviceVariable<long>(nMax);
and dev_longvalues is passed to the kernel like this:
__global__ void myKernelFunc(long* arLongValues)
So on the CUDA device, the "shorts_to_int64" function is used to fill "arLongValues". While on the device, the array looks correct.
When I copy it back to the host, I have a problem:
long[] hostLongValues = dev_longvalues;
At this point, it looks like the C# long[] array contains something completely different from the C++/CUDA long datatype. It looks as if 128 bits had been placed inside a 64-bit value.
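One possible explanation for this symptom, offered as an assumption and not a diagnosis: if `long` is 32 bits in the device code (as it is with MSVC on Windows) while C# `long` is always 64 bits, then each host element spans two device elements. The plain C++ sketch below mimics that with a hypothetical helper `read_as_int64`, which reinterprets a buffer of 32-bit values as 64-bit values the way a raw byte copy would.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: take a buffer that was WRITTEN as 32-bit elements
// and READ it back as 64-bit elements, byte for byte.
// Any trailing odd 32-bit element is ignored.
std::vector<std::int64_t> read_as_int64(const std::vector<std::int32_t>& written)
{
    std::vector<std::int64_t> read_back(written.size() / 2);
    std::memcpy(read_back.data(), written.data(),
                read_back.size() * sizeof(std::int64_t));
    // On a little-endian machine, read_back[k] now holds written[2k] in its
    // low 32 bits and written[2k + 1] in its high 32 bits.
    return read_back;
}
```

If that is the cause here, declaring the device side with a type that is 64 bits on every platform, such as `long long`, would make both sides agree on the element size.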
Looking at VectorTypes.cs, I read this:
I've tried to use the long1 type, but without any better result.
In C#, I've tried to write something like this (similar to the CUDA counterpart):
long idxValue = hostLongValues[i];
var lShortValues = LongToShortArray(idxValue).Select(s => (int)s);
or even (fighting with casts...)
but had no luck...
So I'm asking: what's the best practice for "downloading" a C++/CUDA long value from the device into the equivalent C# type? I've compiled the C# application for x64, but nothing changed.
Can you give me some advice ?