kunzmi / managedCuda

ManagedCUDA aims at an easy integration of NVidia's CUDA in .net applications written in C#, Visual Basic or any other .net language.

cudaFftPlanMany.Exec only works with in place transforms #105

Open TheWhiteAmbit opened 3 years ago

TheWhiteAmbit commented 3 years ago

When I call Exec on my CUDA plan with only one parameter (in place), I find the transformed array at the original position. But whenever I call one of the overloads with separate input and output parameters, the resulting array is always filled with zeros. I have this problem with CudaDeviceVariable as well as with CudaPitchedDeviceVariable (mapped from a texture as CudaDirectXInteropResource). The array sizes should be correct: I am using cufftType.R2C with output arrays twice the size of the input arrays.

`public void Exec(CUdeviceptr iodata);                                                 //working
public void Exec(CUdeviceptr iodata, TransformDirection direction);                   //working
public void Exec(CUdeviceptr idata, CUdeviceptr odata, TransformDirection direction); //not working
public void Exec(CUdeviceptr idata, CUdeviceptr odata);                               //not working`

TheWhiteAmbit commented 3 years ago

This does not occur when using cufftType.C2C. Maybe I can format the input data differently as a workaround, but this still seems to be a problem for cufftType.R2C.

kunzmi commented 3 years ago

Can you provide a minimal example showing the problem? Is it a 1D, 2D or 3D transform? Are the input data padded?

Because you mention DirectX, I assume you are using 2D transforms. Are you sure the array sizes are correct? "Output arrays twice the size of input arrays" is not correct: for 2D R2C transforms, the output array must be of size (width / 2 + 1) x height and of datatype cuFloatComplex or float2, or hold twice that many floats.
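
As a rough illustration of that size rule, the element counts can be computed as below. This is only a sketch: the class `R2CSizes` and its method names are made up for this example and are not part of ManagedCuda or cuFFT.

```csharp
using System;

// Hypothetical helper (not part of ManagedCuda): element counts for R2C output buffers.
public static class R2CSizes
{
    // 1D R2C: nx real inputs give nx / 2 + 1 complex outputs per batch entry.
    public static int ComplexOutput1D(int nx, int batch = 1)
    {
        return (nx / 2 + 1) * batch;
    }

    // 2D R2C: only the fastest-changing dimension (width) is reduced to width / 2 + 1.
    public static int ComplexOutput2D(int width, int height)
    {
        return (width / 2 + 1) * height;
    }

    // The same buffer measured in floats: one complex value is two floats.
    public static int FloatsFor(int complexElements)
    {
        return complexElements * 2;
    }
}
```

For a 512 x 256 real input, `ComplexOutput2D(512, 256)` gives 257 x 256 = 65792 complex elements, i.e. 131584 floats. That is slightly more than the 131072 input floats, and the row length differs from the input, so "twice the input size" does not describe the layout.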

TheWhiteAmbit commented 3 years ago

First, thank you for the great work! I made a workaround with C2C and don't have the original code anymore. I use a 1D transform with plan many and a stride on a Texture2D, and it works like a charm now :) So this is my redone sample. It should reproduce the failure :) I hope I did not miss any edits:

`

     void CudaFFTPlanManyOnMappedResource(Texture1D inputTexture, Texture2D outputTexture, uint startIndexOfset = 0)
     {
        try
        {
            if (cudaContext == null)
                cudaContext = new CudaContext();
            cudaContext.SetCurrent();

            //int elementCount = inputTexture.Description.Width * inputTexture.Description.Height;
            //float[] floatArrayInput = new float[elementCount];
            //for (int i = 0; i < elementCount; i++) {
            //    floatArrayInput[i] = rand.Next(0, 65535);
            //}

            //float[] floatArrayOutput = new float[elementCount * 2];
            //CudaDeviceVariable<float> cudaDeviceInput = new CudaDeviceVariable<float>(elementCount);
            //cudaDeviceInput.CopyToDevice(floatArrayInput);
            //CudaDeviceVariable<float> cudaDeviceOutput = new CudaDeviceVariable<float>(elementCount * 2);            

            CudaPitchedDeviceVariable<float> cudaPitchedDeviceInput = new CudaPitchedDeviceVariable<float>(inputTexture.Description.Width, inputTexture.Description.Height);
            CudaPitchedDeviceVariable<ManagedCuda.VectorTypes.float2> cudaPitchedDeviceOutput = new CudaPitchedDeviceVariable<ManagedCuda.VectorTypes.float2>(outputTexture.Description.Width, outputTexture.Description.Height);

            using (CudaDirectXInteropResource resourceInput = new CudaDirectXInteropResource(inputTexture.NativePointer, CUGraphicsRegisterFlags.None, CudaContext.DirectXVersion.D3D11, CUGraphicsMapResourceFlags.None))
            {
                resourceInput.Map();
                using (var dataInput = resourceInput.GetMappedArray2D(startIndexOfset, 0))
                {
                    dataInput.CopyFromThisToDevice(cudaPitchedDeviceInput);
                }
                resourceInput.UnMap();
            }

            if (cudaFftPlanMany == null)
            {
                // For R2C the plan size is the length of the real input signal,
                // so the dimensions are taken from the input texture.
                var cudaFftPlanManyWidth = inputTexture.Description.Width;
                var cudaFftPlanSizeHeight = inputTexture.Description.Height;

                // Distance between consecutive signals, in elements (Pitch is in bytes).
                int[] inembed = { 0 };
                int istride = 1;
                int idist = cudaPitchedDeviceInput.Pitch / cudaPitchedDeviceInput.TypeSize;
                int[] onembed = { 0 };
                int ostride = 1;
                int odist = cudaPitchedDeviceOutput.Pitch / cudaPitchedDeviceOutput.TypeSize;

                cudaFftPlanMany = new CudaFFTPlanMany(1, new int[] { cudaFftPlanSizeHeight }, cudaFftPlanManyWidth, cufftType.R2C, inembed, istride, idist, onembed, ostride, odist);
            }

            cudaFftPlanMany.Exec(cudaPitchedDeviceInput.DevicePointer, cudaPitchedDeviceOutput.DevicePointer, TransformDirection.Forward);
            cudaContext.Synchronize();

            using (CudaDirectXInteropResource resourceOutput = new CudaDirectXInteropResource(outputTexture.NativePointer, CUGraphicsRegisterFlags.None, CudaContext.DirectXVersion.D3D11, CUGraphicsMapResourceFlags.None))
            {
                resourceOutput.Map();
                using (var dataOutput = resourceOutput.GetMappedArray2D(0, 0))
                {
                    dataOutput.CopyFromDeviceToThis(cudaPitchedDeviceOutput);
                }
                resourceOutput.UnMap();
            }

            //cudaDeviceOutput.CopyToHost(floatArrayOutput);

            cudaPitchedDeviceInput.Dispose();
            cudaPitchedDeviceOutput.Dispose();

            //cudaDeviceInput.Dispose();
            //cudaDeviceOutput.Dispose();
        }
        catch (ManagedCuda.CudaException)
        {
            // Exceptions are swallowed here, so a failing CUDA call goes unnoticed.
        }
    }`

It works on neither Texture2D nor the commented-out CudaDeviceVariable buffers; the result is always all zeros. After changing to cufftType.C2C, with the corresponding texture format R32G32_Float (instead of R32_Float) and ManagedCuda.VectorTypes.float2 mappings, it now works. So I assume the array sizes are correct: I made no changes to the output buffers from my working code, and the input buffers are of course half the output size for cufftType.R2C.
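
The thread does not show how the real input was turned into complex input for the C2C workaround (here it was done by switching the texture format). One plausible host-side equivalent, given purely as an assumption, is to interleave each real sample with a zero imaginary part. The class and method names below are hypothetical and not part of ManagedCuda.

```csharp
using System;

// Hypothetical helper (not part of ManagedCuda): pack real samples as
// interleaved (re, im) float pairs, the memory layout of float2 / cuFloatComplex.
public static class RealToComplex
{
    public static float[] Pack(float[] real)
    {
        if (real == null) throw new ArgumentNullException(nameof(real));

        // New arrays are zero-initialized, so every imaginary part starts at 0.
        var interleaved = new float[real.Length * 2];
        for (int i = 0; i < real.Length; i++)
        {
            interleaved[2 * i] = real[i]; // real part; index 2 * i + 1 stays 0
        }
        return interleaved;
    }
}
```

With input packed this way, a C2C transform of length n returns all n complex values. For real-valued input the upper half of that spectrum is the complex-conjugate mirror of the lower half, which is why R2C only stores n / 2 + 1 values.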