kunzmi / managedCuda

ManagedCUDA aims at an easy integration of NVidia's CUDA in .net applications written in C#, Visual Basic or any other .net language.

Copying with CudaOpenGLImageInteropResource always returns `ErrorInvalidValue` #125

Closed jcyuan closed 4 months ago

jcyuan commented 4 months ago

Hi, basically I have 2 questions:

1. I use CUDA with ffmpeg to decode video frames. According to ffmpeg's doc, AVFrame->data contains the CUdeviceptr values for the 2 planes of NV12 (the default format for CUDA decoding). I tried to use CudaOpenGLImageInteropResource to copy these 2 pointers into an OpenGL texture, but it fails: it always returns ErrorInvalidValue and I'm not sure why. Here is the key snippet of code:

(note: I have allocated the GL texture with size w, h*1.5, because I combine NV12's 2 planes in the same texture)

public override void CopyD2D(IntPtr data, ulong size)
    {
        lock (_lock)
        {
            if (_cudaRes == null || _gl == null)
                throw new NullReferenceException("GL or res not initialized");

            using (_context.EnsureCurrent())
            {
                _cuContext.PushContext();

                if (!_cudaRes.IsMapped)
                    _cudaRes.Map();

                var nv = _cudaRes.GetMappedArray2D(0, 0);

                var ptrArray = *(byte_ptrArray8*)data;
                var byteSize = Marshal.SizeOf<byte>();
                var nSize = nv.Width * (long)((long)nv.Height / 1.5) * byteSize;
                var err = DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpyDtoA_v2(nv.CUArray, 0,
                    new CUdeviceptr((long)ptrArray[0]), nSize);
                CheckCudaErrors(err);  // always returns ErrorInvalidValue.....

                var vSize = nv.Width * (long)((long)nv.Height / 1.5 * 0.5) * byteSize;
                err = DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpyDtoA_v2(nv.CUArray, nSize,
                    new CUdeviceptr((long)ptrArray[1]), vSize);
                CheckCudaErrors(err);

                _cudaRes.UnMap();

                _cuContext.PopContext();
            }

            _texDirty = true;
        }
    }

What am I doing wrong? :(

2. About the license: I see that it's GPL. If I need to use this lib in my closed-source software, what should I do? If there is a commercial license, what is the price, please?

Thanks~

kunzmi commented 4 months ago

Hi,

for 1) Where exactly is the exception thrown? How do you initialize the OpenGL texture? Are you sure that it can be mapped to a CUArray? Another thing that I don't think can be correct is your copy method: you are using a 1D copy function for a 2D CUArray, which should also result in the mentioned ErrorInvalidValue. Why not use the copy methods coming with the CudaArray2D class, something like nv.CopyFromHostToThis(IntPtr aHostSrc, SizeT aElementSizeInBytes) or nv.CopyData(CUDAMemCpy2D aCopyParameters)?

for 2) please contact me by mail: managedcuda@articimaging.eu

Michael

jcyuan commented 4 months ago

Thank you for the reply,

1. That error was the result returned by the SynchronousMemcpy_v2 call (cuMemcpyDtoA_v2), and I throw it manually in CheckCudaErrors. For the OpenGL texture, the allocation and parameter initialization should be fine, because I'm experienced with OpenGL. I planned to put the Y plane and the UV plane in the same texture, so I allocated that texture as w, h*1.5 (Y = w*h, UV = w*(h*0.5)). That's why my code above copies 2 times with an offset, although I'm not sure that's the correct way to do it because I'm new to CUDA. Here is how I allocate the OpenGL texture, nothing special actually:

_cuContext = new CudaContext(SGLContext.PrimaryCudaContext!.DeviceId);

_glTexture = _gl.GenTexture();
_gl.BindTexture(TextureTarget.Texture2D, _glTexture);
_gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureWrapS, (int)GLEnum.ClampToEdge);
_gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureWrapT, (int)GLEnum.ClampToEdge);
_gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)GLEnum.Linear);
_gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter,
  (int)(hasMipmap ? GLEnum.LinearMipmapNearest : GLEnum.Linear));
if (!hasMipmap)
{
    _gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMaxLevel, 0);
    _gl.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureBaseLevel, 0);
}

_gl.TexImage2D(TextureTarget.Texture2D, 0, _pxIFormat!.Value, _width, _height, 0,
        _pxFormat,
        _pxType, null);

// ......

if (_cudaRes == null)
{
        Debug.Assert(Handle != 0);

        _cuContext.PushContext();

        _cudaRes = new CudaOpenGLImageInteropResource(Handle, CUGraphicsRegisterFlags.WriteDiscard,
             CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D,
            CUGraphicsMapResourceFlags.WriteDiscard);

    _cuContext.PopContext();
}

(this snippet works fine, only the copy fails)

Anyway, does nv.CopyFromHostToThis accept a pointer array to copy from? Also, the NV12 data ffmpeg provides should be device data, not host data (so it should be nv.CopyFromDeviceToThis, I think?). And without an offset, how does it know where to copy to? Or should I create 2 OpenGL textures for this? Very confused.... I'd prefer to copy into 1 texture with an offset if that's possible, because then I can reuse my class/shader for soft frame buffering.

2. OK, thank you, I'll write an email later.

kunzmi commented 4 months ago

If it is the copy that fails, then it is certainly because you use the 1D copy function for a 2D CUArray. You can either use DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpy2D_v2 directly or its wrapped call in nv.CopyData(CUDAMemCpy2D aCopyParameters) and provide the custom offsets in the copy parameters struct. Check CudaArray2D.cs on how to fill the copy parameters struct.
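As a rough illustration of the offsets mentioned above, the sketch below only computes the per-plane values that would go into a 2D copy descriptor for an NV12 frame. It assumes a single-channel, 8-bit texture of size width x (height * 3 / 2), so one texel is one byte. `PlaneCopy` and `Nv12Layout` are hypothetical helpers, not part of ManagedCuda; the property names mirror the srcPitch, dstY, WidthInBytes and Height fields of the CUDA driver's 2D copy struct. The source pitch is assumed to come from the frame's linesize, which can be larger than the visible width.

```csharp
using System;

// Hypothetical helper (not a ManagedCuda type): the values for one 2D plane copy.
public readonly record struct PlaneCopy(long SrcPitch, long DstY, long WidthInBytes, long Height);

public static class Nv12Layout
{
    // Y plane: `width` bytes per row, `height` rows, placed at row 0 of the texture.
    public static PlaneCopy Luma(int width, int height, int srcPitch) =>
        new PlaneCopy(srcPitch, 0, width, height);

    // Interleaved UV plane: `width` bytes per row, height / 2 rows,
    // placed directly below the Y plane (destination row = height).
    public static PlaneCopy Chroma(int width, int height, int srcPitch) =>
        new PlaneCopy(srcPitch, height, width, height / 2);
}
```

For each plane, the remaining descriptor fields would then be set to a device-memory source (the plane's device pointer) and an array destination (the mapped CUArray). Treat this as a sketch under the stated assumptions and check CudaArray2D.cs for the exact struct usage.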

jcyuan commented 4 months ago

thanks so much, let me have a try, will reply later about my result~

jcyuan commented 4 months ago

Even a simple memory block copy fails too, this is really frustrating....... :(

_bufferPtr = _cudaContext.Value.AllocateMemory(length);
_destDeviceVar = new CudaDeviceVariable<byte>(_bufferPtr);

public override int CopyFrom(MediaFrame source)
    {
        _cudaContext.Value.PushContext();

        var offset = 0ul;
        for (uint i = 0; i < 8; i++)
        {
            if (vSrc.Pointer->data[i] == null)    //  data is nv planes (2 CUdevicePtr)
                continue;

            var plane = ffmpeg.av_frame_get_plane_buffer(vSrc.Pointer, (int)i);  //  get plane info
            var srcDevicePtr = new CUdeviceptr((long)plane->data);   // data is a CUdevicePtr

            _destDeviceVar.CopyToDevice(srcDevicePtr, 0, offset, plane->size);   // device-to-device copy with a destination offset; fails with the same `ErrorInvalidValue`, I have no idea what to do...

            offset += plane->size;
        }

        _cudaContext.Value.PopContext();

        return (int)_bufferLength;
    }

Could you help, please... This is not even interop, just a really simple GPU allocation & copy with the ffmpeg CUdevicePtr....

kunzmi commented 4 months ago

Having only small pieces of code, I can only guess what might be wrong... MemCpy returns ErrorInvalidValue if you provide input parameters that the CUDA api cannot handle, basically this means that one of the device pointers is not a correct device pointer or that the size doesn't match. So in order to understand why Cuda is unhappy about your parameters, you'd need to debug a bit further to see if the values are what you think they should be:

//Create a CudaDeviceVariable<byte> from the CUdevicePtr coming from FFMPEG:
//This gathers the size of the allocation from the CUDA API, if this fails, plane->data is not a CUdevicePtr!
var testDevVar = new CudaDeviceVariable<byte>((long)plane->data); 
//check size, is it correct?
var testSize = testDevVar.SizeInBytes;
//copy data to host, check if data is what is expected:
byte[] testData = testDevVar;

and before the CopyToDevice call, check that offset + plane->size does not exceed length
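The bounds check suggested above could look like the following minimal sketch. `CopyBounds.Fits` is a hypothetical helper, not part of ManagedCuda. It assumes a copy that ends exactly at the end of the buffer is still valid, and it is written so that the unsigned arithmetic cannot overflow.

```csharp
using System;

public static class CopyBounds
{
    // True when the byte range [offset, offset + size) lies inside a buffer of `length` bytes.
    // Subtracting instead of adding avoids ulong overflow for very large offsets.
    public static bool Fits(ulong offset, ulong size, ulong length) =>
        offset <= length && size <= length - offset;
}
```

Calling something like `CopyBounds.Fits(offset, (ulong)plane->size, length)` before each copy would rule out a size mismatch and leave an invalid pointer as the remaining suspect.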

jcyuan commented 4 months ago

thanks man!

Yes, that pointer is not valid, I'm not sure why...

Stepping into this method shows that the cuMemGetAddressRange_v2 call failed.

This is weird, since ffmpeg's doc says the data pointers are CUdeviceptr values.

Anyway, this problem is not related to ManagedCUDA, I'll dig into it further to find the reason. Thanks a lot for your help, I really appreciate it!

jcyuan commented 4 months ago

@kunzmi sir, thank you very much, I have solved this problem. It was quite simple: I didn't make ffmpeg's context, the one that owns the memory, current before copying.

I also found a limitation: the CudaContext class does not have a constructor that accepts an external context pointer.

kunzmi commented 4 months ago

Yes, if the ffmpeg library uses a different CudaContext, its allocations are not valid in your application's context... ManagedCuda doesn't have a constructor taking a pointer to an external context, as this is not a normal use case. What you can do instead is make the ffmpeg context current and fetch it with the constructor CudaContext(int deviceId, bool createNew), where you set createNew = false.

You likely don't have control over ffmpeg's context creation settings, for example binding it to OpenGL, etc. Because sharing contexts across libraries is messy, Nvidia created the concept of a PrimaryContext. So instead of using a standard CudaContext, go with a PrimaryContext: it is always the same within one process and it is always bound to OpenGL, so PrimaryContext is likely what you are looking for.

jcyuan commented 4 months ago

solved~ thanks so much. :) ❤