jlennox / NvEncSharp

MIT License
NvEncSharp.Sample.VideoDecode shows only blue channel #6

Open vlt-arizona opened 3 years ago

vlt-arizona commented 3 years ago

Following your earlier comment about this test code, I converted my video to raw h264 like so:

ffmpeg -i input.mp4 -c copy output.h264

The file output.h264 displays the converted video, but I only see the blue channel. Is there any special setup involved?

jlennox commented 3 years ago

No, that is correct.

If you try opening the file using ffplay or perhaps vlc (I believe it supports raw h264?) does it decode properly?

vlt-arizona commented 3 years ago

Thank you for getting back to me so quickly!

Yes, it decodes properly in both ffplay and vlc.

FFprobe gives the following stats:

Input #0, h264, from 'Big_Buck_Bunny_1080_10s_1MB.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn

I’ve tried this with a number of other converted videos; they all show blue only. I also created a version of this video with saturation=0 to verify that the phenomenon had nothing to do with the colors of the original. Just now I tried converting the mp4 video with a different version of ffmpeg; the result is the same. Maybe it could be the display code?

Vaughn

vlt-arizona commented 3 years ago

No, it’s not the display code. The “Save as bitmap” images are also blue.

Vaughn

jlennox commented 3 years ago

Interesting.

Sorry, this was a temporary pet project while I was between jobs so I don't understand the ins and outs of it all very well.

I'll try and put some time aside to experiment and see if it reproduces for me.

I wonder if it's something with the color spaces, but even that shouldn't give you a movie with just a blue hue.

Are you using GPU or CPU memory? You can switch to CPU memory by passing in --host-memory.

One big difference between them is how they do the NV12 to RGB conversion:

switch (buffer.MemoryType)
{
    case CuMemoryType.Host:
        LibYuvSharp.LibYuv.NV12ToARGB(
            (byte*)buffer.Bytes, YuvInfo.LumaPitch,
            (byte*)buffer.Bytes + YuvInfo.ChromaOffset,
            YuvInfo.ChromaPitch,
            destinationPtr, width * rgbBpp, width, height);

        break;
    case CuMemoryType.Device:
        using (var destPtr = CuDeviceMemory.Allocate(rgbSize))
        {
            LibCudaLibrary.Nv12ToBGRA32(
                buffer.DeviceMemory.Handle, width,
                destPtr, width * rgbBpp, width, height);

            destPtr.CopyToHost(destinationPtr, rgbSize);
        }

        break;
    default:
        throw new ArgumentOutOfRangeException(nameof(buffer.MemoryType));
}

Note that one call says BGRA32 and the other ARGB only because the two libraries use opposite byte-ordering nomenclature: one names the channels in memory (byte) order, the other in 32-bit word order.
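
The naming difference can be illustrated with a small host-side sketch. It assumes a little-endian layout (as on x86), and the helper names PackArgbWord and StoreLittleEndian are hypothetical, not part of libyuv or NvEncSharp. A pixel named for its 32-bit word, with A in the most significant byte, lands in memory as the bytes B, G, R, A, which is what a byte-order name such as BGRA32 describes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack a pixel as a 32-bit word named "ARGB",
// i.e. A in the most significant byte and B in the least significant.
inline uint32_t PackArgbWord(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return (static_cast<uint32_t>(a) << 24) | (static_cast<uint32_t>(r) << 16) |
           (static_cast<uint32_t>(g) << 8) | static_cast<uint32_t>(b);
}

// Hypothetical helper: the bytes a little-endian machine writes for a
// 32-bit word, lowest address first.
inline std::array<uint8_t, 4> StoreLittleEndian(uint32_t word) {
    return {static_cast<uint8_t>(word), static_cast<uint8_t>(word >> 8),
            static_cast<uint8_t>(word >> 16), static_cast<uint8_t>(word >> 24)};
}
```

Under these assumptions, StoreLittleEndian(PackArgbWord(a, r, g, b)) yields the bytes in the order b, g, r, a, so the two names can describe the same memory layout.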

vlt-arizona commented 3 years ago

Joseph,

Thanks; I very much appreciate this. I’ve been working on an FFmpeg integration project for a month now and I’m barely scratching the surface.

For a pet project, it’s pretty impressive. It answers a real need for C# wrapping of ffmpeg Nvidia code.

I’m currently running a comparison between this and Nvidia’s C++ example project. I’ll let you know if I find anything.

Regards,

Vaughn Treude

vlt-arizona commented 3 years ago

I suspect the problem is something to do with C# structure alignment on a 64-bit system.

Attachment 1 is a bitmap created with the following code in DecodeToHostRgba32:

case CuMemoryType.Device:
    using (var destPtr = CuDeviceMemory.Allocate(rgbSize))
    {
        LibCudaLibrary.Nv12ToBGRA32(
            buffer.DeviceMemory.Handle, width,
            destPtr, width * rgbBpp, width, height);

        destPtr.CopyToHost(destinationPtr, rgbSize);
    }

Attachment 2 happened when I fiddled with the format:

case CuMemoryType.Device:
    using (var destPtr = CuDeviceMemory.Allocate(rgbSize))
    {
        LibCudaLibrary.Nv12ToRGB24(
            buffer.DeviceMemory.Handle, width,
            destPtr, width * rgbBpp, width, height);

        destPtr.CopyToHost(destinationPtr, rgbSize);
    }

Vaughn

jlennox commented 3 years ago

I was not able to see any of the attachments.

vlt-arizona commented 3 years ago

Hmm, my first email had BMPs, which were rather big and bounced the message.

For the second, I converted them to JPGs; much smaller, but perhaps that bounced as well.

Vaughn

jlennox commented 3 years ago

They're unfortunately not making it. Are you able to reply and attach them directly on github?

vlt-arizona commented 3 years ago

OK, trying again. Attachments: output-00000000-1, output-00000000-2.

vlt-arizona commented 3 years ago

Here's a bit more info.

Reading the frame in NvEncSharp using Nv12ToBGRA32 gives this kind of data in hex:

38 00 00 00 38 00 00 00

Reading the frame in NvEncSharp using Nv12ToRGB24 gives this kind of data:

38 32 00 00 38 39 00 00

Reading the frame in CUDA's C++ driver gives this:

38 44 39 00 38 44 39 00

So there's apparently some kind of data casting problem going on.

jlennox commented 3 years ago

My steps:

  1. I downloaded Big_Buck_Bunny_1080_10s_1MB.mp4
  2. I ran ffmpeg -i Big_Buck_Bunny_1080_10s_1MB.mp4 -c copy Big_Buck_Bunny_1080_10s_1MB.h264
  3. I added --input "C:\users\joe\desktop\delete\Big_Buck_Bunny_1080_10s_1MB.h264" as the Application Arguments in the Visual Studio program debug panel.
  4. Pressed F5.

And for me the video came out just fine.

2 questions:

1: Is the video the same?

When I run ffmpeg -i Big_Buck_Bunny_1080_10s_1MB.h264 I get this:

Input #0, h264, from 'Big_Buck_Bunny_1080_10s_1MB.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn, 60 tbc

2: Does using host memory work?

Pass in --host-memory to the playback and see if it changes anything.

A hypothesis

Technically, there are a few layers of decoding happening here. If any one of them has the wrong byte order, color format, pixel size, chroma/luma pitch, stride, etc., it can cause issues.

  1. The NAL units being turned into Nv12 frames by the GPU decoder (NVDEC).
  2. That frame being copied from the data buffer into our own. (VideoDisplayCallback)
  3. The Nv12 frame being converted to "BGRA32." (DecodeToDeviceRgba32)
  4. That frame being copied to our window. (DisplayWindow.FrameArrivedDevice)

--host-memory changes the above path a good deal, which is why I'm curious whether it works. It still decodes on the GPU, but it copies the GPU's result into system memory, which is required anyway if you want to access the pixels directly (as opposed to only displaying them, in which case the pixels never need to leave the GPU).

The "// copy luma" and "// copy chroma" parts could be going wrong here?
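
As a rough illustration of what a luma/chroma copy has to get right, here is a minimal host-side sketch of copying one NV12 frame out of a pitched buffer. It is not the NvEncSharp code: CopyNv12Packed and its parameters are hypothetical, and it assumes 8-bit NV12 (a full-resolution luma plane followed by a half-height plane of interleaved U,V bytes) whose chroma rows share the luma pitch. Mixing up pitch and width, or using the wrong chroma offset, is the kind of mistake this step can hide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not NvEncSharp code): copy one NV12 frame from a
// pitched buffer into a tightly packed one.
//   srcPitch     - bytes per source row, may exceed width because of padding
//   chromaOffset - byte offset of the interleaved UV plane in the source
inline std::vector<uint8_t> CopyNv12Packed(const uint8_t* src, size_t srcPitch,
                                           size_t chromaOffset, size_t width,
                                           size_t height) {
    assert(srcPitch >= width && height % 2 == 0);
    std::vector<uint8_t> dst(width * height * 3 / 2);
    uint8_t* out = dst.data();

    // copy luma: `height` rows of `width` bytes, skipping the row padding
    for (size_t y = 0; y < height; ++y, out += width) {
        std::memcpy(out, src + y * srcPitch, width);
    }
    // copy chroma: height / 2 rows, each holding `width` bytes of U,V pairs
    const uint8_t* chroma = src + chromaOffset;
    for (size_t y = 0; y < height / 2; ++y, out += width) {
        std::memcpy(out, chroma + y * srcPitch, width);
    }
    return dst;
}
```

The chroma offset is passed in rather than computed because decoders may align the luma plane height, so pitch * height is not always the right value.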

Is your display set to 10-bit color? If so, can you turn that off to test?

jlennox commented 3 years ago

I did not see your latest message before I sent my previous reply.

To confirm I understood -- this is working fine in the C++ example?

I believe my code has made assumptions about the input and output, but I did do my best to cover everything I could. Since it has been ~18 months I do not recall where or what they are, though I do usually leave code comments when that's the case.

I'm presuming something is going wrong in an earlier or later layer than DecodeToDeviceRgba32, but I could be wrong. I suspect the way the luma/chroma are copied in VideoDisplayCallback (but that is just a suspicion).

If you do figure this out that would be great. I would help more if I had a local reproduction of the problem.

How are you dumping the hex from Nv12ToBGRA32? I will grab what my local one gives.

vlt-arizona commented 3 years ago

  1. As far as I know, the video is the same. I passed that video to ffmpeg to create the raw video.
  2. No. When I put in --host-memory, it fails with the following error:

System.ArgumentOutOfRangeException: 'Unsupported memory type. (Parameter 'MemoryType') Actual value was Host.'

Yes, it's working in Nvidia's C++ example.

As for dumping the hex, I added the following code in SaveAsBitmap under DecodeToHostRgba32:

IntPtr ptr = locked.Scan0;
int bytes = width * height * 4;
byte[] rgbValues = new byte[bytes];
System.Runtime.InteropServices.Marshal.Copy(ptr, rgbValues, 0, bytes);

In Visual Studio, I got the bytes by inspecting rgbValues. This is running Visual Studio 2019 on 64-bit Windows 10.
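
For comparing against the C++ sample on equal terms, a small formatter along these lines could print the first few bytes of a frame in the same "38 44 39 00" style used in this thread. It is only a sketch: HexDump is a hypothetical helper and is not part of Nvidia's sample or NvEncSharp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format the first `count` bytes of a buffer as
// space-separated, two-digit uppercase hex, e.g. "38 44 39 00".
inline std::string HexDump(const uint8_t* data, size_t count) {
    std::string out;
    char buf[4];
    for (size_t i = 0; i < count; ++i) {
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(data[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Dumping the same byte range from both programs makes it easy to see whether the difference is per pixel (as here) or a whole-frame offset.
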
jlennox commented 3 years ago

Interesting. I got matching output to the C++ sample.

So by testing the bitmap output, you have confirmed that it's not fixed by following a predominantly host-memory path (the frame has to be copied to and decoded in host memory for the CPU to be able to create a bitmap). I must have broken the fully host-memory code path at some point (whoops).

I do still suspect it's an issue with the way chroma or luma is being copied or told to decode. Would you mind running that ffmpeg command I sent before? yuv420p is the part I am most interested in. Something like 422 would be a pretty large deviation.

vlt-arizona commented 3 years ago

Do you mean this command?

ffmpeg -i Big_Buck_Bunny_1080_10s_1MB.h264

My output is very similar to yours:

Input #0, h264, from 'Big_Buck_Bunny_1080_10s_1MB.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn

jlennox commented 3 years ago

I am at a loss unfortunately.

The places I'd inspect for behaving differently between the two are the CuVideoParserParams structs and the CuVideoProcParams structs (this might be a prime suspect). I suspect the NV12->BGRA portions are fine.

If you do figure this out or have anything you want tested on my end, let me know!

vlt-arizona commented 3 years ago

That makes sense. I will check those out. Thanks!

vlt-arizona commented 3 years ago

Something I forgot to mention: in the method YuvToRgbKernel in Colorspace.cu, the following lines gave me errors:

*(RgbIntx2 *)pDst = RgbIntx2 {
    YuvToRgbForPixel<Rgb>(l0.x, ch.x, ch.y).d,
    YuvToRgbForPixel<Rgb>(l0.y, ch.x, ch.y).d,
};
*(RgbIntx2 *)(pDst + nRgbPitch) = RgbIntx2 {
    YuvToRgbForPixel<Rgb>(l1.x, ch.x, ch.y).d,
    YuvToRgbForPixel<Rgb>(l1.y, ch.x, ch.y).d,
};

For each line containing YuvToRgbForPixel I saw this:

Error invalid narrowing conversion from "unsigned short" to "unsigned char" NvEncSharp.Cuda.Library c:\develop\thirdparty\nvencsharp\src\nvencsharp.cuda.library\colorspace.cu 116

and so on. To get it compiling, I put a cast to (unsigned char) at the beginning of each of the 4 lines containing YuvToRgbForPixel. This allowed the code to compile, but it could very well be part of my problem. Have you encountered this issue?

Thanks!

Vaughn

vlt-arizona commented 3 years ago

I added these casts to colorspace.cu in the Nvidia project, and the blue-channel-only problem appeared. This is a problem I caused when I first built the project, naively thinking it was just one of those dumb strict typing errors -- yet I don't understand why this code compiles in Nvidia's AppDecD3D project but not in NvEncSharp.
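
A small host-side sketch (plain C++, not the CUDA kernel) of why such a cast would leave only the blue channel. It assumes the converted pixel is a 32-bit BGRA value whose lowest byte is blue, and that it is stored little-endian. The helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: the bytes a little-endian machine writes for a
// 32-bit pixel, lowest address first.
inline std::array<uint8_t, 4> LittleEndianBytes(uint32_t pixel) {
    return {static_cast<uint8_t>(pixel), static_cast<uint8_t>(pixel >> 8),
            static_cast<uint8_t>(pixel >> 16), static_cast<uint8_t>(pixel >> 24)};
}

// Mimics the added cast: the 32-bit value is narrowed to unsigned char
// (keeping only its lowest byte) and then widened back to 32 bits.
inline uint32_t NarrowedLikeTheCast(uint32_t pixel) {
    return static_cast<unsigned char>(pixel);
}
```

With the pixel 0x00394438, the untouched value is stored as 38 44 39 00, while the narrowed value is stored as 38 00 00 00. That is consistent with the hex dumps earlier in this thread: the blue byte survives and green and red become zero.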

vlt-arizona commented 3 years ago

I don't understand why, but even after changing all the settings on your example project to be the same as the Nvidia sample project, the same code in ColorSpace.cu compiles differently! I don't know how the cu files get pre-compiled; maybe they've changed or fixed something. In any case, by hacking the code in YuvToRgbForPixel in ColorSpace.cu I got it to work:

uint8_t *pSrc = pYuv + x * sizeof(YuvUnitx2) / 2 + y * nYuvPitch;
uint32_t *pDst1 = (uint32_t *)(pRgb + x * sizeof(Rgb) + y * nRgbPitch);
uint32_t *pDst2 = (uint32_t *)(pRgb + x * sizeof(Rgb) + (y + 1) * nRgbPitch);
YuvUnitx2 l0 = *(YuvUnitx2 *)pSrc;
YuvUnitx2 l1 = *(YuvUnitx2 *)(pSrc + nYuvPitch);
YuvUnitx2 ch = *(YuvUnitx2 *)(pSrc + (nHeight - y / 2) * nYuvPitch);
*pDst1++ = YuvToRgbForPixel<Rgb>(l0.x, ch.x, ch.y).d;
*pDst1 = YuvToRgbForPixel<Rgb>(l0.y, ch.x, ch.y).d;
*pDst2++ = YuvToRgbForPixel<Rgb>(l1.x, ch.x, ch.y).d;
*pDst2 = YuvToRgbForPixel<Rgb>(l1.y, ch.x, ch.y).d;

Not sure how we could prevent this problem from happening elsewhere. I'll let you know if I figure out how to make sure the Nvidia code compiles correctly. Thanks for all your help!

jlennox commented 2 years ago

@vlt-arizona I'm glad you were able to find the issue. Any luck with this working now?

If --host-memory was fixed it would work around this because it uses libyuv for that, but that becomes the biggest bottleneck in processing by far.

vlt-arizona commented 2 years ago

It works fine with the changes to the ColorSpace.cu file, though I'd rather get it working with the original code. I'll investigate this further when I get time. I don't yet understand the tradeoffs between using device memory and host memory. Our priority is to offload as much processing as we can to the GPU.

jlennox commented 2 years ago

Host memory means normal system memory, device memory means GPU memory.

Only the CPU can access host memory, only the GPU can access device memory. If you're looking to optimize for speed and GPU offloading, then you're def on the right track.

vlt-arizona commented 2 years ago

I'm glad to hear it. My question was why one would want to use host memory if device memory is faster.

jlennox commented 2 years ago

It's in case you need CPU access. For example, for saving it to a bitmap, or doing some sort of CPU-based processing.

My intended use for this was to make a modern VNC type program, so I needed to send the encoded frames over the network. Which is why you see a sample project for encoding the screen, and another for decoding raw h264 :)

vlt-arizona commented 2 years ago

That makes sense. Your project has been very helpful; much more straightforward than FFMPEG's Nvidia driver code.