intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

[Bug]: Corrupted H264 Encode on DG2 #1685

Closed VMFortress closed 1 year ago

VMFortress commented 1 year ago

Which component impacted?

Encode

Is it regression? Good in old configuration?

No, this issue has existed for a long time

What happened?

It seems DG2 GPUs create corrupted output when using hardware encode on Chromium-based browsers. This does not appear to be an issue on other Intel platforms, including Skylake.

Steps to Reproduce (prerequisites: webcam or similar device, DG2 GPU):
1) Use latest libva, gmmlib, and media-driver compiled from source as of June 15th, 2023.
2) Launch a Chromium 114 based browser with the following command: LIBVA_DRIVER_NAME=iHD chromium --enable-accelerated-video-encode --enable-features=VaapiVideoEncoder
3) Navigate to https://codec-compare.glitch.me/
4) Leave all settings at default and hit 'Start'
5) Allow webcam permissions in Chromium
6) Observe input compared to single frame of corrupted output
7) Very low usage (typically <1%) of Video and VideoEnhance engines can be seen in intel_gpu_top.

The issue appears with all available H264 codec options but does not occur with the AV1 or VP9 codecs in the same scenario.

Additional debugging details provided below.

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

Any websites or Electron-based applications using H264 encoding will be unable to use acceleration. A mainstream example is Discord screenshare running within Chromium.

Debug Information

Logs:

Additional debug info:

I am happy to provide any other details requested or perform any tests that may help.

Do you want to contribute a patch to fix the issue?

None

nyanmisaka commented 1 year ago

You don't need a webcam. Using a screen recording plugin can also trigger the issue on DG2. https://chrome.google.com/webstore/detail/screen-recorder/hniebljpgcogalllopnjokppmgbhaden

It also produces a broken bitstream that FFmpeg cannot decode. screen-capture.7z.zip

[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
[h264 @ 0000021658e5c940] cbp too large (3199971767) at 3 0
[h264 @ 0000021658e5c940] error while decoding MB 3 0
[h264 @ 0000021658e5c940] concealing 8160 DC, 8160 AC, 8160 MV errors in P frame
Input #0, matroska,webm, from 'R:\screen-capture (1).webm':
  Metadata:
    encoder         : Chrome
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Video: h264 (Constrained Baseline), yuv420p(progressive), 1920x1080, SAR 1:1 DAR 16:9, 30.30 fps, 25 tbr, 1k tbn (default)

I suspect there are some incompatible parameters between chromium::h264_vaapi_video_encoder_delegate.cc and the DG2 settings in the iHD driver.
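A quick check on the FFmpeg log above: 8160 is exactly the number of 16x16 macroblocks in a 1920x1080 H.264 frame (120 columns by 68 rows, because the coded height is padded to 1088). This suggests every macroblock of each reported P frame is being concealed, not just a region. The sketch below only reproduces that arithmetic. The function name is hypothetical and is not taken from FFmpeg, Chromium, or the iHD driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any library mentioned in this thread):
// number of 16x16 macroblocks needed to cover a progressive frame.
std::uint32_t MacroblocksPerFrame(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    const std::uint32_t kMbSize = 16;
    // Round each dimension up to a whole number of macroblocks.
    const std::uint32_t cols = (width + kMbSize - 1) / kMbSize;
    const std::uint32_t rows = (height + kMbSize - 1) / kMbSize;
    return cols * rows;
}
```

Calling MacroblocksPerFrame(1920, 1080) gives 120 * 68 = 8160, matching the count in each "concealing" line.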

Sherry-Lin commented 1 year ago

@daijh could you help take a look?

VMFortress commented 1 year ago

I found another, slightly more repeatable test that may help in finding the issue: https://canonical.github.io/inbrowser-encode-test/

It provides a hardware encode of the same short clip each time and provides a webm for download once finished. It may make it easier for direct comparison.

I attached example h264 encodes from it for DG2 and Skylake here: h264_webm.7z.zip

daijh commented 1 year ago

@VMFortress could you report the issue to chromium as well? Issues - chromium: https://bugs.chromium.org/p/chromium/issues/list

daijh commented 1 year ago

Could you provide the detailed chromium logs, and perhaps VA logs on DG2?

VMFortress commented 1 year ago

@daijh Chromium issue can be found here: https://bugs.chromium.org/p/chromium/issues/detail?id=1458942

Running LIBVA_TRACE=./dg2_trace.log chromium --enable-accelerated-video-encode --enable-features=VaapiVideoEncoder --vmodule=*/vaapi/*=4,*/media/*=4 --enable-logging=stderr > ./dg2_chromium.log 2>&1:

daijh commented 1 year ago

Sorry, I don't have a DG2 at hand. The logs show that the iHD driver fails to create the RGBA VASurface. A few fixes have indeed landed in both Chromium and iHD for this specific issue.

[1] https://github.com/intel/media-driver/issues/1210
[2] https://chromium-review.googlesource.com/c/chromium/src/+/4637884

I suggest including those patches and checking this issue again.

VMFortress commented 1 year ago

@daijh I had the chance to test again with the latest intel-media-driver and chromium 117.0.5875.0, which should contain the patches mentioned. Unfortunately, there does not seem to be any improvement. Running the Canonical test again, VP9 encodes fine, but H264 still produces corrupted output on DG2 and not on Skylake.

Here are the same logs as before for this test:

DenWolf commented 1 year ago

Hi @VMFortress ,

Please try this patch. If it resolves the issue, we will prepare the official fix.

potential_fix_for_ChromeOs.patch

Only one line needs to change: https://github.com/intel/media-driver/blob/03f4cde9fb8a88b9b12a9edd5fd5b43f71226ee5/media_driver/linux/common/codec/ddi/media_ddi_encode_avc.cpp#L1299

Change "seqParams->GopPicSize = seq->intra_period;" to "seqParams->GopPicSize = seq->intra_period ? seq->intra_period : seq->intra_idr_period;"

If it does not help, we will continue the analysis.

Best regards, Denis
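The one-line change quoted above can be sketched in isolation as follows. The struct types here are hypothetical, simplified stand-ins for the libva sequence parameters and the driver's internal sequence parameters. Only the field names intra_period, intra_idr_period and GopPicSize come from the thread, and the field widths are an assumption. Reading the patch as targeting the case where intra_period is 0 is an inference from the expression itself, not something the thread states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the application-supplied sequence parameters.
struct VaSeqParamsSketch {
    std::uint32_t intra_period;      // may be 0 (assumed to be the problem case)
    std::uint32_t intra_idr_period;
};

// Hypothetical stand-in for the driver-side sequence parameters.
struct DriverSeqParamsSketch {
    std::uint32_t GopPicSize;
};

// Same expression as the proposed fix: use intra_period when it is non-zero,
// otherwise fall back to intra_idr_period.
std::uint32_t SelectGopPicSize(const VaSeqParamsSketch &seq) {
    return seq.intra_period ? seq.intra_period : seq.intra_idr_period;
}

// Copies the selected GOP size into the driver-side structure.
void FillGopPicSize(const VaSeqParamsSketch *seq, DriverSeqParamsSketch *seqParams) {
    assert(seq != nullptr && seqParams != nullptr);
    seqParams->GopPicSize = SelectGopPicSize(*seq);
}
```

With intra_period = 0 and intra_idr_period = 30, the original assignment would yield a GopPicSize of 0, while the patched expression yields 30. A non-zero intra_period is passed through unchanged.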

intel-mediadev commented 1 year ago

Auto Created VSMGWL-66368 for further analysis.

nyanmisaka commented 1 year ago

potential_fix_for_ChromeOs.patch

It does the trick. No issue anymore in my aforementioned screen recording plugin testing.

VMFortress commented 1 year ago

@DenWolf I can confirm this resolves all my test cases as well!

DenWolf commented 1 year ago

Great! @nyanmisaka , @VMFortress - thank you very much for the very quick verification and for confirming that the fix works on your side! I've started to prepare the official fix and will post updates here as I make progress.

DenWolf commented 1 year ago

Hi @VMFortress , @nyanmisaka , the fix has been merged into the Media Driver repo: https://github.com/intel/media-driver/commit/02a5604273ecf385efca0acf16679aec650c95b7

Please verify it (or the latest Media Driver OS code base, which for now is that latest commit) on your side again. Thank you.

Best regards, Denis

VMFortress commented 1 year ago

@DenWolf I compiled the latest source and can confirm again it seems to be working in all my cases.

Thanks for the support! It is very appreciated!

nyanmisaka commented 1 year ago

@DenWolf So far so good. Thanks for your help!

Sherry-Lin commented 1 year ago

Fixed in https://github.com/intel/media-driver/commit/02a5604273ecf385efca0acf16679aec650c95b7, so closing this issue. Please feel free to reopen it if the issue is not resolved in your environment.

DenWolf commented 1 year ago

Great! @VMFortress , @nyanmisaka - thank you very much for your feedback and verification support!