intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

GPU hangs while encoding with Ubuntu repo driver #733

Closed MonkeySon closed 3 years ago

MonkeySon commented 4 years ago

Hi,

I am currently on an Ubuntu Server 19.04 machine with a J4105 Gemini Lake CPU. I followed these wiki guides to get the Media SDK working with ffmpeg:

https://github.com/Intel-Media-SDK/MediaSDK/wiki/Intel-media-stack-on-Ubuntu https://github.com/Intel-Media-SDK/MediaSDK/wiki/Build-and-use-ffmpeg-with-MediaSDK

I got all libraries/drivers from the repository and built ffmpeg with the necessary configuration flags.

The decoding examples run well but encoding fails:

MFX sample_encode:

GPU hang happened
[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/samples/sample_encode/src/pipeline_encode.cpp:2085
[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/samples/sample_encode/src/sample_encode.cpp:1364
Frame number: 2

ffmpeg encoding/transcoding:

[hevc_qsv @ 0x55a354f29340] Unknown FrameType, set pict_type to AV_PICTURE_TYPE_NONE.
[hevc @ 0x55a354fa5f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 3 >= 0
[hevc_qsv @ 0x55a354f29340] Error during encoding: unknown error (-21)
Video encoding failed

Is this a problem with an old version of the driver? According to apt, I am currently using:

Package: libmfx1
Version: 18.4.1-0ubuntu1

Package: intel-media-va-driver
Version: 18.4.1+dfsg1-2ubuntu1

Kernel version:

uname -a
Linux ... 5.0.0-29-generic #31-Ubuntu SMP Thu Sep 12 13:05:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I opened an issue with MediaSDK, but apparently this is a user-mode driver issue, so I was sent over here.

dvrogozh commented 4 years ago

Please hold off on this for now; see my next comment.

Intel-Media-SDK/MediaSDK#1663 - original issue on mediasdk side

@MonkeySon: would you mind rebuilding the stack and trying with the latest code? Here are the instructions:

git clone https://github.com/intel/gmmlib.git && cd gmmlib
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
cd ../..

git clone https://github.com/intel/libva.git && cd libva
./autogen.sh --prefix=/usr --libdir=/usr/lib/x86_64-linux-gnu
make -j8
sudo make install
cd ..

git clone https://github.com/intel/media-driver.git && cd media-driver
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
cd ../..

git clone https://github.com/Intel-Media-SDK/MediaSDK.git && cd MediaSDK
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install

You might need to apt-get some development packages to satisfy dependencies.

dvrogozh commented 4 years ago

Before you attempt to rebuild the driver, could you please try switching to the intel-media-va-driver-non-free driver and try again? I.e.: apt-get install intel-media-va-driver-non-free

Also, please post the exact command line you are running with mediasdk.

MonkeySon commented 4 years ago

Before you attempt to rebuild the driver, could you please try switching to the intel-media-va-driver-non-free driver and try again? I.e.: apt-get install intel-media-va-driver-non-free

Also, please post the exact command line you are running with mediasdk.

@dvrogozh

Okay, I re-ran the configuration with the following commands and results:

Install Media SDK stack:

sudo apt update
sudo apt install libva-dev libva-drm2 libmfx-dev libmfx-tools intel-media-va-driver-non-free

BTW: I am using a headless server, so I installed the libva-drm2 lib.

Build ffmpeg:

according to: https://github.com/Intel-Media-SDK/MediaSDK/wiki/Build-and-use-ffmpeg-with-MediaSDK https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu

sudo apt install autoconf automake build-essential cmake pkg-config
git clone https://github.com/ffmpeg/ffmpeg
cd ffmpeg
./configure --arch=x86_64 --disable-yasm --enable-vaapi --enable-libmfx

Status checking

export LIBVA_DRIVER_NAME=iHD

vainfo
error: XDG_RUNTIME_DIR not set in the environment.
error: can't connect to X server!
libva info: VA-API version 1.4.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.4 (libva 2.4.0)
vainfo: Driver version: Intel iHD driver - 1.0.0
vainfo: Supported profile and entrypoints
      ...
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointFEI
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointFEI
      VAProfileH264High               : VAEntrypointEncSliceLP
      ...
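
The vainfo listing above can be filtered to check whether the driver advertises a given profile/entrypoint pair. This is only a minimal sketch: `has_entrypoint` is a hypothetical helper (not part of libva-utils), and it assumes the `VAProfile... : VAEntrypoint...` layout shown above.

```shell
# has_entrypoint PROFILE ENTRYPOINT
# Reads vainfo output on stdin and succeeds (exit 0) if the given
# profile/entrypoint pair is listed. Hypothetical helper for illustration.
has_entrypoint() {
    awk -v p="$1" -v e="$2" '
        $1 == p && $2 == ":" && $3 == e { found = 1 }
        END { exit !found }
    '
}

# Usage on a real system (requires vainfo):
#   vainfo 2>/dev/null | has_entrypoint VAProfileH264Main VAEntrypointEncSlice \
#       && echo "H.264 Main encode is advertised"
```

An advertised entrypoint only shows what the driver claims to support; it does not prove that encoding actually works.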

~/ffmpeg# ./ffmpeg -encoders | grep 264
ffmpeg version N-95086-g84974c6fb5 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8 (Ubuntu 8.3.0-6ubuntu1)
  configuration: --arch=x86_64 --disable-yasm --enable-vaapi --enable-libmfx
  libavutil      56. 35.100 / 56. 35.100
  libavcodec     58. 59.100 / 58. 59.100
  libavformat    58. 33.100 / 58. 33.100
  libavdevice    58.  9.100 / 58.  9.100
  libavfilter     7. 59.100 /  7. 59.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
 V..... h264_qsv             H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration) (codec h264)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
 V..... h264_vaapi           H.264/AVC (VAAPI) (codec h264)

Testing:

MediaSDK decoding examples work, ffmpeg prints errors but finishes anyway:


~/ffmpeg/test# ../ffmpeg -hwaccel qsv -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p AUD_MW_E_ffmpeg.yuv
ffmpeg version N-95086-g84974c6fb5 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8 (Ubuntu 8.3.0-6ubuntu1)
  configuration: --arch=x86_64 --disable-yasm --enable-vaapi --enable-libmfx
  libavutil      56. 35.100 / 56. 35.100
  libavcodec     58. 59.100 / 58. 59.100
  libavformat    58. 33.100 / 58. 33.100
  libavdevice    58.  9.100 / 58.  9.100
  libavfilter     7. 59.100 /  7. 59.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
Input #0, h264, from 'AUD_MW_E.264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (Constrained Baseline), yuv420p(progressive), 176x144, 25 fps, 25 tbr, 1200k tbn, 50 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_qsv) -> rawvideo (native))
Press [q] to stop, [?] for help
Output #0, rawvideo, to 'AUD_MW_E_ffmpeg.yuv':
  Metadata:
    encoder         : Lavf58.33.100
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 176x144, q=2-31, 7603 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.59.100 rawvideo
[h264_qsv @ 0x55ee749bd600] A decode call did not consume any data: expect more data at input (-10)
    Last message repeated 2 times
frame=  100 fps=0.0 q=-0.0 Lsize=    3712kB time=00:00:04.28 bitrate=7105.8kbits/s speed=  28x    
video:3712kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%

~/ffmpeg/test# ls -l
total 7488
-rw-r--r-- 1 root root   54828 Jan 19  2008 AUD_MW_E.264
-rw-r--r-- 1 root root 3801600 Sep 26 13:41 AUD_MW_E.yuv
-rw-r--r-- 1 root root 3801600 Sep 26 13:45 AUD_MW_E_ffmpeg.yuv
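
Both decoded files are 3801600 bytes, which is what 100 frames of 176x144 video in 8-bit YUV 4:2:0 should occupy: 176 * 144 * 3/2 = 38016 bytes per frame. A small sketch of that sanity check follows; `yuv420_size` is a hypothetical helper and assumes unpadded 8-bit 4:2:0 frames.

```shell
# yuv420_size WIDTH HEIGHT FRAMES
# Prints the expected size in bytes of raw 8-bit YUV 4:2:0 video
# (one full-resolution luma plane plus two quarter-resolution chroma planes).
yuv420_size() {
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

# Usage with the files from the listing above:
#   [ "$(wc -c < AUD_MW_E_ffmpeg.yuv)" -eq "$(yuv420_size 176 144 100)" ] && echo "size matches"
#   cmp AUD_MW_E.yuv AUD_MW_E_ffmpeg.yuv && echo "outputs are byte-for-byte identical"
```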

Encoding tests:

~/ffmpeg/test# /usr/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -b 3000 -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: VA-API version 1.4.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
Encoding Sample Version 8.3.26.

Input file format   YUV420
Output video        AVC 
Source picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Destination picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Frame rate  30.00
Bit rate(Kbps)  3000
Gop size    0
Ref dist    0
Ref number  0
Idr Interval    0
Target usage    balanced
Memory type system
Media SDK impl      hw
Media SDK version   1.28

Processing started
GPU hang happened

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/samples/sample_encode/src/pipeline_encode.cpp:2085

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/samples/sample_encode/src/sample_encode.cpp:1364
Frame number: 2

~/ffmpeg/test# ../ffmpeg -loglevel debug -init_hw_device qsv=hw -filter_hw_device hw -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW.yuv -vf hwupload=extra_hw_frames=64,format=qsv -c:v h264_qsv -b:v 5M -frames:v 10 -y ./encoded_AUD_MW_E_ffmpeg.h264

Whole output: https://gist.github.com/MonkeySon/66ea736ac8a194df197198cbb0f090db

But essentially it is:

[h264_qsv @ 0x55791582c000] Error during encoding: unknown error (-21)

Tests with intel-media-va-driver (free)

~/ffmpeg/test# /usr/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -b 3000 -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: VA-API version 1.4.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
sample_encode: /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/_studio/mfx_lib/shared/src/mfx_h264_encode_vaapi.cpp:1422: virtual mfxStatus MfxHwH264Encode::VAAPIEncoder::CreateAuxilliaryDevice(VideoCORE*, GUID, mfxU32, mfxU32, bool): Assertion `0x00000000 == vaSts' failed.
Aborted (core dumped)

ffmpeg error: ffmpeg: /build/intel-mediasdk-AzMSof/intel-mediasdk-18.4.1/_studio/shared/src/mfx_vpp_vaapi.cpp:242: mfxStatus MfxHwVideoProcessing::VAAPIVideoProcessing::Init(void**, mfxVideoParam*): Assertion `0x00000000 == vaSts' failed.

MonkeySon commented 4 years ago

Short update on the topic: Today I upgraded my server from Ubuntu 19.04 to 19.10, but the errors are still the same as before. The following versions are now installed:

Package: libmfx1
Version: 19.2.1-1
Package: intel-media-va-driver
Version: 19.2.1+dfsg1-2ubuntu1

UPDATE: Building all libs from source did not bring any improvements, errors are the same

felixbuenemann commented 4 years ago

I can confirm that iHD h.264 hardware encoding is completely broken on Gemini Lake.

I'm running Ubuntu 19.10, Kernel 5.3.0-23-generic x86_64 on Pentium Silver J5005 with 32 GB RAM.

Kernel Logs:

[    0.268498] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-23-generic root=UUID=03d19249-aae3-11e8-ab94-7085c27f46b4 ro mitigations=off scsi_mod.use_blk_mq=1 mem_sleep_default=s2idle zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold i915.fastboot=1 i915.modeset=1 i915.enable_guc=2
[    0.528842] smpboot: CPU0: Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz (family: 0x6, model: 0x7a, stepping: 0x1)
[    4.350943] i915 0000:00:02.0: vgaarb: deactivate vga console
[    4.351156] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    4.354131] [drm] Finished loading DMC firmware i915/glk_dmc_ver1_04.bin (v1.4)
[    4.738301] mei_hdcp mei::b638ab7e-94e2-4ea2-a552-d1c54b627f04:01: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[    5.468049] [drm] HuC: Loaded firmware i915/glk_huc_ver03_01_2893.bin (version 3.1)
[    5.473910] [drm] GuC: Loaded firmware i915/glk_guc_32.0.3.bin (version 32.0)
[    5.480545] i915 0000:00:02.0: GuC firmware version 32.0
[    5.480547] i915 0000:00:02.0: GuC submission disabled
[    5.480548] i915 0000:00:02.0: HuC enabled
[    5.483451] [drm] Initialized i915 1.6.0 20190619 for 0000:00:02.0 on minor 0

Tested driver combinations:

Using Ubuntu packages:

Package: libmfx1
Version: 19.2.1-1
Package:  intel-media-va-driver-non-free
Version: 19.2.1+ds1-2ubuntu1
LIBVA_DRIVER_NAME=iHD /usr/share/mfx/samples/sample_encode h264   -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff    -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: VA-API version 1.5.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_5
libva info: va_openDriver() returns 0
Encoding Sample Version 8.4.27.

Input file format   YUV420
Output video        AVC
Source picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Destination picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Frame rate  30.00
QPI 30
QPP 30
QPB 30
Gop size    0
Ref dist    1
Ref number  0
Idr Interval    0
Target usage    balanced
Memory type system
Media SDK impl      hw
Media SDK version   1.30

Processing started

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/pipeline_encode.cpp:2210

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/sample_encode.cpp:1587
Frame number: 0

Using latest builds:

LD_LIBRARY_PATH=/usr/local/lib LIBVA_DRIVERS_PATH=/usr/local/lib/dri LIBVA_DRIVER_NAME=iHD /usr/share/mfx/samples/sample_encode h264   -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff    -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: VA-API version 1.6.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/local/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
Encoding Sample Version 8.4.27.

Input file format   YUV420
Output video        AVC
Source picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Destination picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Frame rate  30.00
QPI 30
QPP 30
QPB 30
Gop size    0
Ref dist    1
Ref number  0
Idr Interval    0
Target usage    balanced
Memory type system
Media SDK impl      hw
Media SDK version   1.30

Processing started

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/pipeline_encode.cpp:2210

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/sample_encode.cpp:1587
Frame number: 0

I also tested with plexmediaserver 1.18.2.2041-3d469cb32 which includes the iHD driver:

LD_LIBRARY_PATH=/usr/lib/plexmediaserver/lib LIBVA_DRIVERS_PATH=/usr/lib/plexmediaserver/lib/dri LIBVA_DRIVER_NAME=iHD vainfo
libva info: VA-API version 1.5.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/lib/plexmediaserver/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_5
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.5 (libva 2.5.0)
vainfo: Driver version: Intel iHD driver - 1.0.0

Hardware transcoding in Plex Transcoder either hangs completely or produces very broken output, which hangs most of the time but sometimes decodes a few frames with visual artifacts.

Hardware transcoding works fine with LIBVA_DRIVER_NAME=i965 and i965 driver 2.3.0.


If there are any useful debugging steps, please let me know.

dvrogozh commented 4 years ago

Can you please reboot the system, reproduce the GPU hang, and get the following i915 error state file: cp /sys/class/drm/card0/error i915_error_state

Please attach it to this issue as-is (avoid posting the content in a comment; the file needs some decoding).
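
When no hang has been captured, the kernel's error file contains only the text "No error state collected", so a plain cp would save a useless file. The sketch below guards against that; `save_error_state` is a hypothetical helper name, and the paths in the usage line are the ones from the comment above.

```shell
# save_error_state SRC DEST
# Copies the i915 error state from SRC to DEST unless the kernel reports
# that nothing was captured. Returns 1 if there is nothing to save,
# 2 if SRC cannot be read. Hypothetical helper for illustration.
save_error_state() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if grep -q '^No error state collected' "$1"; then
        echo "no error state collected" >&2
        return 1
    fi
    cp "$1" "$2"
}

# Usage (typically as root):
#   save_error_state /sys/class/drm/card0/error i915_error_state
```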

felixbuenemann commented 4 years ago

@dvrogozh Do I need an additional kernel flag for that to work?

cat /sys/class/drm/card0/error
No error state collected

Even though sample_encode reported:

Processing started
Frame number: 1
[ERROR], sts=MFX_ERR_GPU_HANG(-21), SynchronizeFirstTask, SyncOperation failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/pipeline_encode.cpp:157

[ERROR], sts=MFX_ERR_GPU_HANG(-21), GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/pipeline_encode.cpp:1993

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/pipeline_encode.cpp:2210

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /build/intel-mediasdk-3Sb0AY/intel-mediasdk-19.2.1/samples/sample_encode/src/sample_encode.cpp:1587
Frame number: 1

The first two errors don't show up on subsequent runs.

felixbuenemann commented 4 years ago

I found the i915.error_capture flag, will retry with that.

felixbuenemann commented 4 years ago

Even with i915.fastboot=1 i915.modeset=1 i915.enable_guc=2 i915.error_capture=1 i915.verbose_state_checks=1 I'm still getting No error state collected.

I don't think the GPU gets actually reset:

cat /sys/kernel/debug/dri/0/i915_reset_info
full gpu reset = 0
rcs0 = 0
bcs0 = 0
vcs0 = 0
vecs0 = 0
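
The counters above can be summed to get a single number to compare before and after a failed encode. This is a sketch only: `total_resets` is a hypothetical helper, and it assumes the `name = count` layout shown in the output above.

```shell
# total_resets
# Reads i915_reset_info text on stdin and prints the sum of all counters.
# Lines that do not have the form "name = count" are ignored.
total_resets() {
    awk -F' = ' 'NF == 2 { sum += $2 } END { print sum + 0 }'
}

# Usage (as root, with debugfs mounted):
#   total_resets < /sys/kernel/debug/dri/0/i915_reset_info
```

A total that stays at 0 across a failing run is consistent with the kernel not having reset the GPU.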

dvrogozh commented 4 years ago

Hm. That's interesting. Maybe that's not actually a GPU hang, but some other error encountered in mediasdk or the driver and returned to the application level as a GPU hang error status.

Let's double-check though. Basically, you don't need any additional i915 module parameters to get the GPU hang error state. The only thing you need is debugfs enabled. Some questions to double-check:

  1. Do you have a single graphics card in the system? Is there any chance you looked at the wrong card?
  2. If that's a real GPU hang, you should see two things:
    2.1. dmesg should have a corresponding log. Here is an example:
    [2436684.992514] i915 0000:00:02.0: GPU HANG: ecode 9:0:0x00000000, hang on rcs0
    [2436684.992515] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    [2436684.992515] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    [2436684.992516] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    [2436684.992516] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    [2436684.992517] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    [2436684.993527] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

    2.2. The error state should be dumped. dmesg actually says where to look for it (/sys/class/drm/card0/error). Copy that file. Both the dmesg log and the i915 error state are essential for debugging.
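
The dmesg check from item 2.1 can be reduced to a filter over the kernel log. A minimal sketch, assuming the message wording in the example log above; `gpu_hang_lines` is a hypothetical helper name.

```shell
# gpu_hang_lines
# Reads kernel log text on stdin and prints the i915 lines that indicate a
# hang, a saved crash dump, or an engine reset. Exits non-zero if none match.
gpu_hang_lines() {
    grep -E 'GPU HANG:|GPU crash dump saved|Resetting .* for hang'
}

# Usage:
#   dmesg | gpu_hang_lines || echo "no GPU hang logged by the kernel"
```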

Let's double-check whether that's a real GPU hang. If it is not, we need to find the place in the stack which actually returned the error. We would appreciate your help with that. To narrow it down, try enabling the libva trace dump. Create the file /etc/libva.conf with the content: LIBVA_TRACE=/path/to/trace/file

Note that /path/to/trace must be an existing folder, and file will be the prefix of the generated output. Also note that one application can dump several trace files (due to threads).

Please, attach libva trace files. And let's look for errors in them.
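
The trace setup described above might be scripted roughly as follows. This is a sketch under assumptions: `write_libva_trace_conf` is a hypothetical helper, and the trace prefix in the usage line is an arbitrary example path.

```shell
# write_libva_trace_conf CONF_FILE TRACE_PREFIX
# Makes sure the folder part of TRACE_PREFIX exists, then writes a
# one-line config that points LIBVA_TRACE at the prefix.
# Hypothetical helper for illustration.
write_libva_trace_conf() {
    mkdir -p "$(dirname "$2")" || return 1
    printf 'LIBVA_TRACE=%s\n' "$2" > "$1"
}

# Usage (as root, because the config file lives in /etc):
#   write_libva_trace_conf /etc/libva.conf /tmp/libva/trace
# After running the failing command, list the per-thread trace files:
#   ls /tmp/libva/trace*
```

Setting LIBVA_TRACE in the environment of a single command is an alternative that avoids editing a system-wide file.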

dvrogozh commented 4 years ago

By the way, what is your Geminilake device ID? Can you please provide the output of the lspci -nn command?

felixbuenemann commented 4 years ago

It's a Pentium Silver J5005, so it has the UHD Graphics 605 (8086:3184):

lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:31f0] (rev 03)
00:00.1 Signal processing controller [1180]: Intel Corporation Celeron/Pentium Silver Processor Dynamic Platform and Thermal Framework Processor Participant [8086:318c] (rev 03)
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 605 [8086:3184] (rev 03)
00:0f.0 Communication controller [0780]: Intel Corporation Celeron/Pentium Silver Processor Trusted Execution Engine Interface [8086:319a] (rev 03)
00:12.0 SATA controller [0106]: Intel Corporation Device [8086:31e3] (rev 03)
00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:31d8] (rev f3)
00:13.1 PCI bridge [0604]: Intel Corporation Device [8086:31d9] (rev f3)
00:13.2 PCI bridge [0604]: Intel Corporation Device [8086:31da] (rev f3)
00:13.3 PCI bridge [0604]: Intel Corporation Device [8086:31db] (rev f3)
00:15.0 USB controller [0c03]: Intel Corporation Device [8086:31a8] (rev 03)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:31e8] (rev 03)
00:1f.1 SMBus [0c05]: Intel Corporation Celeron/Pentium Silver Processor Gaussian Mixture Model [8086:31d4] (rev 03)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
04:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
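
Only the VGA controller line matters for the device ID question. A minimal sketch that pulls the vendor:device pair out of `lspci -nn` output; `vga_device_ids` is a hypothetical helper, and it assumes the bracketed class code [0300] and lowercase hex IDs as in the listing above.

```shell
# vga_device_ids
# Reads `lspci -nn` output on stdin and prints vendor:device for every
# VGA compatible controller (PCI class 0300), one per line.
vga_device_ids() {
    sed -n 's/.*\[0300\]:.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Usage:
#   lspci -nn | vga_device_ids
```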

dvrogozh commented 4 years ago

Ok, I think this might be a mediasdk issue after all. Basically, mediasdk is not aware of the Geminilake device IDs (0x3184) and treats the platform as... IVB... executing an incorrect code path. Very strange things could happen on such a code path...

Can you, please, try this mediasdk PR: https://github.com/Intel-Media-SDK/MediaSDK/pull/1770 where I add GLK device IDs? I very much hope this will address the issue. If not - please, report the behavior which you see after the change.
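
To illustrate the failure mode described above, here is a toy device ID lookup. It is purely illustrative and is NOT MediaSDK's actual table or code: `platform_for_device_id` is a hypothetical function, and 0x3185 (UHD Graphics 600) is added from general knowledge of Gemini Lake parts rather than from this thread.

```shell
# platform_for_device_id ID
# Toy lookup from a PCI device ID to a platform name. An ID missing from
# the table falls through to the default branch; whatever that default is
# decides which code path runs for unrecognized hardware.
platform_for_device_id() {
    case "$1" in
        0x3184|0x3185) echo "GLK" ;;      # Gemini Lake (UHD Graphics 605 / 600)
        *)             echo "unknown" ;;  # not in the table
    esac
}

# Usage:
#   platform_for_device_id 0x3184
```

If a library maps the default branch to an older platform instead of reporting the device as unknown, it can end up running a code path that does not match the hardware.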

felixbuenemann commented 4 years ago

I completely missed the post asking for the libva trace file.

I think your change to mediasdk makes sense, I will try it out.

However, encoding is also broken when encoding with ffmpeg using vaapi, where libmfx should not be involved, so I think it is also a driver issue.

If I encode a 10 second sample from Big Buck Bunny using i965 vaapi driver, it is completely fine. If I encode with iHD only less than 10% of the 300 frames are decodable, the remaining frames are corrupt:

ffmpeg -i bbb_trans_new_ihd.mkv -f null /dev/null
ffmpeg version 4.1.4-1build2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.2.1-4ubuntu1)
  configuration: --prefix=/usr --extra-version=1build2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, matroska,webm, from 'bbb_trans_new_ihd.mkv':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 512
    COMPATIBLE_BRANDS: isomiso2avc1mp41
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    COMPOSER        : Sacha Goedegebure
    GENRE           : Animation
    ENCODER         : Lavf58.20.100
  Duration: 00:00:10.00, start: 0.000000, bitrate: 1529 kb/s
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      HANDLER_NAME    : VideoHandler
      ENCODER         : Lavc58.35.100 h264_vaapi
      DURATION        : 00:00:10.000000000
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
[h264 @ 0x55861b3b94c0] co located POCs unavailable
[h264 @ 0x55861b380180] co located POCs unavailable
[h264 @ 0x55861b380180] error while decoding MB 44 54, bytestream -6
[h264 @ 0x55861b380180] concealing 1685 DC, 1685 AC, 1685 MV errors in B frame
Output #0, null, to '/dev/null':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 512
    COMPATIBLE_BRANDS: isomiso2avc1mp41
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    COMPOSER        : Sacha Goedegebure
    GENRE           : Animation
    encoder         : Lavf58.20.100
    Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
    Metadata:
      HANDLER_NAME    : VideoHandler
      DURATION        : 00:00:10.000000000
      encoder         : Lavc58.35.100 wrapped_avframe
[h264 @ 0x55861b37f5c0] co located POCs unavailable
[h264 @ 0x55861b6de540] co located POCs unavailable
[h264 @ 0x55861b3b94c0] error while decoding MB 43 10, bytestream -15
[h264 @ 0x55861b3b94c0] concealing 6966 DC, 6966 AC, 6966 MV errors in P frame
[null @ 0x55861b3576c0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 22 >= 22
[h264 @ 0x55861b37f5c0] co located POCs unavailable
[h264 @ 0x55861b380180] error while decoding MB 24 53, bytestream -6
[h264 @ 0x55861b380180] concealing 1825 DC, 1825 AC, 1825 MV errors in B frame
[h264 @ 0x55861b39cbc0] co located POCs unavailable
[h264 @ 0x55861b3b94c0] error while decoding MB 46 13, bytestream -6
[h264 @ 0x55861b3b94c0] concealing 6603 DC, 6603 AC, 6603 MV errors in P frame
[h264 @ 0x55861b380180] co located POCs unavailable
[h264 @ 0x55861b37f5c0] co located POCs unavailable
[h264 @ 0x55861b6de540] co located POCs unavailable
[h264 @ 0x55861b39cbc0] co located POCs unavailable
[h264 @ 0x55861b3b94c0] error while decoding MB 19 12, bytestream -5
[h264 @ 0x55861b3b94c0] concealing 6750 DC, 6750 AC, 6750 MV errors in P frame
frame=    6 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.90 bitrate=N/A speed= 3.4x
video:3kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Looking at the file sizes shows a similar difference:

-rw-rw-r-- 1 felix felix  18M Nov 22 15:13 bbb_trans_new_i965.mkv
-rw-rw-r-- 1 felix felix 1.9M Nov 22 16:51 bbb_trans_new_ihd.mkv
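
A 10 second clip at 30 fps should contain 300 frames, while the final progress line of the decode run above reports frame= 6, which is 2%. The count can be extracted from ffmpeg's output as sketched below; `decoded_frames` is a hypothetical helper and assumes the `frame=` progress format shown above.

```shell
# decoded_frames
# Reads ffmpeg console output on stdin and prints the frame count from the
# last "frame=" progress line. Progress updates are separated by carriage
# returns, so those are turned into newlines first.
decoded_frames() {
    tr '\r' '\n' | sed -n 's/^frame= *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage (ffmpeg prints progress on stderr):
#   n=$(ffmpeg -i bbb_trans_new_ihd.mkv -f null /dev/null 2>&1 | decoded_frames)
#   echo "$n of 300 frames decoded ($(( n * 100 / 300 ))%)"
```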

I will get you the trace files from libva you requested, but looking at dmesg there is definitely no GPU hang happening.

felixbuenemann commented 4 years ago

Here is the output from sample_encode with mediasdk master and your patch:

LD_LIBRARY_PATH=/opt/intel/mediasdk/lib:/usr/local/lib LIBVA_DRIVERS_PATH=/usr/local/lib/dri LIBVA_DRIVER_NAME=iHD LIBVA_TRACE=/tmp/libva_trace /opt/intel/mediasdk/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: Open new log file /tmp/libva_trace.151739.thd-0x00001459 for the thread 0x00001459
libva info: LIBVA_TRACE is on, save log into /tmp/libva_trace.151739.thd-0x00001459
libva info: VA-API version 1.6.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /usr/local/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_5
libva info: va_openDriver() returns 0
libva info: Save context 0x20000000 into log file /tmp/libva_trace.151739.thd-0x00001459
Encoding Sample Version 8.4.27.

Input file format   YUV420
Output video        AVC
Source picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Destination picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Frame rate  30.00
QPI 30
QPP 30
QPB 30
Gop size    0
Ref dist    1
Ref number  0
Idr Interval    0
Target usage    balanced
Memory type system
Media SDK impl      hw
Media SDK version   1.30

Processing started
libva info: Open new log file /tmp/libva_trace.151739.thd-0x0000145a for the thread 0x0000145a
Frame number: 1
[ERROR], sts=MFX_ERR_GPU_HANG(-21), SynchronizeFirstTask, SyncOperation failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:153

[ERROR], sts=MFX_ERR_GPU_HANG(-21), GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:1988

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, m_pmfxENC->EncodeFrameAsync failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:2207

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, pPipeline->Run failed at /usr/src/intel-media-sdk/samples/sample_encode/src/sample_encode.cpp:1628
Frame number: 1
Encoding fps: 36

For comparison here's the same with the current i965 driver from master:

LD_LIBRARY_PATH=/opt/intel/mediasdk/lib:/usr/local/lib LIBVA_DRIVERS_PATH=/usr/local/lib/dri LIBVA_DRIVER_NAME=i965 LIBVA_TRACE=/tmp/libva_trace /opt/intel/mediasdk/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264
libva info: Open new log file /tmp/libva_trace.152119.thd-0x0000157f for the thread 0x0000157f
libva info: LIBVA_TRACE is on, save log into /tmp/libva_trace.152119.thd-0x0000157f
libva info: VA-API version 1.6.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'i965'
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_5
libva info: va_openDriver() returns 0
libva info: Save context 0x02000000 into log file /tmp/libva_trace.152119.thd-0x0000157f
Encoding Sample Version 8.4.27.

Input file format   YUV420
Output video        AVC
Source picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Destination picture:
    Resolution  176x144
    Crop X,Y,W,H    0,0,176,144
Frame rate  30.00
QPI 30
QPP 30
QPB 30
Gop size    0
Ref dist    1
Ref number  0
Idr Interval    0
Target usage    balanced
Memory type system
Media SDK impl      hw
Media SDK version   1.30

Processing started
libva info: Open new log file /tmp/libva_trace.152119.thd-0x00001580 for the thread 0x00001580
Frame number: 100
Encoding fps: 1079

Processing finished

I'm including both the iHD and i965 trace files to allow comparison.

libva_trace.151739.thd-0x0000145a.gz libva_trace.151739.thd-0x00001459.gz libva_trace.152119.thd-0x0000157f.gz libva_trace.152119.thd-0x00001580.gz

Note that I've recompiled everything against libva 2.5.0 and gmmlib 19.3.3, so that I can easily swap the new and system libs against each other.

I can redo the tests with libva and gmmlib master, if you like, but I don't think that will change the result.

dvrogozh commented 4 years ago

So, adding device ids to msdk did not help.... Well, I still believe that's a correct patch. We just have some other issues. Is my understanding correct that:

  1. ffmpeg-vaapi with iHD produces a corrupted bitstream
  2. sample_encode with iHD errors out; nothing is encoded
  3. sample_encode with i965 works fine

Thank you for the traces. We will take a look. @dmitryermilov : can you, please, help to review?

felixbuenemann commented 4 years ago

Yes, your observations are correct.

Here are the file sizes from sample_encode:

ls -hal encoded_i*
-rw-r--r-- 1 root root 13K Nov 23 17:21 encoded_i965_AUD_MW_E.264
-rw-r--r-- 1 root root   0 Nov 23 17:21 encoded_iHD_AUD_MW_E.264
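The comparison above can be scripted so that both drivers run the same command and the output sizes are checked automatically. This is only a minimal sketch, assuming a POSIX shell: `run_with_driver`, `output_is_empty` and `compare_drivers` are hypothetical helper names, and the encoder command is passed in as arguments instead of being hard-coded.

```shell
# Run a command with a specific VA-API driver selected via LIBVA_DRIVER_NAME.
# Usage: run_with_driver <driver> <command> [args...]
run_with_driver() {
    drv=$1
    shift
    env LIBVA_DRIVER_NAME="$drv" "$@"
}

# Succeeds when the file is missing or has zero bytes, which is how the
# failed iHD encode shows up on disk.
output_is_empty() {
    [ ! -s "$1" ]
}

# Run the same command once per driver and report whether any output was written.
# Usage: compare_drivers <output-prefix> <command> [args...]
# The output path is appended as the LAST argument of the command.
compare_drivers() {
    prefix=$1
    shift
    for driver in iHD i965; do
        out="${prefix}_${driver}.264"
        rm -f "$out"
        # Ignore the exit status here; only the resulting file is inspected.
        run_with_driver "$driver" "$@" "$out" >/dev/null 2>&1 || true
        if output_is_empty "$out"; then
            echo "$driver: no output"
        else
            echo "$driver: ok"
        fi
    done
}
```

With the command from this thread, the call would end in `-o` so that the appended path becomes the output file, for example `compare_drivers encoded /opt/intel/mediasdk/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -i AUD_MW_E.yuv -o`.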

felixbuenemann commented 4 years ago

@dvrogozh I could probably create a docker container with SSH access to the hardware tomorrow, if that would help.

Oh and since I didn't mention before, the system is running on an Asrock J5005-ITX motherboard with firmware 1.40.

dvrogozh commented 4 years ago

Access to the system might help.

Also, can you, please, try to play with some encoding parameters to narrow down the issue. I would suggest:

/opt/intel/mediasdk/share/mfx/samples/sample_encode h264 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff -i AUD_MW_E.yuv -o encoded_AUD_MW_E.264

felixbuenemann commented 4 years ago

All of the encoding modes fail to produce a single frame with the iHD driver.


I have set up a Docker container with all the necessary tools preinstalled and working access to /dev/dri/renderD128.

In order to access it through ssh you need to install cloudflared and add the following lines to your ~/.ssh/config:

Host *.trycloudflare.com
  ProxyCommand cloudflared access ssh --hostname %h

(This is needed because the SSHd in the Docker container is tunneled through Cloudflare Argo to make it publicly accessible.)

You can send me a public key through the email address listed on my GitHub profile or through Keybase.

I've installed the following libs / tools from their current master branches:

eero-t commented 4 years ago

I tried a few transcoding test cases on GLK J4005 (0x3185), with yesterday evening's Git versions of drm-tip kernel, libdrm, gmmlib, libva, intel-driver and MediaSDK (+ an FFmpeg Git version from a few weeks ago).

With the FFmpeg VA-API backend, AVC seems to work OK with iHD, but MPEG2 encoding and transcoding 4K 10-bit HEVC fail. MPEG2 encoding also fails with the i965 driver, but the 4K 10-bit HEVC transcode runs to completion there, so I would have assumed iHD also supports that on GLK.

However, with the MediaSDK and FFmpeg QSV backends pretty much everything fails, although they work fine on other GEN9 devices => is there a MediaSDK bug about that?

SirBryan commented 4 years ago

I cloned a beautifully working CentOS 7 setup with an i5-8259U-based NUC and recent versions of everything (as mentioned by @eero-t) to two NUCs with Celeron J4005s. The exact same ffmpeg command line that works on the i5 hits a -21 error on the Celerons (QSV decoding of MPEG2 to QSV H264).

However, MediaSDK's "sample_encode" binary converts an MPEG2 .ts 10-second file to H264 just fine.

eero-t commented 4 years ago

What about "sample_multi_transcode"? Everything my nightly runs do fails on GLK, but works fine on other GEN9 platforms.

For example this: sample_multi_transcode -i::mpeg2 1920x1080i_29.97_20mb_mpeg2_high.mpv -o::h264 output.h264 -b 6000 -u 7 -n 2400 -async 4 -hw

This fails (with two-day-old Git versions of kernel/libdrm/libva/gmmlib/media-driver/msdk) with:

[ERROR], sts=MFX_ERR_GPU_HANG(-21), PutBS, Encode: SyncOperation failed at /opt/builder/source0/media-sdk/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1916

[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at /opt/builder/source0/media-sdk/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1878

[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x5591776caab0] failed at /opt/builder/source0/media-sdk/samples/sample_multi_transcode/src/pipeline_transcode.cpp:4448

[ERROR], sts=MFX_ERR_ABORTED(-12), main, transcode.ProcessResult failed at /opt/builder/source0/media-sdk/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1150

And running 50 parallel instances of this GPU hangs now and then: sample_multi_transcode -i::h264 1280x720p_29.97_10mb_h264_cabac.264 -o::h264 output.h264 -b 800 -u 4 -n 1200 -f 15 -w 352 -h 240 -FRC::PT -async 4 -hw
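
A parallel run like this can be driven from a small shell helper that starts N copies and counts how many exit with a non-zero status. This is a minimal sketch: `run_parallel` is a hypothetical name, and it assumes the command signals failure through its exit status, which I have not verified for every sample_multi_transcode error path.

```shell
# Launch <count> copies of a command in parallel and print how many failed.
# Usage: run_parallel <count> <command> [args...]
run_parallel() {
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each instance runs in the background with its output discarded.
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of the given background job.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `run_parallel 50 sample_multi_transcode -i::h264 1280x720p_29.97_10mb_h264_cabac.264 -o::h264 output.h264 -b 800 -u 4 -n 1200 -f 15 -w 352 -h 240 -FRC::PT -async 4 -hw` prints the number of failed instances. Note that all instances would write to the same output.h264 here, so a per-instance output name may be preferable.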

SirBryan commented 4 years ago

Similar results. (I'm using MPEG2 segments from ATSC broadcasts for samples.)

sudo /opt/intel/mediasdk/share/mfx/samples/sample_multi_transcode -i::mpeg2 testout.mp2 -o::h264 testout.mp4 -b 6000 -u 7 -n 2400 -async 2 -hw
Multi Transcoding Sample Version 8.4.27.

libva info: VA-API version 1.6.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
Session 0:
Pipeline surfaces number (DecPool): 9
MFX HARDWARE Session 0 API ver 1.31 parameters: 
Input  video: MPG2
Output video: AVC 

Session 0 was NOT joined with other sessions

Transcoding started

[ERROR], sts=MFX_ERR_GPU_HANG(-21), PutBS, Encode: SyncOperation failed at /home/bscott/buildmedia/MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1909

[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at /home/bscott/buildmedia/MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1871

[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x560841a30cc0] failed at /home/bscott/buildmedia/MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:4440

The errors point to this section of code from the MediaSDK:

mfxStatus CTranscodingPipeline::PutBS()
{
    mfxStatus       sts = MFX_ERR_NONE;
    ExtendedBS *pBitstreamEx  = m_BSPool.front();
    MSDK_CHECK_POINTER(pBitstreamEx, MFX_ERR_NULL_PTR);

    // get result coded stream, synchronize only if we still have sync point                           
    if(pBitstreamEx->Syncp)
    {
        sts = m_pmfxSession->SyncOperation(pBitstreamEx->Syncp, MSDK_WAIT_INTERVAL);
        HandlePossibleGpuHang(sts);
        MSDK_CHECK_ERR_NONE_STATUS(sts, MFX_ERR_ABORTED, "Encode: SyncOperation failed");
    }

...which tells me that we're not getting anything back from the GPU.

Interestingly enough, on the system where ffmpeg qsv is working fine, running this same sample_multi_transcode command fails with a completely different result:

Multi Transcoding Sample Version 8.4.27.

libva info: VA-API version 1.6.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
Session 0:

[ERROR], sts=MFX_ERR_NULL_PTR(-2), Init, m_fSource pointer is NULL at /home/bscott/buildmedia/MediaSDK/samples/sample_common/src/sample_utils.cpp:657

[ERROR], sts=MFX_ERR_NULL_PTR(-2), Init, reader->Init failed at /home/bscott/buildmedia/MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:348

[ERROR], sts=MFX_ERR_NULL_PTR(-2), main, transcode.Init failed at /home/bscott/buildmedia/MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1169

On the i5, it doesn't get as far along in the process as it does on the Celeron.

SirBryan commented 4 years ago

To add to the above...

The Celeron boxes were (until today) using a 3.10 kernel. Upgraded to 5.5.6, with no difference.

Did a fresh install of Ubuntu 18.04 with kernel 4.15 on one of them and installed the drivers and compiled ffmpeg following this guide: https://gist.github.com/Brainiarc7/4f831867f8e55d35cbcb527e15f9f116 (I skipped the OpenCL and Vorbis pieces as I really just need a clean MPEG2 to H264 pipeline.)

Both ffmpeg and sample_multi_transcode ended up with error -21 (GPU hang).
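
Since ffmpeg only prints the numeric code while the MediaSDK samples print both name and number, a small lookup helps when reading mixed logs. The sketch below covers only the status codes that actually appear in this thread; `mfx_status_name` and `first_mfx_status` are hypothetical helper names, not part of MediaSDK.

```shell
# Map a numeric MediaSDK status (as printed by ffmpeg) to its name.
# Only the codes seen in this thread are listed.
mfx_status_name() {
    case "$1" in
        -2)  echo "MFX_ERR_NULL_PTR" ;;
        -3)  echo "MFX_ERR_UNSUPPORTED" ;;
        -12) echo "MFX_ERR_ABORTED" ;;
        -21) echo "MFX_ERR_GPU_HANG" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# Read sample log output on stdin and print "NAME CODE" for the first
# "sts=NAME(CODE)" occurrence, i.e. the error that started the chain.
first_mfx_status() {
    sed -n 's/.*sts=\(MFX_[A-Z_]*\)(\(-\{0,1\}[0-9]*\)).*/\1 \2/p' | head -n 1
}
```

Usage: `mfx_status_name -21` for an ffmpeg message, or `sample_multi_transcode ... 2>&1 | first_mfx_status` for the sample output.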

eero-t commented 4 years ago

Is the original issue still reproducible with VA-API, or has that been fixed? If only MediaSDK issues remain, there should be a separate bug against that, as it's not doing all of its HW accesses through the media-driver.

SirBryan commented 4 years ago

Simplifying my comment.

My experience on the J4005s is exactly the same as what @felixbuenemann and @dvrogozh mentioned above: hardware-assisted encoding on GLK (more specifically, my J4005s) with the iHD driver doesn't work. h264_qsv (usually) dies with error -21 and h264_vaapi creates a corrupt (or nonexistent) bitstream. hevc_* doesn't work at all.

(MPEG2 encode isn't supported on this CPU/GPU anyway, so that's a non-issue.)

SirBryan commented 4 years ago

I've done some more poking around. On a whim I took a sample MPEG2 file I had and sent it over the wire to another machine using the software decoder and "manually" uploading the frames into the qsv encoder. It actually played on the other machine (using ffplay -i udp://ip.add.ress...), so I decided to investigate further.

h264_qsv worked the best and longest (for around 10 seconds) with the profile set to high and level set to 4. It died much faster (if it starts at all) when set to main or baseline.

ffmpeg -loglevel debug -re  -init_hw_device qsv=hw -filter_hw_device hw \
-f mpegts -c:v mpeg2video  -i "$srcurl" \
-vf "hwupload=extra_hw_frames=32,format=qsv" -r 30000/1001  \
-c:v h264_qsv -profile:v high -level:v 4.0 -b:v 2M  \
-c:a aac -b:a 192k -ac 2 \
-f mpegts "$dsturl"

It turns out that having hwaccel decode off, combined with the higher complexity of the encode, was causing ffmpeg to feed the ~30fps video at speeds lower than 1x.

In other words, as soon as ffmpeg comes up to 1x speed, it errors out to -21. I'm wondering if a queue is getting starved somewhere, and instead of waiting, it dies out.

That would explain @eero-t 's comment:

And running 50 parallel instances of this GPU hangs now and then

SirBryan commented 4 years ago

Also, @eero-t, hardware MPEG-2 encoding is not supported on J4005.

From https://www.intel.com/content/dam/support/us/en/documents/mini-pcs/nuc-kits/NUC7xJY_TechProdSpec.pdf

Video decode hardware acceleration supporting H.265/HEVC @ Level 5.1 8b/10b, H.264 @ Level 5.2, MPEG2, MVC, VC-1, WMV9, JPEG, VP8 and VP9 formats

Video encode hardware acceleration supporting H.265/HEVC @ Level 4 8b, H.264 @ Level 5.2, JPEG, MVC, VP8 and VP9 (SW only) formats

Thus, vainfo shows:

      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointEncSlice
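
A listing like the one above can be produced by filtering the full vainfo output down to encode entrypoints. This is a minimal sketch, assuming the `profile : entrypoint` line format shown in this thread; `list_encode_entrypoints` is a hypothetical helper name.

```shell
# Read vainfo output on stdin and print "profile entrypoint" pairs for
# encode entrypoints only (EncSlice, EncSliceLP, EncPicture).
list_encode_entrypoints() {
    awk -F: '/VAEntrypointEnc/ {
        gsub(/[ \t]/, "", $1)   # strip padding around the profile name
        gsub(/[ \t]/, "", $2)   # strip padding around the entrypoint name
        print $1, $2
    }'
}
```

Usage: `vainfo 2>/dev/null | list_encode_entrypoints`. A profile that has no line in the result, such as MPEG2 here, is decode-only on that driver.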

Xiaogangli-intel commented 4 years ago

So @eero-t @SirBryan, the issue still exists with the latest MSDK/media driver/KMD, right? If yes, is it possible to provide a reproduction environment? Then we can do a quick debug on it.

eero-t commented 4 years ago

Ubuntu 18.04 with latest updates installed, and day old Git versions built on 18.04:

Will fail MSDK H.264 transcoding on J4005: sample_multi_transcode -i::h264 1280x720p_29.97_10mb_h264_cabac.264 -o::h264 output.h264 -b 2000 -u 4 -n 2400 -async 4 -hw

Always with:

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Transcode, <EncodeOneFrame|Surface2BS> failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:1852

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Run, CTranscodingPipeline::Run::Transcode() [0x55e4e4d2cbc0] failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:4449

[ERROR], sts=MFX_ERR_GPU_HANG(-21), main, transcode.ProcessResult failed at samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1150

Note: kernel doesn't complain about GPU hang in dmesg for this or earlier failing MSDK tests, only for one later MSDK test, which is quite odd.

This is with BIOS version JYGLKCPX.86A.0053.2019.1015.1510, and HuC loaded:

[    2.244693] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/glk_dmc_ver1_04.bin (v1.4)
[    2.256099] [drm] GuC communication enabled
[    2.262923] i915 0000:00:02.0: GuC firmware i915/glk_guc_33.0.0.bin version 33.0 submission:disabled
[    2.262926] i915 0000:00:02.0: HuC firmware i915/glk_huc_4.0.0.bin version 4.0 authenticated:yes

But the exact versions shouldn't matter as it's been failing since we added that HW to testing months ago. There's no GPU hang with FFmpeg + VA-API.
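
For anyone else reporting here, the HuC state can be pulled out of the kernel log with a couple of one-liners. This is a sketch that assumes the i915 log format shown above (`HuC firmware <file> version <x.y> authenticated:yes`); the wording may differ between kernel versions, and both function names are hypothetical.

```shell
# Read kernel log text on stdin; succeed if HuC firmware is reported
# as loaded and authenticated.
huc_authenticated() {
    grep -q 'HuC firmware .*authenticated:yes'
}

# Read kernel log text on stdin; print the HuC firmware file name
# from the first matching line.
huc_firmware_file() {
    sed -n 's/.*HuC firmware \([^ ]*\) .*/\1/p' | head -n 1
}
```

Usage: `dmesg | huc_authenticated && echo "HuC authenticated"` and `dmesg | huc_firmware_file`.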

Xiaogangli-intel commented 4 years ago

@eero-t, thank you, we will take a look at this issue.

Xiaogangli-intel commented 4 years ago

Reproduced. It seems some commands were not executed; debugging is in progress.

Jqh63 commented 4 years ago

@felixbuenemann can you confirm iHD driver is still broken for Gemini lake? It is for me, using LSIO plex docker image.

ovaar commented 4 years ago

Any progress regarding this issue?

Avatat commented 3 years ago

I have a similar issue.

Software:
Ubuntu 20.04
Kernel 5.4.0-48-generic
Intel Media SDK 20.2.1
FFmpeg 4.3.1

CPU: Intel J4105

GPU:

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 605 [8086:3185] (rev 03)

EricTheMagician commented 3 years ago

Same issue here: Intel N5000 with UHD 605.

felixbuenemann commented 3 years ago

@felixbuenemann can you confirm iHD driver is still broken for Gemini lake?

It is for me, using LSIO plex docker image.

I haven't tested in a long time and am relying on dpkg-divert to automatically disable the iHD driver in Plex Media Server by renaming it.
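
For reference, a diversion of this kind can be set up roughly as follows. The helper only prints the dpkg-divert command so it can be reviewed before running it as root; `divert_command` is a hypothetical name, and the driver path has to be supplied by the user because the location of the bundled iHD_drv_video.so depends on the package. Paths containing spaces are not handled.

```shell
# Print a dpkg-divert command that renames the given file to <file>.disabled
# and keeps it renamed across package upgrades.
# Usage: divert_command <absolute-path-to-driver>
divert_command() {
    printf 'dpkg-divert --local --rename --divert %s.disabled --add %s\n' "$1" "$1"
}
```

The diversion can be undone later with `dpkg-divert --rename --remove <path>`.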

Xiaogangli-intel commented 3 years ago

I checked, and @XinfengZhang's change should already fix this issue. Could you try again with the latest source?

felixbuenemann commented 3 years ago

@Xiaogangli-intel: I can confirm h.264 hardware encoding is working fine with master, but I can't get HEVC to work:

/opt/intel/mediasdk/share/mfx/samples/sample_encode h265 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff -i /usr/src/samples/AUD_MW_E.yuv -o encoded_AUD_MW_E.265
libva info: VA-API version 1.9.0
libva info: Trying to open /usr/local/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_9
libva info: va_openDriver() returns 0

[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), AllocFrames, Query (for encoder) failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:946

[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), ResetMFXComponents, AllocFrames failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:2053

[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, ResetMFXComponents failed at /usr/src/intel-media-sdk/samples/sample_encode/src/pipeline_encode.cpp:1834

[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), main, pPipeline->Init failed at /usr/src/intel-media-sdk/samples/sample_encode/src/sample_encode.cpp:1647
Frame number: 0
Encoding fps: -nan

When trying with ffmpeg (which works fine with h.264) I get:

[hevc_vaapi @ 0x558f4307a580] No quality level set; using default (25).
[hevc_vaapi @ 0x558f4307a580] Failed to end picture encode issue: 24 (internal encoding error).
[hevc_vaapi @ 0x558f4307a580] Encode failed: -5.

However the sample_encode help mentions:

Supported codecs, <msdk-codecid>:
   <codecid>=h264|mpeg2|vc1|mvc|jpeg - built-in Media SDK codecs
   <codecid>=h265|vp9                - in-box Media SDK plugins (may require separate downloading and installation)

I am using the latest master of gmmlib and media-driver with libva 2.9.1, all firmware is loaded:

[    0.274975] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-56-generic root=UUID=af9f2f62-679f-4313-88eb-7867f5304f09 ro mitigations=off scsi_mod.use_blk_mq=1 mem_sleep_default=s2idle zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold i915.fastboot=1 i915.modeset=1 i915.enable_guc=2 i915.nuclear_pageflip=1 i915.enable_dc=2 i915.enable_fbc=1 i915.enable_psr=1 i915.error_capture=1 i915.verbose_state_checks=1 fbcon=font:TER16x32 consoleblank=0 snd_hda_intel.pm_blacklist=0 reboot=efi
[    5.259311] i915 0000:00:02.0: vgaarb: deactivate vga console
[    5.259509] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    5.293672] [drm] Finished loading DMC firmware i915/glk_dmc_ver1_04.bin (v1.4)
[    5.633228] mei_hdcp 0000:00:0f.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[    6.387227] i915 0000:00:02.0: GuC firmware i915/glk_guc_33.0.0.bin version 33.0 submission:disabled
[    6.387230] i915 0000:00:02.0: HuC firmware i915/glk_huc_ver03_01_2893.bin version 3.1 authenticated:yes
[    6.389360] [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
[    6.391626] snd_hda_intel 0000:00:0e.0: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
vainfo
error: can't connect to X server!
libva info: VA-API version 1.9.0
libva info: Trying to open /usr/local/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_9
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.9 (libva 2.6.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 20.4.3 (b7b1b0061)
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointFEI
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointFEI
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointFEI
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD

It is my understanding that HuC firmware is needed for HEVC, but it is already loaded.

What exactly needs to be downloaded to enable h265 and vp9 codecs on GLK?

andatche commented 3 years ago

I'm seeing the same issue as @felixbuenemann when trying to encode HEVC.

Xiaogangli-intel commented 3 years ago

@felixbuenemann and @andatche, could you provide the latest commit id of the media driver you are using?

felixbuenemann commented 3 years ago

@Xiaogangli-intel I was testing with b7b1b00619e135d613ad563876fbe0506db63341.

esmorun commented 3 years ago

Any news on this issue? Hardware encoding is still completely broken with the current package in the Ubuntu 20.04LTS repository (intel-media-va-driver-non-free (20.1.1+ds1-1build1)).

Asrock j5005-board.

Xiaogangli-intel commented 3 years ago

@felixbuenemann It seems sample_encode changed its default encode mode to low power, so could you try: /opt/intel/mediasdk/share/mfx/samples/sample_encode h265 -w 176 -h 144 -f 30 -cqp -qpi 30 -qpp 30 -qpb 30 -qsv-ff -i /usr/src/samples/AUD_MW_E.yuv -o encoded_AUD_MW_E.265 -lowpower:off

It works on my side. For everyone else, could you provide the command line you are using?
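
The vainfo output posted earlier in this thread lists VAEntrypointEncSliceLP for the H.264 profiles but only VAEntrypointEncSlice for HEVC, which fits the low-power explanation. A quick way to check this on another system is sketched below; it assumes that the sample's low-power mode corresponds to the EncSliceLP entrypoint, and `has_entrypoint` is a hypothetical helper name.

```shell
# Read vainfo output on stdin; succeed if the given profile advertises
# the given entrypoint.
# Usage: vainfo 2>/dev/null | has_entrypoint <profile> <entrypoint>
has_entrypoint() {
    awk -F: -v p="$1" -v e="$2" '
        {
            gsub(/[ \t]/, "", $1)   # profile name without padding
            gsub(/[ \t]/, "", $2)   # entrypoint name without padding
        }
        $1 == p && $2 == e { found = 1 }
        END { exit !found }
    '
}
```

Usage: `vainfo 2>/dev/null | has_entrypoint VAProfileHEVCMain VAEntrypointEncSliceLP || echo "no low-power HEVC encode, use -lowpower:off"`.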