intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

[Bug]: QSV+OpenCL interop gets stuck in ffmpeg on Linux 5.18+ #1456

Closed: nyanmisaka closed this 1 year ago

nyanmisaka commented 2 years ago

Which component impacted?

Video Processing

Is it regression? Good in old configuration?

Yes, it's good in old version

What happened?

  1. Linux 5.18+ (5.17 and older work fine), ffmpeg 4.4.x/5.x/master
  2. libva 2.14, media-driver 22.4.3/master, latest NEO runtime
  3. Run the ffmpeg CLI multiple times, using either command below

(vaapi dec + opencl filter + qsv enc):

./ffmpeg -threads 0 -v verbose -init_hw_device vaapi=va:,driver=iHD -init_hw_device qsv=qs@va -filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi -autorotate 0 -i 4k_hevc.mp4 -an -sn -c:v h264_qsv -t 5 -preset veryfast -b:v 10M -maxrate 10M -bufsize 20M -vf scale_vaapi=w=1920:h=1080:format=nv12,hwmap=derive_device=opencl,avgblur_opencl,hwmap=derive_device=qsv:reverse=1:extra_hw_frames=16,format=qsv -y /tmp/out.mp4

(qsv dec + opencl filter + qsv enc):

./ffmpeg -threads 0 -v verbose -init_hw_device vaapi=va:,driver=iHD -init_hw_device qsv=qs@va -filter_hw_device qs -hwaccel qsv -hwaccel_output_format qsv -autorotate 0 -c:v hevc_qsv -i 4k_hevc.mp4 -an -sn -c:v h264_qsv -t 5 -preset veryfast -b:v 10M -maxrate 10M -bufsize 20M -vf scale_qsv=w=1920:h=1080:format=nv12,hwmap=derive_device=opencl,avgblur_opencl,hwmap=derive_device=qsv:reverse=1:extra_hw_frames=16,format=qsv -y /tmp/out.mp4
  4. There is a high chance that ffmpeg gets stuck while enqueueing any OpenCL kernel.
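Because the hang is intermittent, repeating a run under a timeout makes it easier to measure how often it occurs. The following is only a minimal sketch, assuming Python 3 is available. `count_hangs`, `runs`, and `timeout_s` are hypothetical names that are not part of ffmpeg or of this report, and the timeout value is arbitrary.

```python
import subprocess


def count_hangs(cmd, runs=10, timeout_s=60.0):
    """Run `cmd` (an argv list) `runs` times and count the timed-out runs.

    A run that is still alive after `timeout_s` seconds is treated as stuck.
    This is a heuristic (assumption): a slow but healthy run would also be
    counted, so pick a timeout well above the normal run time.
    """
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout elapses.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

To use it, pass one of the ffmpeg command lines above as a list of arguments, for example `count_hangs(["./ffmpeg", "-threads", "0", ...], runs=20)` with the remaining arguments filled in. Note that if the stuck process is blocked inside the kernel and does not react to the kill signal, the helper itself may block as well.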

Note that this issue becomes more obvious if you create the OpenCL context with CL_CONTEXT_INTEROP_USER_SYNC set to CL_TRUE.

Also, with the first (VAAPI+QSV hybrid) ffmpeg command line, this issue can be avoided by capping the input threads at 1. That workaround is not suitable for the second (pure QSV) command line, since QSV requires 2+ threads.

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

https://github.com/jellyfin/jellyfin QSV/VAAPI transcoding with OpenCL filtering on Intel Gen9+ iGPUs. Both older HD/UHD630 and newer Xe graphics are affected.

Debug Information

  1. latest stable/master + latest NEO runtime
  2. yes
  3. vainfo.txt
  4. libva.zip
  5. There is no GPU hang in dmesg; only ffmpeg gets stuck.

Do you want to contribute a patch to fix the issue?

I have bisected the issue to this kernel commit, introduced in Linux 5.18: https://github.com/torvalds/linux/commit/7e00897be8bf13ef9c68c95a8e386b714c29ad95 With the grab_vma() and ungrab_vma() calls removed, this case works fine again.

It seems to be an i915 kernel issue, and it does affect userspace use cases such as iHD+NEO+ffmpeg. However, I don't really understand its mechanism or how to fix it in i915, iHD or NEO, so I am reporting it here to let you know.
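To check whether a given machine runs a kernel in the range reported here (5.18 or newer), something like the following could be used. It is only a sketch: `FIRST_AFFECTED`, `kernel_major_minor`, and `may_be_affected` are hypothetical names, and the check covers the lower bound only, since this report does not establish an upper bound on affected releases.

```python
import platform
import re

# First kernel series reported as affected in this issue.
FIRST_AFFECTED = (5, 18)


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '5.18.0-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release=None):
    """Return True if the kernel is 5.18 or newer (lower bound only).

    Assumption: no upper bound is applied, so a kernel that already
    contains a fix is still reported as possibly affected.
    """
    if release is None:
        release = platform.release()  # same string as `uname -r`
    return kernel_major_minor(release) >= FIRST_AFFECTED
```

Calling `may_be_affected()` without arguments inspects the running kernel; passing a string such as `"5.17.15"` checks an arbitrary release.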

haichund1 commented 1 year ago

@nyanmisaka Since this is an i915 issue, could you open a bug in the i915 project? Thanks.

FurongZhang commented 1 year ago

@XinfengZhang , @Sherry-Lin

nyanmisaka commented 1 year ago

@haichund1 @FurongZhang Fixed upstream in i915: https://github.com/torvalds/linux/commit/3f882f2d4f689627c1566c2c92087bc3ff734953