intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

[Bug]: A critical error happened with HDR tone mapping in opencl #1665

Open ghost opened 1 year ago

ghost commented 1 year ago

Which component impacted?

Decode, Encode, Video Processing

Is it regression? Good in old configuration?

No, this issue has existed for a long time.

What happened?

I am using a Proxmox VE host with an Ubuntu 22.04 LTS VM.

SR-IOV is enabled on the UHD 770.

When hardware decoding is used together with OpenCL in the Ubuntu VM, the video output shows these errors.

Find out more here: https://github.com/strongtz/i915-sriov-dkms/issues/57

[image: screenshot of the video errors]

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

No response

Debug Information

No response

Do you want to contribute a patch to fix the issue?

None

ghost commented 1 year ago

@MicroYY Any help?

MicroYY commented 1 year ago

Could you please follow this wiki page and gather driver logs? https://github.com/intel/media-driver/wiki/Video-Processing-Debug-Tool

FurongZhang commented 1 year ago

"HDR tone mapping in opencl", what do you mean?

intel-mediadev commented 1 year ago

Auto Created VSMGWL-64877 for further analysis.

nyanmisaka commented 1 year ago

In this case, an OpenCL kernel has to be executed on the decoded VA-API surface, and the processed surface is then encoded with VA-API. The issue happens in VA-API<->OpenCL surface sharing.

It should be reproducible with any OpenCL filter that is supported in ffmpeg, such as avgblur_opencl:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device opencl=ocl@va \
-hwaccel vaapi -hwaccel_output_format vaapi -i /path/to/4kClip.mp4 -an -sn \
-vf hwmap=derive_device=opencl,avgblur_opencl,hwmap=derive_device=vaapi:reverse=1,format=vaapi \
-c:v hevc_vaapi -b:v 10M -maxrate 10M -y out.mp4

It works fine on the host but fails in the VM, where i915-sriov-dkms needs to be installed and configured to leverage the SR-IOV support on UHD7xx and Xe series iGPUs.
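Since any OpenCL filter should trigger the problem, a small helper can generate the same command line for different filters. This is only a sketch: the function name `build_opencl_filter_cmd` and its parameters are hypothetical, and the arguments simply mirror the ffmpeg command above.

```python
import shlex


def build_opencl_filter_cmd(input_path, opencl_filter="avgblur_opencl",
                            render_node="/dev/dri/renderD128",
                            output_path="out.mp4"):
    """Return the ffmpeg argv for VA-API decode -> OpenCL filter -> VA-API encode."""
    # Map the decoded VA-API surface to OpenCL, run the filter,
    # then map the result back to VA-API for encoding.
    filter_chain = ",".join([
        "hwmap=derive_device=opencl",
        opencl_filter,
        "hwmap=derive_device=vaapi:reverse=1",
        "format=vaapi",
    ])
    return [
        "ffmpeg",
        "-init_hw_device", f"vaapi=va:{render_node}",
        "-init_hw_device", "opencl=ocl@va",
        "-hwaccel", "vaapi", "-hwaccel_output_format", "vaapi",
        "-i", input_path, "-an", "-sn",
        "-vf", filter_chain,
        "-c:v", "hevc_vaapi", "-b:v", "10M", "-maxrate", "10M",
        "-y", output_path,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_opencl_filter_cmd("/path/to/4kClip.mp4")))
```

Passing a different filter name as `opencl_filter` swaps only the middle stage of the chain, which makes it easy to check whether the corruption depends on the filter or only on the surface mapping.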

[image: corrupted output frame]

@MicroYY @FurongZhang The corrupted image leads me to guess that it is a memory modifier issue because the pixels are displayed correctly, but the positions are messed up.
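To illustrate the reasoning behind that guess: if a buffer is written in a tiled layout but read back as if it were linear, every pixel value survives and only the positions change. The sketch below uses a toy 4x4 tile layout chosen purely for illustration; it is not the actual tiling format used by Intel hardware, and `linear_to_tiled` is a hypothetical helper.

```python
def linear_to_tiled(pixels, width, height, tile_w=4, tile_h=4):
    """Store a row-major image tile by tile (toy layout, not a real GPU format).

    width and height are assumed to be multiples of the tile size.
    """
    tiled = []
    for ty in range(0, height, tile_h):          # tile rows
        for tx in range(0, width, tile_w):       # tile columns
            for y in range(ty, ty + tile_h):     # rows inside one tile
                for x in range(tx, tx + tile_w): # pixels inside one tile row
                    tiled.append(pixels[y * width + x])
    return tiled


def show(buf, width):
    """Print a buffer as rows of `width` values, i.e. interpret it as linear."""
    for start in range(0, len(buf), width):
        print(" ".join(f"{v:2d}" for v in buf[start:start + width]))


if __name__ == "__main__":
    W, H = 8, 8
    image = list(range(W * H))         # each pixel value encodes its own position
    tiled = linear_to_tiled(image, W, H)
    show(tiled, W)                     # tiled data misread as a linear image
    print("same pixel values:", sorted(tiled) == image)
    print("same positions:", tiled == image)
```

A mismatch in the modifier advertised for a shared surface would produce this kind of symptom: correct colors, scrambled geometry.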

MicroYY commented 1 year ago

Could you please follow this wiki page and gather driver logs? https://github.com/intel/media-driver/wiki/Video-Processing-Debug-Tool

If the other party is running it in Docker, should I collect the logs on the host or inside Docker?

You should collect them inside the Docker container. BTW, may I know if the whole pipeline is decode + OCL HDR? What's the decode output format? Is any VPP processing such as scaling or color space conversion involved? More details would be helpful. Also, from @nyanmisaka's comment, the issue may be caused by surface sharing between VAAPI and OCL. There is a related PR that fixes another issue and is worth trying: https://github.com/intel/media-driver/pull/1670

nyanmisaka commented 1 year ago

@MicroYY vaapi decode(p010, 10bit HDR) -> opencl(color mapping and p010->nv12) -> vaapi encode(nv12, 8bit SDR)

vaapi VPP filtering is not used in the ffmpeg pipeline.

riverscn commented 1 year ago

I tried that patch (#1670). It doesn't work. @MicroYY

image

nyanmisaka commented 1 year ago

@riverscn Did you replace the /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so with the patched one?

riverscn commented 1 year ago

yes

FurongZhang commented 1 year ago

@riverscn, I noticed that we are in the same time zone. Can we have a quick call so I can understand your usage? That would speed up fixing the issue.

riverscn commented 1 year ago

@FurongZhang OK. How?

MicroYY commented 1 year ago

Could you check whether https://github.com/intel/media-driver/pull/1675 helps? A deprecated DRM modifier is used, which may cause confusion between linear and tiled surfaces.

zhtengw commented 1 year ago

No luck with this patch; it doesn't work. @MicroYY

[image: Snipaste_2023-06-01_15-42-53]

MicroYY commented 1 year ago

As the issue may happen in vaExportSurfaceHandle, we are trying to enable trace logging in libva: https://github.com/intel/libva/pull/724. Could you please update libva to this PR and capture a VA trace log? This environment variable sets the log output path and must be set before running the app:

$ export LIBVA_TRACE=/tmp/libva.log

MicroYY commented 1 year ago

https://github.com/intel/libva/pull/724 has been merged into the libva master branch. Please update libva and the media driver, then capture a VA trace log.

$ export LIBVA_TRACE=/tmp/libva.log

Set this environment variable to a suitable path and then reproduce the issue. You will find the log at that path.
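For anyone scripting the reproduction, the same step can be wrapped in a few lines of Python. This is a sketch under assumptions: `run_with_va_trace` is a hypothetical helper, and it matches on the path prefix on the assumption that libva may append a suffix to the log file name.

```python
import glob
import os
import subprocess
import sys


def run_with_va_trace(cmd, trace_path="/tmp/libva.log"):
    """Run cmd with LIBVA_TRACE set; return (exit code, list of matching log files)."""
    # Copy the current environment and add the trace variable for the child only.
    env = dict(os.environ, LIBVA_TRACE=trace_path)
    result = subprocess.run(cmd, env=env)
    # Assumption: the log file name may carry a suffix, so match on the prefix.
    logs = sorted(glob.glob(glob.escape(trace_path) + "*"))
    return result.returncode, logs


if __name__ == "__main__":
    # Usage: python va_trace.py <command that reproduces the issue> [args...]
    if len(sys.argv) > 1:
        code, logs = run_with_va_trace(sys.argv[1:])
        print("exit code:", code)
        for path in logs:
            print("trace log:", path)
```

Setting the variable only for the child process avoids leaving tracing enabled in the shell after the reproduction run.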

mvnixon commented 2 months ago

Hi, any update on this issue? I can replicate this.