[DG2/A380] AVS/HQ scaling doesn't work in ffmpeg vpp_qsv

nyanmisaka commented 2 years ago

System information

GPU information: Intel Arc A380 / DG2 128EU
Driver version: Latest release 31.0.101.3490 as of 10/21/2022

Issue behavior

Describe the current behavior

MFX_SCALING_MODE_QUALITY mode doesn't work on DG2/A380 on Windows.

When using the vpp_qsv filter in FFmpeg 5.1/master to downscale video, SFC (low power) is always used even if the AVS (HQ) mode is selected, which causes a huge loss of scaling performance. The same command line works fine on the TGL/i7-1165G7 platform. I also tried the FFmpeg patches from the cartwheel-ffmpeg repo, with no luck.

vpp_qsv help: scale_mode <int> ..FV....... scale mode: 0=auto, 1=low power, 2=high quality (from 0 to 2) (default 0)

ffmpeg cli: ffmpeg.exe -init_hw_device d3d11va=dx:0 -init_hw_device qsv=qs@dx -filter_hw_device qs -hwaccel qsv -hwaccel_output_format qsv -c:v hevc_qsv -i 3840_2160_hevc_10bit.mp4 -an -sn -vf vpp_qsv=w=1920:h=1080:format=nv12:scale_mode=2 -f null -
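To compare the three scale_mode values in a scripted way, the command line above can be assembled programmatically. This is a minimal sketch that assumes the same input file and device setup as the command above; `SCALE_MODES` and `build_vpp_qsv_cmd` are hypothetical helper names, not part of FFmpeg.

```python
# Scale modes as listed in the vpp_qsv help text quoted above.
SCALE_MODES = {"auto": 0, "low_power": 1, "high_quality": 2}


def build_vpp_qsv_cmd(src, width, height, mode, ffmpeg="ffmpeg.exe"):
    """Return the argv list for the decode + vpp_qsv downscale test."""
    vf = f"vpp_qsv=w={width}:h={height}:format=nv12:scale_mode={SCALE_MODES[mode]}"
    return [
        ffmpeg,
        "-init_hw_device", "d3d11va=dx:0",
        "-init_hw_device", "qsv=qs@dx",
        "-filter_hw_device", "qs",
        "-hwaccel", "qsv",
        "-hwaccel_output_format", "qsv",
        "-c:v", "hevc_qsv",
        "-i", src,
        "-an", "-sn",
        "-vf", vf,
        "-f", "null", "-",
    ]


cmd = build_vpp_qsv_cmd("3840_2160_hevc_10bit.mp4", 1920, 1080, "high_quality")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` once per mode, so that only the scale_mode value changes between runs.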

At 4K -> 1080p, for example, SFC (VideoProcessing in Task Manager) seems to be the performance bottleneck:
frame= 7620 fps=232 q=-0.0 Lsize=N/A time=00:02:07.12 bitrate=N/A speed=3.87x
Without vpp_qsv, the decoder alone reaches nearly double the fps:
frame= 7620 fps=409 q=-0.0 Lsize=N/A time=00:02:07.12 bitrate=N/A speed=6.82x
Even with my custom OpenCL bilinear scaler + OCL-DX11 interop, it reaches ~375 fps:
frame= 7620 fps=375 q=-0.0 Lsize=N/A time=00:02:07.12 bitrate=N/A speed=6.25x
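The gap between the three runs is easier to see as a ratio against the decode-only run. The sketch below does that arithmetic using only the fps and frame/time figures copied from the progress lines above; the dictionary keys are labels chosen here for illustration.

```python
# fps figures copied from the three runs reported above.
runs = {
    "vpp_qsv scale_mode=2": 232,
    "decode only": 409,
    "OpenCL bilinear": 375,
}

baseline = runs["decode only"]
relative = {name: fps / baseline for name, fps in runs.items()}

for name, ratio in relative.items():
    print(f"{name}: {runs[name]} fps, {ratio:.0%} of decode-only")

# Source frame rate implied by the progress line (7620 frames in 00:02:07.12).
source_fps = 7620 / (2 * 60 + 7.12)
```

With these numbers, the vpp_qsv path runs at roughly 57% of the decode-only rate, while the OpenCL bilinear path stays above 90%.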

Describe the expected behavior

MFX_SCALING_MODE_QUALITY can be used on DG2/A380 on Windows. MFX_SCALING_MODE_DEFAULT should map to the most performant scaling mode on dedicated Arc GPUs.

nyanmisaka commented 2 years ago

@FurongZhang Sorry to bother you, it seems that you have always been an expert on VPP HW under Linux.

Do you know what caused this issue? Or, any chance you can assign this to a more suitable engineer? Thanks!

FurongZhang commented 2 years ago

@nyanmisaka, on DG2/A380, the HW removed the AVS sampler from the render engine. Hence, all high-quality (advanced) scaling can only be done by SFC.

FurongZhang commented 2 years ago

May I know which scaling algorithm you are using? If bilinear is OK for you, the new VPL scaling flag MFX_SCALING_MODE_INTEL_GEN_COMPUTE can serve your purpose. If you need better scaling quality, SFC is the only way on DG2.

nyanmisaka commented 2 years ago

> @nyanmisaka, on DG2/A380, the HW removed the AVS sampler from the render engine. Hence, all high-quality (advanced) scaling can only be done by SFC.

That makes sense given the HW changes.

> May I know which scaling algorithm you are using? If bilinear is OK for you, the new VPL scaling flag MFX_SCALING_MODE_INTEL_GEN_COMPUTE can serve your purpose. If you need better scaling quality, SFC is the only way on DG2.

Yes. We mainly use bilinear. I will enable VPL in ffmpeg and try that new mode.

nyanmisaka commented 2 years ago

Hi! @FurongZhang VPL is initialized via DX11 and I added MFX_SCALING_MODE_INTEL_GEN_COMPUTE to the vpp_qsv option list. But unfortunately, the fps and SFC occupancy have not changed.

Do we need additional changes in ffmpeg to take advantage of this new mode? Or is this mode not yet implemented in the Windows driver(Intel® oneVPL GPU Runtime (21.0.2.7))? Thanks!

FurongZhang commented 2 years ago

This mode is only implemented on Linux. I will work on the Windows implementation and provide a patch to you next Tuesday. In ffmpeg, please add this flag to submit the workload to compute. @xhaihao

xhaihao commented 2 years ago

@FurongZhang These new flags have been added in cartwheel-ffmpeg (https://github.com/intel-media-ci/cartwheel-ffmpeg/blob/master/patches/0078-vpp_qsv-add-support-for-new-scaling-modes.patch). Could you please try cartwheel-ffmpeg first?

nyanmisaka commented 2 years ago

I've tried cartwheel-ffmpeg, but the new scale_mode doesn't take effect on Windows either. That is presumably because the feature is not yet enabled on the driver side, as @FurongZhang said.

> This mode is only implemented on Linux. I will work on the Windows implementation and provide a patch to you next Tuesday. In ffmpeg, please add this flag to submit the workload to compute.

Thank you for your help. I'll try it out as soon as it's available.

nyanmisaka commented 2 years ago

@xhaihao I also tried the av1_qsv encoder, but it keeps reporting "Non-monotonous DTS in output stream", which causes stuttering in the encoded AV1 video. The other qsv encoders work fine.

[ivf @ 000001f2323ad700] Non-monotonous DTS in output stream 0:0; previous: 4, current: 1; changing to 5. This may result in incorrect timestamps in the output file.

[ivf @ 000001f2323ad700] Non-monotonous DTS in output stream 0:0; previous: 5, current: 2; changing to 6. This may result in incorrect timestamps in the output file.

[ivf @ 000001f2323ad700] Non-monotonous DTS in output stream 0:0; previous: 6, current: 3; changing to 7. This may result in incorrect timestamps in the output file.

[ivf @ 000001f2323ad700] Non-monotonous DTS in output stream 0:0; previous: 8, current: 5; changing to 9. This may result in incorrect timestamps in the output file.
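The pattern in these warnings (previous 4, current 1, changed to 5, and so on) suggests that each out-of-order DTS is bumped to the previous value plus one. The sketch below reproduces that behaviour as read from the log lines only; it is not FFmpeg's actual muxer code, `enforce_monotonic_dts` is a hypothetical name, and the input sequence is a reconstruction that is merely consistent with the logged pairs.

```python
def enforce_monotonic_dts(dts_values):
    """Bump any DTS that does not exceed the previously written one to previous + 1."""
    fixed = []
    previous = None
    for current in dts_values:
        if previous is not None and current <= previous:
            current = previous + 1
        fixed.append(current)
        previous = current
    return fixed


# Hypothetical input, chosen to match the (previous, current) pairs in the log.
encoder_dts = [4, 1, 2, 3, 8, 5]
muxed_dts = enforce_monotonic_dts(encoder_dts)
print(muxed_dts)
```

Because the rewritten timestamps no longer match the encoder's intended timing, playback of the muxed file can stutter, which fits the symptom described above.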

xhaihao commented 2 years ago

@nyanmisaka Could you try the latest https://github.com/oneapi-src/oneVPL-intel-gpu/ ? Recently oneVPL team fixed timestamp issue.

nyanmisaka commented 2 years ago

> @nyanmisaka Could you try the latest https://github.com/oneapi-src/oneVPL-intel-gpu/ ? Recently oneVPL team fixed timestamp issue.

@xhaihao Yes I can try them, but currently I'm on Windows. How long does it usually take for these fixes to be synced to the Windows driver?

xhaihao commented 2 years ago

@nyanmisaka I'm sorry, I don't have any info about the Windows driver.

FurongZhang commented 2 years ago

@nyanmisaka, I am also working on the Windows driver part. And from your description, you are mainly using bilinear.

nyanmisaka commented 2 years ago

Yes. May I assume that I should patch or replace the libmfx64-gen.dll onevpl library under system32?

FurongZhang commented 2 years ago

We will check the VPL code into this branch. Meanwhile, you will also need to update the GFX driver, since we also made some changes in the driver.

nyanmisaka commented 2 years ago

> We will check the VPL code into this branch. Meanwhile, you will also need to update the GFX driver, since we also made some changes in the driver.

Thanks, I will try the new driver once it is available.

One additional question: I tried the av1_qsv encoder with b-frames enabled, but the encoded video doesn't seem to contain any b-frames when I check it via ffprobe. Is this a hardware limitation, or is it just not enabled in software?
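One way to run this kind of check is to ask ffprobe for the `pict_type` of every video frame and count the results. The sketch below assumes ffprobe is on the PATH and that the output file is readable; `count_pict_types` and `probe_pict_types` are hypothetical helper names, and the sample output is made up for illustration rather than taken from a real A380 encode. How ffprobe's picture types map onto AV1's own frame types is a separate question, so the counts should be read with care.

```python
import subprocess
from collections import Counter


def count_pict_types(ffprobe_output):
    """Count picture types in one-per-line ffprobe output."""
    types = (line.strip().strip(",") for line in ffprobe_output.splitlines())
    return Counter(t for t in types if t)


def probe_pict_types(path, ffprobe="ffprobe"):
    """Run ffprobe on `path` and count the pict_type of every video frame."""
    cmd = [ffprobe, "-v", "error", "-select_streams", "v:0",
           "-show_entries", "frame=pict_type", "-of", "csv=p=0", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return count_pict_types(result.stdout)


# Hypothetical sample output, not taken from a real encode.
sample = "I\nP\nP\nP\n"
counts = count_pict_types(sample)
print(counts["B"])
```

A `B` count of zero across the whole file is what "doesn't seem to contain b-frames" would look like with this approach.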

nyanmisaka commented 2 years ago

Any update on this? The beta driver 31.0.101.3793 released on 10/27 still does not include these improvements.

FurongZhang commented 2 years ago

@nyanmisaka , we need some time to prepare for that. We will provide you the driver by the end of 11/11. Is this ok for you?

nyanmisaka commented 2 years ago

No problem.

mikk9 commented 2 years ago

> > We will check the VPL code into this branch. Meanwhile, you will also need to update the GFX driver, since we also made some changes in the driver.
>
> Thanks, I will try the new driver once it is available.
>
> One additional question: I tried the av1_qsv encoder with b-frames enabled, but the encoded video doesn't seem to contain any b-frames when I check it via ffprobe. Is this a hardware limitation, or is it just not enabled in software?

AV1 doesn't support b-frames at all; this is something different, even if it's called b-frames. But it helps compression efficiency. The AV1 timestamp issue is very old, by the way; I tried older drivers. It's a bit odd to me that they fixed it so late.

nyanmisaka commented 2 years ago

> AV1 doesn't support b-frames at all; this is something different, even if it's called b-frames. But it helps compression efficiency. The AV1 timestamp issue is very old, by the way; I tried older drivers. It's a bit odd to me that they fixed it so late.

https://github.com/rigaya/QSVEnc/blob/master/GPUFeatures/QSVEnc_DG2_Arc_A380_Win.txt

I ask because QSVEnc reports, by querying the VPL interface, that the A380 supports AV1 b-frames. But I googled for a while and found no evidence that any of several AV1 software encoders support b-frames.

FurongZhang commented 2 years ago

> > AV1 doesn't support b-frames at all; this is something different, even if it's called b-frames. But it helps compression efficiency. The AV1 timestamp issue is very old, by the way; I tried older drivers. It's a bit odd to me that they fixed it so late.
>
> https://github.com/rigaya/QSVEnc/blob/master/GPUFeatures/QSVEnc_DG2_Arc_A380_Win.txt
>
> I ask because QSVEnc reports, by querying the VPL interface, that the A380 supports AV1 b-frames. But I googled for a while and found no evidence that any of several AV1 software encoders support b-frames.

Regarding AV1, is it possible for you to open a separate thread to discuss it? I can help to pull in the AV1 expert for detailed discussion.

nyanmisaka commented 2 years ago

I just tried the 31.0.101.3802 driver, but the relevant changes are still not included. Perhaps you need more time to figure it out.

FurongZhang commented 2 years ago

We have already checked in to our development branch. It may need some time to release.

oviano commented 1 year ago

> We have already checked in to our development branch. It may need some time to release.

Hello

The fix looks like it was made back in April. How long before it is released in a driver on Windows?

nyanmisaka commented 1 year ago

Checked with the latest beta driver 31.0.101.3959 released today. The compute scaling mode is still not there.

But hopefully the AV1 timestamp issue is finally fixed.

nyanmisaka commented 1 year ago

This has been fixed in the 31.0.101.4032 driver. Thanks again! @FurongZhang

FurongZhang commented 1 year ago

@nyanmisaka , glad to know. Thank you for your patience.
