intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

Cloud gaming performance degradation with iHD #925

Open daegalus opened 4 years ago

daegalus commented 4 years ago

So, just wanted to post this as it caught me off guard when my Manjaro Linux install (which is Arch Linux based) upgraded from 19 to 20. I think the default changed from i965 to iHD during this upgrade.

I use cloud and game streaming software to play games on my Linux laptop from my gaming desktop. I use Parsec primarily, and Rainway as a backup.

When the iHD driver is active, Parsec and Rainway both degrade within a few seconds to severe decoding latency, making these unusable.

I only found out about it because I went to the Parsec discord, and a community manager happened to have the same issue and pointed it out.

As soon as I set LIBVA_DRIVER_NAME=i965, all my performance came back fully.
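For anyone else hitting this, the workaround looks like this (a sketch, using parsecd as the example command):

LIBVA_DRIVER_NAME=i965 parsecd       # single run with the legacy driver
export LIBVA_DRIVER_NAME=i965        # or set it for the whole shell session (bash/zsh)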

Parsec uses a desktop app (though I believe it's an Electron app of some sort with native bindings for streaming), and Rainway runs entirely in Chromium. Both use h264 and/or h265 if you support it.

Even with h264, there was a huge difference in performance between the two drivers.

I don't know where and what logs/debug information would be beneficial, so if there is anything I can provide to help, I can definitely provide it.

dvrogozh commented 4 years ago

What's the HW you are using? Could you please provide the output of these two commands:

uartie commented 4 years ago

@Daegalus the default driver changed as of libva 2.7.1. The new behavior tries to load iHD first; if it is not available or fails to load, it falls back to i965. Setting LIBVA_DRIVER_NAME=i965 overrides that default behavior.
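A quick way to confirm which driver libva actually loads is vainfo from libva-utils (a sketch):

vainfo 2>&1 | grep -i driver                           # shows the driver picked by default
LIBVA_DRIVER_NAME=i965 vainfo 2>&1 | grep -i driver    # same check with the legacy driver forced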

Nonetheless, please provide the details that @dvrogozh requested above.

daegalus commented 4 years ago

GPU:

❯ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07)

CPU:

❯ cat /proc/cpuinfo | grep "model name" | uniq
model name      : Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz

And I understand that the default driver changed; that makes sense, since the new packages for my distro included an upgrade to libva 2.7.1 (it's a rolling-release distribution). Just wanted to help point out there was a performance regression.

dvrogozh commented 4 years ago

So, this is KBL: https://ark.intel.com/content/www/us/en/ark/products/124967/intel-core-i5-8250u-processor-6m-cache-up-to-3-40-ghz.html.

Let's try to check which components (h264/h265, decoders/encoders) you are actually using. Can you please enable libva tracing with the following (substitute <user> with the real user):

mkdir /home/<user>/trace
echo "LIBVA_TRACE=/home/<user>/trace/file" | sudo tee /etc/libva.conf

Mind that having /file at the very end of LIBVA_TRACE is not a mistake. Libva will actually create a bunch of files with the template /home/<user>/trace/file*

Once done, please rerun your application (make it a single run to avoid a mess in the logs), get all the trace files from /home/<user>/trace (there will likely be more than one if libva/the driver is used by multiple threads) and send them back to us.

Note: remove /etc/libva.conf once you are done.
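The whole sequence as a sketch (replace <user>; the single application run goes in the middle):

mkdir -p /home/<user>/trace
echo "LIBVA_TRACE=/home/<user>/trace/file" | sudo tee /etc/libva.conf
# ... run the streaming application exactly once ...
tar czf trace.tar.gz -C /home/<user> trace    # bundle all file* logs
sudo rm /etc/libva.conf                       # turn tracing back off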

daegalus commented 4 years ago

Just to clarify, you want this information for iHD? Do you also want a clean copy of the same stuff for i965?

dvrogozh commented 4 years ago

We need this for iHD in the first place. But if possible, collect for i965 as well, please. Maybe we will spot some difference in the initialization sequence which makes the same component behave differently for different drivers.

daegalus commented 4 years ago

Put the trace files on my self-hosted storage site. iHD Trace: https://ocean.yuli.dev/f/1c2292d257b34822b9c2/?dl=1 (iHD-trace.tar.gz) i965 Trace: https://ocean.yuli.dev/f/788b9e1206204f92af9c/?dl=1 (i965-trace.tar.gz)

Some observations: ~Parsec has some basic stats like Decode, Encode, and Network latency. I noticed that with the iHD driver my Decode latency was roughly 2.44ms, and on the i965 driver it's 0.37ms.~ [edit] Was informed that the latency shown for decode is wrong.

Also, the sizes of the files are different, but they are both roughly 20 seconds of streaming. I launch the app, connect to the box with the game already running, move around for 20 seconds (using WoW as a test), and then disconnect and shut it off.

FurongZhang commented 4 years ago

Thanks @Daegalus for the information. We are trying to take care of the issue on the media driver side. I have brought this issue up with our cloud gaming expert.

DiegoGuidaF commented 4 years ago

I also have this issue with the latest released version (20.1.1), however it seems to be fixed when using the latest pre-release version 20.2.pre (3640b64c). @Daegalus Could you test it again to see if it is fixed for you too?

I have installed the latest versions from the Arch Linux AUR:

intel-media-driver: 3640b64c
intel-gmmlib: 3f1ff23
libva: 7fde463

daegalus commented 4 years ago

20.2.pre/master did not fix it for me, same issue.

DiegoGuidaF commented 4 years ago

Before trying out 20.2.pre I had an older version that I found to work pretty well. If you have the time, you could try it out and see if that one also works better for you. I have to say that for me it seemed to perform better with Parsec than the current one, but it might also be a placebo effect, since I haven't gathered any numbers to compare them.

The previous version I was using had the following commits:

Driver               Version      Commit
intel-media-driver   2020.2.pre   5d8ad341
intel-gmmlib         20.1.1       09324e1
libva                2.7.0.pre    8212296

BTW:

❯ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 (Whiskey Lake) [8086:3ea0]
❯ cat /proc/cpuinfo | grep "model name" | uniq
model name      : Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

daegalus commented 4 years ago

I tried a few different 20.2 versions today and they all performed worse. I don't know if there is a difference in our 620 chips, as my CPU is a different model and the device ID for the 620 is different. It shouldn't matter, but I don't know.

For now, if I use the i965 driver I get about 1-2ms latency on decode, and 23-30ms latency on decode for iHD, along with it freezing or falling too far behind after a minute. It's near impossible to play anything on iHD using Parsec. This is h264, as the AMD host drivers aren't working for h265 or something, according to the Parsec devs.

daegalus commented 4 years ago

Any updates on this? Anything else I can provide to help with this?

solnyshok commented 3 years ago

Similar situation here. Parsec decoding with iHD is ~20ms, with i965 1-2ms. How do I install the 20.2.pre versions on Fedora 32? (Thinkpad L490, i8365u, GPU Intel UHD 620, Wayland)

wangyan-intel commented 3 years ago

@Daegalus Thanks for your trace information. Could you please add another libva trace env for more information, like the following?

echo "LIBVA_TRACE=/home/<user>/trace/file" | sudo tee -a /etc/libva.conf
echo "LIBVA_TRACE_BUFDATA=1" | sudo tee -a /etc/libva.conf

Thanks. Yan Wang

daegalus commented 3 years ago

This might take me a while, as I had to switch back to Windows for other reasons. I can set up a dual boot to test this, though it might take a while.

wangyan-intel commented 3 years ago

This might take me a while, as I had to switch back to Windows for other reasons. I can set up a dual boot to test this, though it might take a while.

No problem.

kvernNC commented 3 years ago

Tried today with Fedora 33 and the Parsec client. Decode latency seems better than the last time I tried (with Fedora 32), but after starting a video game the Parsec client started asking to reduce quality and the image froze. It works perfectly with the i965 driver.

Here are the traces with the BUFDATA flag, connecting to Parsec and launching a game until the menu is shown. With iHD, the menu was never displayed, only audio.

iHD: https://drive.google.com/file/d/1npAScpEJgR8hC3cj2fh92c6VXCDV1K1s/view?usp=sharing
i965: https://drive.google.com/file/d/1WoBNAa6lYIhmQ5JBt4j6y214Bp2Ep1Ur/view?usp=sharing

GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07)

CPU: model name : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz

Packages:

intel-media-driver 20.3.0-1
intel-gmmlib 20.3.2-1
libva 2.9.0-1

Hope this helps.

wangyan-intel commented 3 years ago

Thanks for your info. We will check it.

wangyan-intel commented 3 years ago

@Jexu Could you please take a look? Thanks.

Jexu commented 3 years ago

@wangyan-intel sure, will investigate it.

XinfengZhang commented 3 years ago

@Jexu , any update?

T4cC0re commented 3 years ago

Adding a few more observations.

This is reliably reproducible with the following Parsec host settings:

OS: Ubuntu 20.04.2 LTS x86_64
Kernel: 5.4.0-71-generic
CPU: Intel i9-9980HK
GPU: NVIDIA Quadro T2000 Mobile / Max-Q (disabled)
GPU: Intel UHD Graphics 630

 >>> dpkg -l | egrep 'intel'
ii  intel-media-va-driver:amd64                20.1.1+dfsg1-1                        amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-microcode                            3.20201110.0ubuntu0.20.04.2           amd64        Processor microcode firmware for Intel CPUs
ii  libdrm-intel1:amd64                        2.4.102-1ubuntu1~20.04.1              amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  libdrm-intel1:i386                         2.4.102-1ubuntu1~20.04.1              i386         Userspace interface to intel-specific kernel DRM services -- runtime
ii  xserver-xorg-video-intel                   2:2.99.917+git20200226-1              amd64        X.Org X server -- Intel i8xx, i9xx display driver

on iHD (via LIBVA_TRACE=/dev/shm/iHD_trace/file LIBVA_DRIVER_NAME=iHD):

libva info: Open new log file /dev/shm/iHD_trace/file.171559.thd-0x0023efcb for the thread 0x0023efcb
libva info: LIBVA_TRACE is on, save log into /dev/shm/iHD_trace/file.171559.thd-0x0023efcb
libva info: VA-API version 1.7.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_7
libva info: va_openDriver() returns 0
[I 2021-04-08 19:15:59] VAAPI: (1.7) Intel iHD driver for Intel(R) Gen Graphics - 20.1.1 ()
libva info: Save context 0x10000000 into log file /dev/shm/iHD_trace/file.171559.thd-0x0023efcb
[D 2021-04-08 19:16:04] Decoder failure, queued_frames=45
[D 2021-04-08 19:16:10] Decoder failure, queued_frames=77

The display is very laggy (total time from input to display is multiple seconds). The more action is on screen, the longer it takes for all frames to be rendered.

on i965 (via LIBVA_TRACE=/dev/shm/i965_trace/file LIBVA_DRIVER_NAME=i965):

libva info: Open new log file /dev/shm/i965_trace/file.171734.thd-0x0023f4bd for the thread 0x0023f4bd
libva info: LIBVA_TRACE is on, save log into /dev/shm/i965_trace/file.171734.thd-0x0023f4bd
libva info: VA-API version 1.7.0
libva info: User environment variable requested driver 'i965'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
[I 2021-04-08 19:17:34] VAAPI: (1.7) Intel i965 driver for Intel(R) Coffee Lake - 2.4.0
libva info: Save context 0x02000000 into log file /dev/shm/i965_trace/file.171734.thd-0x0023f4bd

The display lag is nearly indistinguishable from non-streamed output.

To me, queued_frames as well as the seconds-long delay indicate some bottleneck in the iHD driver that causes frames to be queued up.

The traces are attached below.

traces.zip

wangyan-intel commented 3 years ago

@Jexu Could you please take a look? Thanks.

T4cC0re commented 3 years ago

Since it seems that iHD has become the default driver in recent distributions (mine above is Ubuntu 20.04 on one machine), is there something you can communicate in terms of what priority this has for you?

While manually specifying a different driver is a viable workaround, I would much prefer it if this were looked at 😃

I only found this issue by accident, and this weird slowdown behavior is really unexpected as a user.

lUNuXl commented 2 years ago

Hello,

Is this bug being worked on? The software decoder takes less time (~10 ms) than the hardware one (45+ ms).

larry@toaster ~> lspci -nn | grep VGA ; cat /proc/cpuinfo | grep "model name" | uniq
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07)
model name      : Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
larry@toaster ~> cat /etc/os-release 
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux-logo
larry@toaster ~> uname -a
Linux toaster 5.10.88-2-lts #1 SMP Wed, 22 Dec 2021 19:16:31 +0000 x86_64 GNU/Linux

edit 3:

I don't think I had the right drivers installed in the first place, and I also had some GPU virtualization modules and kernel cmdline options set, so I guess the previous logs (in the edit history) can be ignored(?)

However, now I'm pretty sure that everything in my system is set up correctly, so:

larry@toaster ~> parsecd
[D 2022-01-05 03:22:14] stun4         = 52.86.26.213:3478
[D 2022-01-05 03:22:16] net           = BUD|::ffff:74.82.28.6|22803
[D 2022-01-05 03:22:16] decoder       = software
[I 2022-01-05 03:22:23] VAAPI: (1.13) Intel iHD driver for Intel(R) Gen Graphics - 21.4.3 ()
[D 2022-01-05 03:22:32] decoder       = software
[I 2022-01-05 03:22:43] VAAPI: (1.13) Intel iHD driver for Intel(R) Gen Graphics - 21.4.3 ()
larry@toaster ~> export LIBVA_DRIVER_NAME=i965
larry@toaster ~> parsecd
[D 2022-01-05 03:23:03] stun4         = 52.86.26.213:3478
[D 2022-01-05 03:23:06] net           = BUD|::ffff:74.82.28.6|22803
[I 2022-01-05 03:23:06] * vaapi_init/WelsCreateDecoder[274] = -1
[D 2022-01-05 03:23:06] decoder       = software
[D 2022-01-05 03:23:19] decoder       = software
[I 2022-01-05 03:23:32] * vaapi_init/WelsCreateDecoder[274] = -1
[D 2022-01-05 03:23:32] decoder       = software

I was switching between software and hardware rendering just to make sure, and I'm confident that software decoding takes at most 15ms while hardware decoding takes more than 90ms - tested using the newest media-driver from the Arch Linux repository ( https://archlinux.org/packages/community/x86_64/intel-media-driver/ )

dvrogozh commented 2 years ago

Can someone please help collect an strace from the iHD and i965 runs?

Is it possible to dump the incoming bitstream which is sent for decoding? Alternatively, which encoder is used to produce the bitstream? Is the encoding cmdline known?

Can someone clarify whether the encoding part remains the same in these experiments? I.e., is the issue narrowed down to the decoding part? And to double check: are encoding and the game running on another system rather than the one used for decoding?
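For the strace part, something along these lines should do (a sketch; substitute the actual client command for parsecd):

strace -f -ttt -o /tmp/strace-iHD.log  env LIBVA_DRIVER_NAME=iHD  parsecd    # -f follows threads, -ttt adds timestamps
strace -f -ttt -o /tmp/strace-i965.log env LIBVA_DRIVER_NAME=i965 parsecd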

lUNuXl commented 2 years ago

@dvrogozh

Can someone, please, help to collect strace from iHD and i965 runs?

Sure, will do :slightly_smiling_face:

Alternatively, which encoder is used to produce bitstream?

The issue, at least in my case (I'm not using my laptop for encoding any videos, and if I am, I'm doing it unknowingly), is with decoding the video stream. Decoder: (VAAPI) H.264

is encoding cmdline known?

You'll have to look into the Parsec application source for this information, I guess.

And to double check, encoding and game is running on other system rather than the one used for decoding?

Yes, the game is running on the remote host; the only thing running locally (on the affected machine) is the hardware decoder. The only GPU present in the affected machine is the Intel UHD Graphics 620 (CPU: Intel i5-8250U).

atahrijouti commented 2 years ago

I only accidentally arrived at this issue today, after 3 days of juggling every other parameter, never thinking that the actual proprietary driver was behaving worse than the open source one. It could just be that Parsec is not actively maintained on Linux, though, as they might be using some feature that was deprecated before and has since been modified in iHD.

lUNuXl commented 2 years ago

The workaround seems to not work anymore after the newest Parsec update.

This means that this application is literally unusable on Linux with integrated Intel GPUs due to this bug.

spiderkeys commented 1 year ago

We are having what appears to be the same problem. In the past, the following set of libraries/drivers worked well for real-time decoding of a 1080p@60fps stream:

intel-media-driver 20.3.0
gmmlib 20.3.2
libva 2.9.0

Using the latest intel-media-driver release (all built from source in release mode), we are now seeing a large increase in CPU usage in our app and decoding latency has increased to the point where we can no longer decode the stream in real time. The versions tested under these conditions are:

intel-media-driver 22.5.4
gmmlib 22.2.1
libva 2.16.0

This slowdown on the newer versions only seems to happen on certain hardware.

I haven't been able to figure out where the issue is coming from, whether it is an issue in the driver, gmmlib, libva, etc.

spiderkeys commented 1 year ago

Upon closer observation, the slowdown is happening on all hardware platforms; it's just that the i7-7700K and i7-10875H were fast enough to keep up in real time.

On the i7-7700K, I see an increase in CPU usage from 12% to 60% for decoding a single 1080p@30fps H264 stream - no recompilation of the application, just switching library and driver versions from old to new at launch. On the slower processors, the CPU usage goes from 40-60% to being maxed at 100% (hence why the stream cannot be decoded in real time anymore).

I am decoding via FFmpeg 4.x's VAAPI hw accel API in all cases, if that is helpful.
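For context, the decode path is roughly equivalent to this kind of standalone FFmpeg run (a sketch, not our actual app; the input file and render node are placeholders):

# hardware H264 decode via VAAPI, frames downloaded as NV12, output discarded:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i input-1080p.h264 -f null -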

spiderkeys commented 1 year ago

I believe I was able to narrow the performance degradation down to a set of changes between releases 21.3.5 and 22.1.1.

I built and tested the following combinations of library/driver versions (tested both free and non-free, but it made little difference) and got the following results.

The last "release" without performance regression (app using 10-15% CPU):

TAG_LIBVA="2.13.0"
TAG_INTEL_GMMLIB="intel-gmmlib-21.3.1"
TAG_INTEL_MEDIA_DRIVER="intel-media-21.3.5"

Release with performance regression (app using 60-65% CPU):

TAG_LIBVA="2.13.0"
TAG_INTEL_GMMLIB="intel-gmmlib-22.0.1"
TAG_INTEL_MEDIA_DRIVER="intel-media-22.1.1"

I then tried the last 21.x tag (21.4.3), which also had the issue (45-50% CPU usage):

TAG_LIBVA="2.13.0"
TAG_INTEL_GMMLIB="intel-gmmlib-21.3.5"
TAG_INTEL_MEDIA_DRIVER="intel-media-21.4.3"

On the 21.3.5 and earlier releases, I got a flamegraph for the decode call that looks like this: (screenshot from 2022-12-29 20-52-14)

On 21.4.3 and newer releases, the flamegraph looked like this: (screenshot from 2022-12-29 18-59-14)

Culprit commit:

From there, I noticed a nearby commit touching media_libva.cpp that seemed potentially relevant, aiming to optimize getimage() (497d9864aaba83214cbc2e6ae8cee4151934fe43).

I checked out the commit that also modified media_libva.cpp right before this and found no performance issues (10-15% CPU usage, commit e2011ffcb5dc2d04ffd320ebcfe57fb55522baa7).

Then I checked out the optimization commit (497d9864aaba83214cbc2e6ae8cee4151934fe43) and found that it did indeed introduce the regression (55-60% CPU usage).

I haven't tried all of the intermediate commits between these two, but the regression lies somewhere in between.
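If someone wants to walk the commits in between, git bisect automates the search (a sketch; the build-and-measure step at each point is manual):

cd media-driver
git bisect start
git bisect bad  497d9864aaba83214cbc2e6ae8cee4151934fe43   # regression present
git bisect good e2011ffcb5dc2d04ffd320ebcfe57fb55522baa7   # last known good
# build the driver, run the decode workload, then mark the result:
#   git bisect good   (or)   git bisect bad
git bisect reset   # when finished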

Any insight into what might be going on here and whether this is expected/has a solution would be appreciated, as we would like to update our driver version to have stable hardware decoding support on newer processors.

Jexu commented 1 year ago

Thanks for your work finding this commit - that may help a lot.

There is a similar issue reported in #1537 and a fix in #1569, so can you quickly try whether PR #1569 solves your issue?

spiderkeys commented 1 year ago

Thanks, I had not tried master yet.

I can confirm now that it was indeed 497d9864aaba83214cbc2e6ae8cee4151934fe43 that introduced the regression, as the commit right before it (e2011ffcb5dc2d04ffd320ebcfe57fb55522baa7) is good.

Testing with the following combination (which has PR #1569 included):

TAG_LIBVA="2.16.0"
# Latest master for gmmlib and media-driver
TAG_INTEL_GMMLIB="26ffef5199abcec625651562eb319e983037297d"
TAG_INTEL_MEDIA_DRIVER="c0fbe0e13314e55f38b965ebf2eec4f4cf99ab26"

The issue is still present (app using ~57% CPU).

Jexu commented 1 year ago

So, the issue can be reproduced with ffmpeg decode, right? Can you share your testing cmdline and testing bitstream as well, so we can try it on our side?

@MicroYY can you take a look? vaGetImage seems to still have an issue similar to the one in #1537.

spiderkeys commented 1 year ago

My testing was done in a private Qt-based drone piloting application that uses ffmpeg's avcodec/avformat/avutil libraries directly for decoding, and my datastream is a live 12Mbps h264 stream from a vehicle, so I don't have a reproducible example I can share, unfortunately.

MicroYY commented 1 year ago

Hi @spiderkeys, may I know what the decode output format is? Is it RGB or YUV? 420 or 422 or some other format?

spiderkeys commented 1 year ago

@MicroYY The output format is NV12.

eero-t commented 1 year ago

EDIT: ignore this, it turned out to be a kernel regression from around the same time, not a media-driver one.


I'm also seeing a significant decode performance drop around the same date in my HEVC tests.

However, I see that only with MSDK and FFmpeg QSV, not with FFmpeg VA-API.

Example command lines:

The above-mentioned PR (merged a couple of weeks ago) does not help iGPUs, as I do not see any decoding performance improvements in my daily media git perf trends for this year (the perf drop was visible on all machines I had running when the regression was introduced: BXT, GLK, KBL, CML, and TGL).

eero-t commented 1 year ago

After investigating more, in my case the (10-20%) HEVC decode perf regression happened a few days earlier (17th, not 23rd of Nov 2021), and appears to be from a drm-tip kernel change, NOT from a media driver change.

FYI: the trends I'm looking at track git perf of the whole GPU driver stack, not just the media driver. And that kernel change does not cause more CPU or GPU usage; decode just takes longer (regardless of whether the perf or powersave governor is in use).

Note: I'm not seeing a drop in H.264 decode perf on Nov 23rd, but I'm not explicitly testing decoding there, only transcode operations using H.264. It's possible that the kernel regression I'm seeing would also impact H.264 decode perf though, depending on which kernel is in use (I think those drm-tip changes were going into the 5.16 upstream kernel).

Sorry for the confusion!

MicroYY commented 1 year ago

@spiderkeys I tried to reproduce it with FFmpeg VAAPI, but could not find the perf issue (decoding an h264 1080p clip). Cmd line:

ffmpeg -v verbose -hwaccel vaapi -init_hw_device vaapi=hw:/dev/dri/renderD128 -hwaccel_output_format nv12 -hwaccel_flags allow_profile_mismatch -i 1080p.h264 -lavfi 'null' -c:v rawvideo -pix_fmt yuv420p -fps_mode passthrough -autoscale 0 -vframes 500 -y -f tee 'output.yuv|[f=md5]pipe:1'

The decode fps reaches nearly 200.

I was testing on an ADL machine, and will try to reproduce on CML. Most probably there should be no diff between them WRT vaGetImage. Also, judging from the given regression commit 497d9864aaba83214cbc2e6ae8cee4151934fe43, it shouldn't introduce a real impact on CML.

1. Could you try the cmd line on your system?
2. From the description, your issue is different from the submitter's. Could you please submit a new issue to track it?

MicroYY commented 1 year ago

@spiderkeys I do observe the perf drop from https://github.com/intel/media-driver/commit/497d9864aaba83214cbc2e6ae8cee4151934fe43 on Gen9. Here is a fix: https://github.com/intel/media-driver/pull/1589 I suggest submitting a new issue for further tracking, as the root cause is not the original one. The issue should only exist on Gen8/9/10; later platforms should be OK.

spiderkeys commented 1 year ago

Hi @MicroYY, I saw the performance degradation across gen7, gen9, and gen10, so I don't think it is limited to just gen9.

That said, I will make a new issue for this.

MicroYY commented 1 year ago

Hi @MicroYY, I saw the performance degradation across gen7, gen9, and gen10, so I don't think it is limited to just gen9.

That said, I will make a new issue for this.

That's expected. Please try https://github.com/intel/media-driver/pull/1589 We may not fix the Gen7 regression, as the code has already been removed.

nyanmisaka commented 1 year ago

I think the Gen7 he mentioned refers to the 7th Gen (Kaby Lake) processors; those are Gen9 graphics.

MicroYY commented 1 year ago

I think the Gen7 he mentioned refers to the 7th Gen (Kaby Lake) processors; those are Gen9 graphics.

Oh yes... the 7700K is Kaby Lake, which belongs to Gen9.

mooninite commented 1 year ago

Here is a fix: #1589

I can confirm this fixes performance for me on Gen 9 / Coffee Lake / Xeon E-2288G.

1080p 60fps transcoding before fix:

1080p 60fps transcoding after fix #1589:

Thank you!!

spiderkeys commented 1 year ago

I think the Gen7 he mentioned refers to the 7th Gen (Kaby Lake) processors; those are Gen9 graphics.

Yes, this is what I meant. Sorry for the confusion, and thanks for clearing this up - I wasn't aware of the difference between processor generations and graphics generations.

AloncohenL commented 1 year ago

@spiderkeys I do observe the perf drop from 497d986 on Gen9. Here is a fix: #1589 I suggest submitting a new issue for further tracking, as the root cause is not the original one. The issue should only exist on Gen9; later platforms should be OK.

Hi, I'll start by saying thank you for the fix.

Long story short: In the last week I debugged a very similar performance degradation when decoding RGBA and NV12 with VAAPI at FHD resolution.

My system:
CPU: 12 cores, Intel(R) Xeon(R) E-2286G CPU @ 4.00GHz
GPU: Intel Coffee Lake (Gen9), using the iHD driver
OS: Ubuntu 22.04.1

media driver versions I used:

To reproduce the issue I ran this GStreamer pipeline that decodes mp4 to NV12:

gst-launch-1.0 filesrc location= name=src ! qtdemux ! vaapidecodebin ! queue ! vaapipostproc ! video/x-raw, format=NV12, width=1920, height=1080, pixel-aspect-ratio=1/1 ! queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 ! fpsdisplaysink video-sink=fakesink sync=false text-overlay=false

The performance was a bit lower than I expected - around 90 FPS.

Another issue I encountered was decoding FHD to RGBA using vaapipostproc. I saw a large difference in framerate between a pipeline that only decodes and a pipeline that decodes and performs another task on the host: format conversion/resize performed on the host CPU increases the performance significantly.

I also tried to find a difference using perf record + flamegraph, and saw a difference in the vaGetImage call (~28% vs ~8%).
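The flamegraphs came from something like this (a sketch; <pid> is the decoding process, and the stackcollapse/flamegraph scripts are from https://github.com/brendangregg/FlameGraph):

perf record -g -p <pid> -- sleep 30                                  # sample call stacks for ~30s
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > decode.svg # fold stacks and render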

Yesterday I had almost given up, and took a last look at this thread. To my surprise, I saw that you had committed this fix, so I tried to compile it myself.

media driver versions I moved to:

Tested again, and it seems to help: NV12 FHD decode now runs at 680 fps instead of 90 fps, which helps a lot!

RGBA is still showing the same behaviour, which is a bit strange. If this behaviour is also related or known to you, I would love a short explanation or an open issue.

Thanks