mpv-player / mpv

🎥 Command line video player
https://mpv.io

mpv 0.10.0 vaapi-copy and high cpu use. #2317

Closed kokoko3k closed 9 years ago

kokoko3k commented 9 years ago

mpv 0.9.2:

mpv 1080ph264.mkv --hwdec=vaapi-copy -vo opengl
top -b|grep mpv
17003 koko      20   0 1074,7m 137,0m  23,5  3,6   0:03.95 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  29,8  3,6   0:04.40 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  27,0  3,6   0:04.81 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  28,5  3,6   0:05.24 S          `- mpv

mpv 0.10.0

17173 koko      20   0 1075,9m  89,9m  81,2  2,3   0:04.80 R          `- mpv
17173 koko      20   0 1075,9m  90,1m  90,1  2,3   0:06.16 R          `- mpv
17173 koko      20   0 1075,9m  90,3m  88,1  2,3   0:07.49 S          `- mpv

This happens (at least) with the opengl and vaapi video outputs.

The CPU is an i3-3110M (Ivy Bridge).
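
To put a number on the difference between the two `top` samples above, here is a minimal sketch that averages the CPU column. It assumes the 7th whitespace-separated field is %CPU (the column header is not shown in the output) and that the locale prints decimals with a comma; the helper names are made up for illustration.

```python
# Sample lines copied from the `top -b | grep mpv` output above.
# Assumed column layout: PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND.
TOP_0_9_2 = """\
17003 koko      20   0 1074,7m 137,0m  23,5  3,6   0:03.95 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  29,8  3,6   0:04.40 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  27,0  3,6   0:04.81 S          `- mpv
17003 koko      20   0 1074,7m 137,0m  28,5  3,6   0:05.24 S          `- mpv
"""

TOP_0_10_0 = """\
17173 koko      20   0 1075,9m  89,9m  81,2  2,3   0:04.80 R          `- mpv
17173 koko      20   0 1075,9m  90,1m  90,1  2,3   0:06.16 R          `- mpv
17173 koko      20   0 1075,9m  90,3m  88,1  2,3   0:07.49 S          `- mpv
"""


def cpu_samples(top_output):
    """Extract the assumed %CPU column (7th field), accepting comma decimals."""
    samples = []
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        samples.append(float(fields[6].replace(",", ".")))
    return samples


def mean(values):
    return sum(values) / len(values)


old = mean(cpu_samples(TOP_0_9_2))
new = mean(cpu_samples(TOP_0_10_0))
print(f"0.9.2: {old:.1f}%  0.10.0: {new:.1f}%  ratio: {new / old:.1f}x")
```

With these few samples the average goes from roughly 27% to roughly 86%, about a threefold increase.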

ghost commented 9 years ago

Please bisect git for the change that caused this.

kokoko3k commented 9 years ago

I have little time to spare, but I'll do it ASAP.

kokoko3k commented 9 years ago

Back on topic:

https://github.com/mpv-player/mpv/commit/7ef8f457a8631ff09a928ee5aa6bb3f5fabfce11: High cpu

https://github.com/mpv-player/mpv/commit/bc68794acc067d8862525caa41ab3922ef9239bc and https://github.com/mpv-player/mpv/commit/50bd2807ad4f94bcf215991566336060ebfdd249 look like a transition to the high-CPU bug: first a message says that hardware decoding failed and mpv falls back to software decoding; then the video stops for a couple of seconds before it finally starts, with high CPU use.

AV: 00:00:00 / 02:21:18 (0%) A-V: -0.000 Dropped: 1
[ffmpeg/video] h264: hardware accelerator failed to decode picture
Error while decoding frame!
Hardware decoding failed, falling back to software decoding.
AV: 00:00:00 / 02:21:18 (0%) A-V: -0.000 Dropped: 1

Audio/Video desynchronisation detected! Possible reasons include too slow
hardware, temporary CPU spikes, broken drivers, and broken files. Audio
position will not match to the video (see A-V status field).

finally, https://github.com/mpv-player/mpv/commit/e3e20f1431fa0c2d7988234787994d87fb6385c2: Low cpu, last good commit.
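
For readers unfamiliar with bisecting, the search behind the result above is a binary search over the commit history. The following is only a sketch of that idea, not the actual procedure used here: the commit names `c0`..`c9` and the `is_bad` test are hypothetical, and it assumes a linear history in which the regression, once introduced, stays. Commits that cannot be judged cleanly (such as the transitional ones above) do not fit this simple model; `git bisect skip` exists for those.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history: "c6" introduces the regression.
history = [f"c{i}" for i in range(10)]
regression_index = history.index("c6")
tested = []


def is_bad(commit):
    tested.append(commit)  # record which commits had to be built and tested
    return history.index(commit) >= regression_index


result = first_bad(history, is_bad)
print(result, "found after testing", len(tested), "commits")
```

The point of the approach is that the number of builds to test grows with the logarithm of the number of commits between the good and bad endpoints.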

ghost commented 9 years ago

Can you post a log where it works, and one where it doesn't?

kokoko3k commented 9 years ago

http://wpage.unina.it/aorefice/sharevari/logs.7z contains logs at various debug levels from all modules for the bad commit 7ef8f45 and the good commit e3e20f1.

kokoko3k commented 9 years ago

I accidentally deleted the log files. Did you manage to get them? I'll produce them again if needed. Sorry for that...

ghost commented 9 years ago

Unfortunately not.

kokoko3k commented 9 years ago

I still had a local copy in /tmp; the logs are online again.

Gusar321 commented 9 years ago

I've been researching this a bit... In the past, mpv worked like this:

hwdec/vo        | download      upload
---------------------------------------------
vaapi-copy/vaapi| GetImage      DeriveImage
vaapi-copy/gl   | GetImage      n/a
vaapi-copy/xv   | GetImage      n/a
no/vaapi        | n/a           DeriveImage

Now this is the case:

hwdec/vo        | download      upload
---------------------------------------------
vaapi-copy/vaapi| DeriveImage   PutImage
vaapi-copy/gl   | DeriveImage   n/a
vaapi-copy/xv   | DeriveImage   n/a
no/vaapi        | n/a           DeriveImage

It appears that while in theory DeriveImage is better than GetImage for downloading, practice does not match this theory. There's the issue of Xv needing software nv12->yuv420 conversion, which I already mentioned (https://github.com/mpv-player/mpv/issues/2123#issuecomment-121706471). While OpenGL doesn't have that problem (it can deal with nv12 just fine), the need to constantly destroy and re-create the surface seems to be a performance killer.

This doesn't affect me personally, since I don't care about vaapi-copy (I use vaapi/vaapi, with no/vaapi as the automatic fallback for formats the hardware can't decode), but for those who do care about vaapi-copy the current situation isn't good.

Reverting 50bd280 restores the old behavior.
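
The two tables above can also be written down as data, which makes it easy to see exactly which paths changed. This is just a restatement of the tables as a sketch; the dictionary layout and the `changed_paths` helper are made up for illustration and are not mpv code.

```python
# (hwdec, vo) -> (download method, upload method); None stands for "n/a".
BEFORE = {
    ("vaapi-copy", "vaapi"): ("GetImage", "DeriveImage"),
    ("vaapi-copy", "gl"): ("GetImage", None),
    ("vaapi-copy", "xv"): ("GetImage", None),
    ("no", "vaapi"): (None, "DeriveImage"),
}

AFTER = {
    ("vaapi-copy", "vaapi"): ("DeriveImage", "PutImage"),
    ("vaapi-copy", "gl"): ("DeriveImage", None),
    ("vaapi-copy", "xv"): ("DeriveImage", None),
    ("no", "vaapi"): (None, "DeriveImage"),
}


def changed_paths(before, after):
    """Return the (hwdec, vo) combinations whose methods differ."""
    return {
        key: (before[key], after[key])
        for key in before
        if before[key] != after[key]
    }


changes = changed_paths(BEFORE, AFTER)
for (hwdec, vo), (old, new) in changes.items():
    print(f"{hwdec}/{vo}: download {old[0]} -> {new[0]}, upload {old[1]} -> {new[1]}")
```

Every vaapi-copy combination switched its download method from GetImage to DeriveImage, while the no/vaapi path is unchanged.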

ghost commented 9 years ago

Can you try the vaapi-gpumemcpy branch? Note that it will just crash if your CPU doesn't support sse4.1. Also, your compiler must support the intrinsics used here.

Gusar321 commented 9 years ago

First test, a hardcore video I just downloaded earlier today, the Honey Bees video available at http://4ksamples.com/. mpv mainline can't even play the video smoothly! The vaapi-gpumemcpy branch can, but it needs more resources than if I revert 50bd280. In turbostat numbers: opengl: http://pastebin.ca/3170288 xv: http://pastebin.ca/3170294

Note how mainline needs a minute to go through 20 secs of video (I used --end=20 for the test)! So gpumemcpy helps a lot, but it's still worse than GetImage. And Xv is much easier on the resources compared to OpenGL.

After seeing those crazy numbers, let's repeat the exercise, but with a more common 720p video: opengl: http://pastebin.ca/3170308 xv: http://pastebin.ca/3170312

Interesting, here gpumemcpy wins. Is 4K too much for a poor Haswell GPU? Or is downscaling that much more intensive than upscaling (both videos were scaled to fit my 1440x900 display)?
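
Some back-of-the-envelope arithmetic on the two cases, as a rough sketch. It assumes the 4K sample is 3840x2160 and the 720p video is 1280x720, that scaling preserves aspect ratio, and that frames are 8-bit nv12 (a full-resolution luma plane plus half-resolution interleaved chroma, 1.5 bytes per pixel). None of these were measured here.

```python
DISPLAY = (1440, 900)

SOURCES = {
    "4K (assumed 3840x2160)": (3840, 2160),
    "720p (assumed 1280x720)": (1280, 720),
}


def fit(src_w, src_h, dst_w, dst_h):
    """Aspect-preserving fit: return (scale factor, output width, output height)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    return scale, round(src_w * scale), round(src_h * scale)


def nv12_frame_bytes(width, height):
    """Bytes in one 8-bit nv12 frame: luma plane plus half-size chroma plane."""
    return width * height * 3 // 2


for name, (w, h) in SOURCES.items():
    scale, out_w, out_h = fit(w, h, *DISPLAY)
    src_per_out = (w * h) / (out_w * out_h)
    mib = nv12_frame_bytes(w, h) / 2**20
    print(
        f"{name}: scale {scale:.3f} -> {out_w}x{out_h}, "
        f"{mib:.1f} MiB copied per frame, "
        f"{src_per_out:.2f} source pixels per output pixel"
    )
```

Under these assumptions both videos end up at 1440x810, but the 4K frame is nine times as much data to copy per frame as the 720p one, and each output pixel covers about seven source pixels instead of less than one.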

Next test, 1080p Blu-ray VC-1 video: opengl: http://pastebin.ca/3170375 xv: http://pastebin.ca/3170376

Here too gpumemcpy wins.

And finally, something a bit different, the Honey Bees video without copying, opengl vs vaapi scaling: http://pastebin.ca/3170379 <- that confirms it, downscaling with OpenGL is really stressful for an Intel GPU, while the fixed function VAAPI scaler doesn't even break a sweat.

In the end, I'd say go for gpumemcpy. It's the winner in the more common cases. Those who play 4K videos probably have a display to match, so they won't be hitting resource intensive downscaling. Or one could simply use vaapi output instead of opengl.

ghost commented 9 years ago

With what vo_opengl settings do you run this? What pixel formats were used on the VO?

Gusar321 commented 9 years ago

Plain --vo=opengl, no extra settings. Pixel format is nv12 with DeriveImage, yuv420p with GetImage.

ghost commented 9 years ago

You can check whether --vo=opengl:dumb-mode=yes helps a little.

Gusar321 commented 9 years ago

Wow, huge difference with dumb-mode. This time gpumemcpy wins. http://pastebin.ca/3170997

Do Intel GPUs suck so much, or is something else not right? Perhaps a setting that's on by default that Intel doesn't like. Is there a simple list of things dumb-mode disables, so I could try disabling them one by one?

ghost commented 9 years ago

> Do Intel GPUs suck so much, or is something else not right?

Both hardware and drivers are kind of weak.

> Is there a simple list of things dumb-mode disables, so I could try disabling them one by one?

By now, normal mode does multiple things, at least:

1. merge chroma planes into an nv12 framebuffer (if the source format is yuv420)
2. convert yuv to an RGB framebuffer

So with yuv420, there are 2 framebuffer indirections. Both are skipped with dumb-mode. This mode was in fact made for very weak hardware.
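
As a toy model of the description above (not mpv's actual renderer code), the intermediate framebuffer passes per configuration can be counted like this. The function name and the pass labels are invented, and only the two steps named above are modeled even though normal mode may do more.

```python
def framebuffer_passes(source_format, dumb_mode=False):
    """List the intermediate framebuffer passes described above.

    Toy model: covers only the chroma merge and the YUV-to-RGB conversion.
    """
    if dumb_mode:
        return []  # dumb-mode skips both indirections
    passes = []
    if source_format == "yuv420p":
        # separate chroma planes are first merged into an nv12 framebuffer
        passes.append("merge chroma planes into nv12 framebuffer")
    passes.append("convert yuv to RGB framebuffer")
    return passes


for fmt in ("yuv420p", "nv12"):
    for dumb in (False, True):
        count = len(framebuffer_passes(fmt, dumb))
        print(f"{fmt:8s} dumb-mode={'yes' if dumb else 'no':3s} -> {count} pass(es)")
```

In this model yuv420p (what GetImage delivers) costs two passes, nv12 (what DeriveImage delivers) costs one, and dumb-mode costs none for either format.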

CC @haasn.

Gusar321 commented 9 years ago

Oh, right, FBO was made the default recently. So I tried different formats: fbo-format=rgb does better than the default (rgb16?), but it's still not as efficient as dumb-mode: http://pastebin.ca/3171933 Using 1080p instead of 4K video, the difference is much smaller, but still there: http://pastebin.ca/3171935

haasn commented 9 years ago

It's possible we could re-add the chroma merging skip in a more robust/clean way, if this causes noticeable issues on typical iGPU hardware watching typical content. (dumb-mode is a solution, but it's also an unnecessary hassle for users of iGPUs who don't necessarily know it even exists and are wondering why mpv stutters.)

Anyway, we made the change knowing that watching 4K video on an iGPU with a low-res panel is uncommon enough that we don't worry about it too much. A small performance degradation was expected, but since the typical case is watching low-res videos and upscaling them, this is a performance loss we accepted intentionally.

Note: It would also theoretically be possible to downscale only the luma plane and keep chroma planes at their native resolutions if we're going to be downscaling either way. But this would not fit cleanly into the code and would also involve some loss of precision (not that it would matter much).

ghost commented 9 years ago

I hope this is sufficient now. I won't add anything special just for vo_xv. (I'd rather remove vo_xv completely instead.)

ghost commented 7 years ago

mpv git master, when used with FFmpeg git master, removes the "GPU memcpy" again. So if you care...

kokoko3k commented 7 years ago

If I understand correctly, high CPU use is expected again?

ghost commented 7 years ago

Hopefully not?