moonlight-stream / moonlight-qt

GameStream client for PCs (Windows, Mac, Linux, and Steam Link)
GNU General Public License v3.0

Intel UHD very high decode time at 60 fps framerate, not at 30 fps #1085

Open w0utert opened 11 months ago

w0utert commented 11 months ago

When streaming from a Linux host (Sunshine 0.20.0) to a Windows 11 client (Moonlight-qt 4.3.1), the 'average decode time' as printed by Moonlight-qt is very high (~40ms with spikes above ~100ms) when I set the client to 60 fps, while it is very low (less than ~1 ms) when I set the client to 30 fps.

For my setup, decoding time is currently by far the biggest contributor to latency: network latency and queue time together are less than ~4ms, but the decoding time is ten times as much. I'm 100% sure the Intel UHD in the client (Lenovo T14 Gen 3) should be able to decode a stream like this effortlessly, so I've been banging my head against the wall trying all kinds of settings on the encoder and decoder side to debug this, to no avail. I never thought to lower the framerate in the client, though. Now that I have, I see it completely solves the decode time, which is now sub-ms. This is great for latency, but obviously I would like to be able to stream at 60fps and not 30fps.

Steps to reproduce: Set the client to 4K@60Hz, connect, open the performance overlay using Ctrl+Alt+Shift+S, and check the numbers.

No other setting (vsync, frame pacing, fullscreen/borderless) except lowering the framerate improves the latency. Drop the framerate to 30 and the decoding time immediately drops to a very low value.
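For a rough sense of scale, the reported decode times can be compared against the time available per frame at each framerate. The sketch below is only back-of-the-envelope arithmetic using the figures from this report (~40ms at 60 fps, under ~1ms at 30 fps); the function names are made up for illustration and are not part of Moonlight-qt.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def decode_to_budget_ratio(decode_ms: float, fps: float) -> float:
    """Decode time divided by the per-frame budget.

    A ratio above 1 means a single decode takes longer than one frame interval.
    """
    return decode_ms / frame_budget_ms(fps)


# Figures from the report: ~40 ms average at 60 fps, under ~1 ms at 30 fps.
for fps, decode_ms in [(60, 40.0), (30, 1.0)]:
    budget = frame_budget_ms(fps)
    ratio = decode_to_budget_ratio(decode_ms, fps)
    print(f"{fps} fps: budget {budget:.1f} ms, decode {decode_ms:.1f} ms, ratio {ratio:.2f}")
```

At 60 fps the reported average decode time is well over twice the frame interval, while at 30 fps it is a small fraction of it.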

Screenshots:

60fps: [image: decode_60fps]

30fps: [image: decode_30fps]

Affected games: This is a remote desktop, not a game, but I expect this is a decoder problem and not game/application specific.

Moonlight settings

Client PC details

Server PC details

Moonlight Logs (please attach): log_60fps.log, log_30fps.log

w0utert commented 10 months ago

Just checked with the brand-new Moonlight-qt 5.0.0, and the behavior there is different: better in some ways, but worse in others.

At 60 fps the decode time now seems entirely arbitrary. In one session it hovers around ~30ms, sometimes even dropping as low as ~1ms for a few seconds. Then, after disconnecting and reconnecting, it may hover around ~150ms or more. There are also a lot more upward spikes in decoding time, especially when moving the mouse. It's very annoying because the decode time immediately affects the mouse cursor latency, and having a variable latency that is all over the place makes it impossible for my brain to adjust.

Another possibly interesting data point is that the behavior seems slightly better when using H265 instead of H264, as in: more sessions will have lower decoding latencies more often. Not always though: the session I have open right now has been hovering around ~150ms for 20 minutes. With Moonlight-qt 4.3.1, H265 seemed to always have the high decoding time.

Considering that it is possible to get low decoding times some of the time, and given the observed randomness, I do suspect a Moonlight-qt bug here, maybe something timing-related where some part of the decoding process holds a lock or synchronizes incorrectly? I don't remember ever seeing this problem using Sunshine + Moonlight-qt on my M1 MacBook Pro, for example.

w0utert commented 10 months ago

Another data point: the decoder time seems to be directly related to other processes using the GPU. For example, if I scroll a website in a browser on a second screen, decoder time will immediately jump from ~2ms to ~150ms, and when playing fullscreen video on the other screen it sometimes goes as high as ~500ms. This does not happen consistently, though: even with fullscreen video playing on the other screen, the decoding time sometimes stays around ~2ms for multiple seconds at a time.

I was starting to suspect the issue may be related to thermal throttling, because the spikes seemed less severe with the laptop lid open than with it closed. But I don't think this explains it: loading the CPU to 100% makes no difference to decoding time, I cannot believe that just scrolling a web page will switch GPU throttling on and off immediately, and I don't think any kind of throttling would cause GPU video decoding time to go from ~2ms to ~500ms. I have also never seen other fullscreen video playback break down completely when some other task does some light GPU work on a second screen; at 100+ms any video playback would drop to unwatchable framerates. So there is definitely something fishy here.
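To put numbers on the "unwatchable framerates" remark, here is a small sketch that converts a per-frame decode time into the highest framerate it could sustain. It assumes frames are decoded strictly one at a time with no pipelining, which is a simplification: the overlay's decode time may measure latency rather than throughput, so treat the results as a rough upper bound under that assumption. The function name is hypothetical.

```python
def max_serial_fps(decode_ms: float) -> float:
    """Highest frame rate sustainable if every frame takes decode_ms to decode
    and frames are decoded one after another (assumption: no pipelining)."""
    if decode_ms <= 0:
        raise ValueError("decode_ms must be positive")
    return 1000.0 / decode_ms


# Decode times mentioned in this thread: ~2 ms, 100+ ms, ~150 ms, ~500 ms.
for decode_ms in (2.0, 100.0, 150.0, 500.0):
    print(f"{decode_ms:6.1f} ms per frame -> at most {max_serial_fps(decode_ms):6.1f} fps")
```

Under that assumption, 100ms per frame caps playback at 10 fps and 500ms at 2 fps, which is why such numbers would be very visible in an ordinary video player.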

w0utert commented 10 months ago

Just one more update with additional info and then I'll leave it at that for the time being.

To illustrate the weirdness of the effect:

Now I scroll the webpage in Edge a bit which brings into view the following region of the webpage, which contains a tiny dot that is animated (it pulsates and changes color). That's literally the only thing changing on screen 2:

[image: ss]

Obviously, animating a tiny region of a webpage should not affect video decoding time like this. Something is definitely broken in Moonlight-qt, in D3D11VA, or in the Intel UHD driver. But as mentioned before, I have not noticed any problem with any other application besides Moonlight-qt so far...