moonlight-stream / moonlight-common-c

Core implementation of Nvidia's GameStream protocol
GNU General Public License v3.0

Increase first frame timeout #93

Status: Open. jazeya opened this issue 3 months ago.

jazeya commented 3 months ago

The issue itself is well described in the comments here. All credit goes to nenkoru. On some configurations it takes longer than 10 seconds to start receiving the stream. I experienced it on a Hyper-V machine as well: the first few connection attempts always time out. Increasing FIRST_FRAME_TIMEOUT_SEC solves the issue. My particular configuration seems to need about 15 seconds to start properly. I believe raising the timeout to at least 30 seconds is reasonable.

cathyjf commented 1 month ago

Pull request #94 should be accepted, particularly because the situation has gotten a lot worse since LizardByte/Sunshine#3002 was merged.

When @cgutman closed pull request #94, I'm not sure he fully understood the situation. This pull request is not trying to work around a bug in Sunshine. It's trying to work around a bug in Hyper-V. Although it's possible that we might find a workaround for this Hyper-V bug, I am doubtful of that, at least in the short term.

Let me explain the issue a bit more clearly than the previous posters have, so that maybe @cgutman will merge pull request #94. The issue is basically that Hyper-V has a bug such that, when the virtual machine is configured to use GPU paravirtualization, some graphics API calls are inordinately slow. I don't know exactly which ones are, but to give you some examples, the following operations are inordinately slow:

  1. Displaying anything on the secure desktop (such as UAC prompts);
  2. Changing which monitor is the primary display;
  3. Changing resolution;
  4. Enabling or disabling "true" full-screen exclusive mode in certain older games that use it; and
  5. Running Sunshine's ddprobe application.

Most of these have easy workarounds, so GPU-P with Hyper-V remains very useful. For example, to deal with (1), I've simply disabled the secure desktop. To deal with (2) and (3), I don't change which monitor is the primary display and I don't change the resolution. To deal with (4), I don't use "true" full-screen exclusive mode, which is mostly easy to avoid because only certain older games actually use it (indeed, Windows will automatically prevent most games from using it, but some manage to use it). For those particularly recalcitrant games that insist on using full-screen exclusive, I inject github.com/SpecialKO/SpecialK to prevent them from making the API calls that would engage full-screen exclusive.

Unfortunately, there is no workaround for item (5). The ddprobe application is inordinately slow, no matter what you do, in certain GPU-P configurations. I don't know exactly what causes it, but it's highly likely to be a Hyper-V bug, like the other examples, and not really a problem with Sunshine.

Prior to the merger of LizardByte/Sunshine#3002, connecting to my Sunshine virtual machine host using Moonlight took around 8-12 seconds, with a roughly uniform distribution of times. As a result, the connection would fail around half the time. It was annoying having to attempt to connect more than once, but at least the average number of attempts was 2. Now, after the merger of LizardByte/Sunshine#3002, ddprobe is invoked quite a few more times for me, and the connection time is now around 10-14 seconds. As a result, connection attempts fail around 95% of the time now, requiring over 20 attempts to connect to the host, and it's purely a game of chance. This is totally unacceptable, so obviously I have to revert LizardByte/Sunshine#3002, which I will do.

Merging pull request #94 would completely resolve this issue. It's a pain for me to merge it myself, because then I would need to maintain local builds for the iOS, Android, Windows, and macOS client binaries -- all of which I actively use. There is no real downside to merging pull request #94. Currently, the timeout just forces me to connect many times to succeed, even though nothing has failed on the server side -- it's just not finished being slow yet.

cgutman commented 1 month ago

Quoting @cathyjf: "Prior to the merger of https://github.com/LizardByte/Sunshine/pull/3002, connecting to my Sunshine virtual machine host using Moonlight took around 8-12 seconds, with a roughly uniform distribution of times. As a result, the connection would fail around half the time. It was annoying having to attempt to connect more than once, but at least the average number of attempts was 2. Now, after the merger of https://github.com/LizardByte/Sunshine/pull/3002, ddprobe is invoked quite a few more times for me, and the connection time is now around 10-14 seconds. As a result, connection attempts fail around 95% of the time now, requiring over 20 attempts to connect to the host, and it's purely a game of chance. This is totally unacceptable, so obviously I have to revert https://github.com/LizardByte/Sunshine/pull/3002, which I will do."

This is exactly what I mean when I said in #94 that issues that would require more time to connect are bugs in Sunshine! We should absolutely not be papering over the regression in https://github.com/LizardByte/Sunshine/pull/3002 or any other bugs by letting Moonlight sit for ages waiting for the first frame. Did you report this regression anywhere to Sunshine?

The expensive stream startup operations on the Sunshine side are supposed to happen prior to replying to the /launch and /resume requests over HTTPS (which have a long enough client-side timeout to absorb the delay and a nice UI to tell users we're still busy starting the stream). Significant delays outside of that context are a bug.

cathyjf commented 1 month ago

Moonlight is raising the error in question because it hasn't received the first video data within 10 seconds. However, Sunshine can't send any video data unless the slow part of setting up the stream is already finished. So, the way I see it, this could be viewed as either a bug in Moonlight or in Sunshine or in both.

The potential bug in Moonlight is that Moonlight raises an error and ends the stream even though no actual error has occurred. That bug is what this ticket and the pull request you closed are trying to fix.

The potential bug in Sunshine is that Sunshine isn't able to send video data fast enough to meet Moonlight's needs. In my opinion, this is harder to describe as a bug at all.

If you insist on the change being made in Sunshine, I suppose the solution has to be that Sunshine will need to send some all-black frames while it waits for the actual video stream to start. That way, Moonlight will not prematurely end the connection. This feels like a hack, but it might be necessary if there's nothing we can consider doing on the Moonlight side.

cathyjf commented 1 month ago

To further clarify this, I understand what you are saying, but the reason Sunshine starts the stream before the video is working is that everything except video is already working by that point (e.g., audio and inputs). If we blocked all initialization on the video stream starting up, this would actually be a regression in Sunshine too.

cgutman commented 1 month ago

To be clear, I'm not trying to throw the issue over the fence. I'm a Sunshine developer too, and it will likely be me debugging the issue there too. I've already found the bug in https://github.com/LizardByte/Sunshine/pull/3002 and will be preparing a PR shortly (the logic to remember we've run ddprobe was accidentally deleted in the refactoring).

Maybe it's true that GPU-Paravirtualization in Hyper-V is just really slow, but I would like to step through the Sunshine code in a debugger to see for myself. Sunshine's display enumeration logic for Windows is quite complex, because it has to deal with all sorts of things like Windows lying about which adapter is connected to a display, displays that might need to wake up before capture, transient capture failures due to the UAC Secure Desktop or Winlogon, emulated displays that can only capture black frames, etc. Given that Steam also complains about not being able to find a display in this scenario, it's very possible that our logic there is tripping up and doing a lot more waiting or retrying than it needs to.

pcl04dl3tt3r commented 1 month ago

I am using Hyper-V GPU-PV with Win11 VMs running the latest version of Sunshine. I use multiple different FireTV streaming sticks as my Moonlight clients, which all seem to suffer from this timeout more so than my Android phone FWIW. I also noticed that the problem is consistently worse if the VM is still at the login screen (takes 5 attempts rather than 2). It seems to me that it would be great if the Moonlight client just obeyed the timeout value set in Sunshine -- which I have set to 30s -- rather than timing out sooner. But maybe I'm not understanding the purpose of the timeout setting on the server.

cgutman commented 1 month ago

@pcl04dl3tt3r Have you tried the latest pre-release of Sunshine? The bug that delayed the first frame has already been fixed.

duracell commented 1 month ago

I tried it and it worked. I don't get disconnected anymore. It still takes a while until the desktop is shown, but as I understand it, this can't be fixed and is normal, right?

cgutman commented 1 month ago

I'm not sure that the delay is unfixable, but nobody has found a way to fix it as of today.

devusr1x commented 1 month ago

A few points about this issue:

  1. I have just tried the latest pre-release build of Sunshine and it seems to be working, although there is a huge delay to connect, especially when the machine has just booted. Thank you. The same delay is also present in Parsec when connecting to a Hyper-V machine, but Moonlight's behavior now seems to at least match Parsec's. I wanted to move back to Sunshine after the introduction of 4:4:4, and now connecting through Moonlight no longer seems to be a dice roll.
  2. Regarding "GPU-Paravirtualization in Hyper-V being just really slow": I don't think this is true at all. If you connect to the same machine using Windows' Hyper-V client (vmconnect), UAC prompts are fast, and running exclusive fullscreen applications is less of an issue (yes, even with the GPU working perfectly inside the virtual machine and rendering complex scenes). In fact, a workaround for games that default to exclusive fullscreen on launch is to connect through vmconnect, switch the game to borderless windowed, and then connect through Moonlight. Hyper-V's vmconnect seems to use the same technology as Windows RDP, so it is also sending a video feed somehow, yet it does not have these weird interactions with UAC/fullscreen, or at least they are far less obvious or less likely to happen. Parsec has the exact same issue from what I can remember. The issue is rather weird, because if you leave an exclusive fullscreen application running for 1-2 minutes without touching anything, Sunshine sometimes does find a video feed and works properly.
  3. There are definitely some issues (actual old bugs) in Hyper-V, even in very recent builds (I'm running the latest build of 24H2, and this has been around for a while). If you search around the web, you will find people complaining about Hyper-V's own vmconnect randomly dropping the feed a while after connecting when booting the machine. This seems to be somehow related to Hyper-V's Enhanced Session feature opening a new user session. This is a very superficial description of the issue since I don't know exactly what's going on here, but I've never found an actual fix. My workaround was a script of my own that boots the machine, waits ~10s, and then runs vmconnect automatically. If, for some ungodly reason, any of you want to know more about this, just google "hyper-v disconnecting" or something similar, and you will see that this is ancient. I found this workaround (i.e. waiting) by accident and haven't seen it mentioned anywhere else. Some additional info that is related more to Hyper-V than Sunshine, but I might as well share:
    • Hyper-V boot + instant vmconnect connection: instant video feed -> feed is likely to drop later
    • Hyper-V boot + delayed (~10s) vmconnect connection: instant video feed -> likelihood to drop is basically zero unless your drive is really slow and the system has not properly booted yet
    • Hyper-V boot + Moonlight/Sunshine connection -> I can obviously only connect after Sunshine boots, and still have to wait some additional seconds on a black screen
    • Hyper-V boot + Moonlight/Sunshine connection after the first connection -> very fast connections, vmconnect-like behavior using the latest Sunshine preview build

I probably had more to note but this was just some late night fast typing with bad grammar, I will add more later if I can. It's also worth noting that in these tests I connected to Sunshine through a Hyper-V Internal switch, so any delay has nothing to do with router weirdness.