Open jazeya opened 3 months ago
Pull request #94 should be accepted, particularly because the situation has gotten a lot worse since LizardByte/Sunshine#3002 was merged.
When @cgutman closed pull request #94, I'm not sure he fully understood the situation. This pull request is not trying to work around a bug in Sunshine. It's trying to work around a bug in Hyper-V. Although it's possible that we might find a workaround for this Hyper-V bug, I am doubtful of that, at least in the short term.
Let me explain the issue a bit more clearly than the previous posters have, so that maybe @cgutman will merge pull request #94. The issue is basically that Hyper-V has a bug such that, when the virtual machine is configured to use GPU paravirtualization, some graphics API calls are inordinately slow. I don't know exactly which ones are, but to give you some examples, the following operations are inordinately slow:
ddprobe application.
Most of these have easy workarounds, so GPU-P with Hyper-V remains very useful. For example, to deal with (1), I've simply disabled the secure desktop. To deal with (2) and (3), I don't change which monitor is the primary display and I don't change the resolution. To deal with (4), I don't use "true" full-screen exclusive mode, which is mostly easy to avoid because only certain older games actually use it (indeed, Windows will automatically prevent most games from using it, but some manage to use it). For those particularly recalcitrant games that insist on using full-screen exclusive, I inject github.com/SpecialKO/SpecialK to prevent them from making the API calls that would engage full-screen exclusive.
Unfortunately, there is no workaround for item (5). The ddprobe application is inordinately slow, no matter what you do, in certain GPU-P configurations. I don't know exactly what causes it, but it's highly likely to be a Hyper-V bug, like the other examples, and not really a problem with Sunshine.
Prior to the merger of LizardByte/Sunshine#3002, connecting to my Sunshine virtual machine host using Moonlight took around 8-12 seconds, with a roughly uniform distribution of times. As a result, the connection would fail around half the time. It was annoying having to attempt to connect more than once, but at least the average number of attempts was 2. Now, after the merger of LizardByte/Sunshine#3002, ddprobe is invoked quite a few more times for me, and the connection time is now around 10-14 seconds. As a result, connection attempts fail around 95% of the time now, requiring over 20 attempts to connect to the host, and it's purely a game of chance. This is totally unacceptable so obviously I have to revert LizardByte/Sunshine#3002, which I will do.
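As a rough sanity check on these figures, the retry arithmetic can be sketched in Python. This is only an illustration under stated assumptions: each connection attempt is treated as independent (so the number of attempts follows a geometric distribution), the client gives up after 10 seconds (the value that makes an 8-12 second startup fail half the time), and the function names are made up for this example.

```python
def success_probability(low_sec: float, high_sec: float, timeout_sec: float) -> float:
    """P(startup time < timeout) for a startup time uniform on [low_sec, high_sec]."""
    if timeout_sec <= low_sec:
        return 0.0
    if timeout_sec >= high_sec:
        return 1.0
    return (timeout_sec - low_sec) / (high_sec - low_sec)


def expected_attempts(p_success: float) -> float:
    """Mean of a geometric distribution: attempts needed until the first success."""
    if p_success <= 0.0:
        return float("inf")
    return 1.0 / p_success


TIMEOUT_SEC = 10.0  # assumed client-side limit

# Before the regression: startup uniform on 8-12 s, so half the attempts succeed.
p_before = success_probability(8.0, 12.0, TIMEOUT_SEC)
attempts_before = expected_attempts(p_before)

# After the regression: use the observed 95% failure rate directly.
p_after = 1.0 - 0.95
attempts_after = expected_attempts(p_after)

print(f"before: p={p_before:.2f}, expected attempts={attempts_before:.0f}")
print(f"after:  p={p_after:.2f}, expected attempts={attempts_after:.0f}")
```

Under these assumptions the model gives about 2 attempts before the change and about 20 after it, which is consistent with the numbers reported above.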
Merging pull request #94 would completely resolve this issue. It's a pain for me to merge it myself, because then I would need to maintain local builds for the iOS, Android, Windows, and macOS client binaries -- all of which I actively use. There is no real downside to merging pull request #94. Currently, the timeout just forces me to connect many times to succeed, even though nothing has failed on the server side -- it's just not finished being slow yet.
> Prior to the merger of https://github.com/LizardByte/Sunshine/pull/3002, connecting to my Sunshine virtual machine host using Moonlight took around 8-12 seconds, with a roughly uniform distribution of times. As a result, the connection would fail around half the time. It was annoying having to attempt to connect more than once, but at least the average number of attempts was 2. Now, after the merger of https://github.com/LizardByte/Sunshine/pull/3002, ddprobe is invoked quite a few more times for me, and the connection time is now around 10-14 seconds. As a result, connection attempts fail around 95% of the time now, requiring over 20 attempts to connect to the host, and it's purely a game of chance. This is totally unacceptable so obviously I have to revert https://github.com/LizardByte/Sunshine/pull/3002, which I will do.
This is exactly what I mean when I said in #94 that issues that would require more time to connect are bugs in Sunshine! We should absolutely not be papering over the regression in https://github.com/LizardByte/Sunshine/pull/3002 or any other bugs by letting Moonlight sit for ages waiting for the first frame. Did you report this regression anywhere to Sunshine?
The expensive stream startup operations on the Sunshine side are supposed to happen prior to replying to the /launch and /resume requests over HTTPS (which have a long enough client-side timeout to absorb the delay and a nice UI to tell users we're still busy starting the stream). Significant delays outside of that context are a bug.
Moonlight is raising the error in question because it hasn't received the first video data within 10 seconds. However, Sunshine can't send any video data unless the slow part of setting up the stream is already finished. So, the way I see it, this could be viewed as either a bug in Moonlight or in Sunshine or in both.
The potential bug in Moonlight is that Moonlight raises an error and ends the stream even though no actual error has occurred. That bug is what this ticket and the pull request you closed are trying to fix.
The potential bug in Sunshine is that Sunshine isn't able to send video data fast enough to meet Moonlight's needs. This is more difficult to describe as a bug at all in my opinion.
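To make the client-side behavior under discussion concrete, here is a minimal Python sketch of a first-frame wait with a deadline. It is not Moonlight's actual code: `wait_for_first_frame` and `simulate_connection` are hypothetical names, the host is faked with a timer, and only the 10-second value comes from this thread.

```python
import threading

FIRST_FRAME_TIMEOUT_SEC = 10.0  # the 10-second limit discussed in this thread


def wait_for_first_frame(first_frame_seen: threading.Event,
                         timeout_sec: float = FIRST_FRAME_TIMEOUT_SEC) -> bool:
    """Return True if the first frame arrived in time, False if the wait timed out."""
    return first_frame_seen.wait(timeout=timeout_sec)


def simulate_connection(host_startup_sec: float, timeout_sec: float) -> bool:
    """Hypothetical host that delivers its first frame after host_startup_sec."""
    first_frame_seen = threading.Event()
    timer = threading.Timer(host_startup_sec, first_frame_seen.set)
    timer.start()
    try:
        return wait_for_first_frame(first_frame_seen, timeout_sec)
    finally:
        timer.cancel()  # stop the fake host if the client gave up first


if __name__ == "__main__":
    # Scaled-down timings so the demo finishes quickly.
    print(simulate_connection(host_startup_sec=0.3, timeout_sec=0.1))
    print(simulate_connection(host_startup_sec=0.3, timeout_sec=1.0))
```

In the first call the client gives up even though the fake host would have delivered a frame shortly afterwards; in the second, a longer deadline lets the same host succeed. Nothing on the host side differs between the two runs.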
If you insist on the change being made in Sunshine, I suppose the solution has to be that Sunshine will need to send some all-black frames while it waits for the actual video stream to start. That way, Moonlight will not prematurely end the connection. This feels like a hack, but it might be necessary if there's nothing we can consider doing on the Moonlight side.
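The placeholder-frame idea could look roughly like the following Python sketch. It is purely illustrative and not Sunshine code: it assumes raw 4-bytes-per-pixel frames, a capture source that yields `None` until the display is ready, and the hypothetical helper names `black_frame` and `frames_to_send`. A real host would also have to encode these frames.

```python
from typing import Iterator, Optional


def black_frame(width: int, height: int) -> bytes:
    """A raw frame with 4 bytes per pixel, every byte zero."""
    return bytes(width * height * 4)


def frames_to_send(captured: Iterator[Optional[bytes]],
                   width: int, height: int) -> Iterator[bytes]:
    """Yield a black placeholder while capture returns None, then the real frames."""
    placeholder = black_frame(width, height)
    for frame in captured:
        yield placeholder if frame is None else frame


if __name__ == "__main__":
    # Two capture attempts fail (None) before the first real 2x2 frame arrives.
    source = iter([None, None, b"\xff" * 16])
    for outgoing in frames_to_send(source, width=2, height=2):
        print(len(outgoing), "bytes,", "placeholder" if not any(outgoing) else "real")
```

The client would then see video data from the start, so its first-frame deadline would not fire while the host is still setting up capture.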
To further clarify this, I understand what you are saying, but the reason Sunshine starts the stream before the video is working is that everything except video is already working by that point (e.g., audio and inputs). If we blocked all initialization on the video stream starting up, this would actually be a regression in Sunshine too.
To be clear, I'm not trying to throw the issue over the fence. I'm a Sunshine developer too, and it will likely be me debugging the issue there too. I've already found the bug in https://github.com/LizardByte/Sunshine/pull/3002 and will be preparing a PR shortly (the logic to remember we've run ddprobe was accidentally deleted in the refactoring).
Maybe it's true that GPU-Paravirtualization in Hyper-V is just really slow, but I would like to step through the Sunshine code in a debugger to see for myself. Sunshine's display enumeration logic for Windows is quite complex, because it has to deal with all sorts of things like Windows lying about which adapter is connected to a display, displays that might need to wake up before capture, transient capture failures due to the UAC Secure Desktop or Winlogon, emulated displays that can only capture black frames, etc. Given that Steam also complains about not being able to find a display in this scenario, it's very possible that our logic there is tripping up and doing a lot more waiting or retrying than it needs to.
I am using Hyper-V GPU-PV with Win11 VMs running the latest version of Sunshine. I use multiple different FireTV streaming sticks as my Moonlight clients, which all seem to suffer from this timeout more so than my Android phone FWIW. I also noticed that the problem is consistently worse if the VM is still at the login screen (takes 5 attempts rather than 2). It seems to me that it would be great if the Moonlight client just obeyed the timeout value set in Sunshine -- which I have set to 30s -- rather than timing out sooner. But maybe I'm not understanding the purpose of the timeout setting on the server.
@pcl04dl3tt3r Have you tried the latest pre-release of Sunshine? The bug that delayed the first frame has already been fixed.
I tried it and it worked. I don't get disconnected anymore. It still needs some time until the desktop is shown, but as I understand it, this can't be fixed and is normal, right?
I'm not sure that the delay is unfixable, but nobody has found a way to fix it as of today.
A few points about this issue:
I probably had more to note but this was just some late night fast typing with bad grammar, I will add more later if I can. It's also worth noting that in these tests I connected to Sunshine through a Hyper-V Internal switch, so any delay has nothing to do with router weirdness.
The issue itself is well described in the comments here. All credit goes to nenkoru. For some configurations, it takes longer than 10 seconds to start receiving the stream. I experienced it on a Hyper-V machine as well. The first few connection attempts always time out. Increasing FIRST_FRAME_TIMEOUT_SEC solves the issue. It looks like it takes about 15 seconds for my particular configuration to properly start. I believe changing the timeout to at least 30 seconds looks reasonable.