Open isundaylee opened 2 years ago
Nice analysis. Since the delay was present when the packets were sent by the host PC, I'm going to move this to the issue tracker for Nvidia's bugs.
For next steps, I would suggest looking at DPC latency and/or using Windows Performance Analyzer on the host PC to try to identify what is preventing nvstreamer.exe from sending the frame out on schedule.
Thanks for the suggestions! Yeah, I agree that this sounds more like an NVIDIA driver issue. I tried out the DPC latency tool, but it didn't flag any issues. I also tried Windows Performance Analyzer with some progress but mixed success. Note that I mostly work with UNIX systems nowadays and am not super familiar with Windows, so let me know if any of my analysis below sounds incorrect. I've attached the WPA trace below.
First I noticed that this issue actually repros when I'm not running any games (i.e. just on the desktop/displaying some other program with static screen output), with arguably even more regularity. e.g.:
02:54:18 - 2022/08/14-16:38:43,443 - SDL Info (0): Frame minTicks 11 maxTicks 36 count 60 slow20/25/30 = 7 / 4 / 3
02:54:18 - 2022/08/14-16:38:43,722 - SDL Info (0): Slow frame 624772 elapsedTicks 26
02:54:18 - 2022/08/14-16:38:44,055 - SDL Info (0): Slow frame 624792 elapsedTicks 27
02:54:19 - 2022/08/14-16:38:44,472 - SDL Info (0): Slow frame 624817 elapsedTicks 32
02:54:19 - 2022/08/14-16:38:44,472 - SDL Info (0): Frame minTicks 9 maxTicks 32 count 61 slow20/25/30 = 7 / 3 / 1
02:54:19 - 2022/08/14-16:38:44,666 - SDL Info (0): Slow frame 624829 elapsedTicks 27
02:54:19 - 2022/08/14-16:38:44,755 - SDL Info (0): Slow frame 624834 elapsedTicks 25
02:54:19 - 2022/08/14-16:38:44,905 - SDL Info (0): Slow frame 624843 elapsedTicks 31
02:54:19 - 2022/08/14-16:38:45,021 - SDL Info (0): Slow frame 624850 elapsedTicks 31
02:54:19 - 2022/08/14-16:38:45,189 - SDL Info (0): Slow frame 624860 elapsedTicks 27
02:54:19 - 2022/08/14-16:38:45,272 - SDL Info (0): Slow frame 624865 elapsedTicks 26
02:54:20 - 2022/08/14-16:38:45,474 - SDL Info (0): Frame minTicks 9 maxTicks 31 count 61 slow20/25/30 = 11 / 6 / 2
02:54:20 - 2022/08/14-16:38:45,639 - SDL Info (0): Slow frame 624887 elapsedTicks 32
02:54:20 - 2022/08/14-16:38:45,919 - SDL Info (0): Slow frame 624904 elapsedTicks 25
02:54:20 - 2022/08/14-16:38:45,945 - SDL Info (0): Slow frame 624905 elapsedTicks 26
02:54:20 - 2022/08/14-16:38:46,225 - SDL Info (0): Slow frame 624922 elapsedTicks 25
02:54:21 - 2022/08/14-16:38:46,492 - SDL Info (0): Frame minTicks 10 maxTicks 32 count 60 slow20/25/30 = 12 / 4 / 1
02:54:22 - 2022/08/14-16:38:47,492 - SDL Info (0): Frame minTicks 10 maxTicks 22 count 60 slow20/25/30 = 11 / 0 / 0
02:54:23 - 2022/08/14-16:38:48,495 - SDL Info (0): Frame minTicks 10 maxTicks 22 count 60 slow20/25/30 = 10 / 0 / 0
Now onto my test session. To start with, here's the thread map of nvstreamer.exe
:
Looking at the networking events in WPA under System Activity -> Generic Events -> Microsoft-Windows-Networking-Correlation
events, seems like thread 21752 is the one sending out video data packets.
If I go to Computation -> CVPU Usage (Precise) -> Timeline by Process, Thread
, I can see the uneven gaps between thread 21752 time slices (longer ones being ~30ms):
This 30ms can be explained by the Waits (us) Max
column. If I look at the thread stack, it's in _Cnd_timedwait
. Seems like this thread is just blocked for 30ms waiting on a conditional variable (maybe for frame submission from another thread)?
I tried to see if I can identify the other thread that sets the cond var. From the timeline graph in the top half of the image, thread 5964 seems pretty closely correlated. Here's the stacks from that thread:
Note that for thread 5964, the wait samples that start in nvEncodeAPI64.dll
finish consistently ~10ms. However, the wait samples that start in StreamServerShared.dll
can block it for up to 20ms. These waits seem to correlate well to a lot of activity in thread 22712 (the selected row in the upper timeline graph in the image below):
This is where I'm currently at. Unfortunately there's no symbol info for code inside the NVIDIA driver. I tried setting it up to use debug symbols from https://developer.nvidia.com/nvidia-driver-symbol-server, which gave slightly better stack information, but still no symbol names (it says "Please note, we are only hosting the driver binaries, no PDBs will be distributed on this server." on the website, maybe that's why?)
Not sure how helpful this information is. But figured I'd post an update here, in case anyone has any further thoughts.
WPA trace: https://github.com/isundaylee/moonlight-qt/blob/my-custom-changes/troubleshooting/wpa_trace.zip
Ah, another piece of clue: tons and tons of SetPriority
calls from that thread 22712:
SetPriority
calls above might be a red-herring. I took another trace with more event sources enabled: during the time of a frame delay, there are a lot of DxgkEscape/DdiEscape
events:
Looks like a mechanism for DirectX kernel to request some info from user-mode driver? https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/d3dkmddi/nc-d3dkmddi-dxgkddi_escape
Edit: Read a bit more on this topic, seems like this is an iotcl
-like interface for implementing/requesting vendor-specific functionalities. Trying to see if I can somehow dump out the data here and see what commands are being sent. My current best guess is that these are actually "polling" commands waiting for the next frame, so is probably a symptom of the issue rather than the cause?
Another graph showing thread 15024 (another thread that looks relevant) as well:
Describe the bug I'm experiencing occasional microstutters when streaming game content from my PC desktop to my Mac client. The screen freezes very briefly causing a observable stutter before returning to normal. No frame drop is detected in the stats overlay. See below in the "Additional context" section for some troubleshooting I've tried so far.
Steps to reproduce This happens every once in a while (from couple seconds to couple minutes) when I stream game from my PC to my host.
Affected games
Moonlight settings
Client PC details
Server PC details
Moonlight Logs (please attach)
Attached. Log contains some additional debug information I added, including slow frame logs (https://github.com/isundaylee/moonlight-qt/tree/my-custom-changes).
Also attached two packet captures (on host and client respectively) that cover a subset of time period in the log. See "Additional context" below for more info.
moonlight.log pcaps.zip
Additional context
The game itself is FPS-locked to 60, with the MSI Afterburner perf overlay showing a steady 60FPS when this happens. GPU usage is between 40-50% and CPU usage is between 10-20%. Hardware-accelerated GPU scheduling is turned off (though this repros when it's turned on too).
I tried the following to debug this issue:
I added some debug logging in
FFmpegVideoDecoder::submitDecodeUnit
to detect/alert on frames that arrive later than the expected 16-17ms since the previous frame. The microstutter I observe corresponds pretty well to the slow frames detected/logged. e.g.:^ Note that
elapsedTicks
is 32 which is much higher than the expected 16-17 for 60fps.When this happens, Moonlight still receives a steady 60-61 fps. So this doesn't look like frame dropping - more like weird frame pacing.
To exclude networking as a potential issue, I took two packet captures simultaneously on my host (
pc.pcap
) and my client (mac.pcap
). I'm able to observe the same higher-than-normal delay between frames in both packet captures, which suggests that the frame delay already exists when the host sends out the data packets.e.g., this line in the log:
corresponds to the 0.031 seconds jump between the frame comprising of RTP seq nums 15879-15882 and the frame starting on RTP seq num 15883 (for port 47998). You can see it in
pc.pcap
by comparing the timestamps of packet 12317 and packet 12333, and inmac.pcap
by comparing the timestamps of packet 14072 and packet 14088. I also confirmed that these two frames have sequentialframeIndex
(as defined inNV_VIDEO_PACKET
), so no frame dropping.Note on the packet captures: There's some timestamp drift between the server and the client, but the timestamps in Moonlight log corresponds pretty well to the timestamps in
mac.pcap
.Any potential idea/suggestion is appreciated!