ishitatsuyuki / LatencyFleX

Vendor agnostic latency reduction middleware. An alternative to NVIDIA Reflex.
Apache License 2.0

Microstutter fix suggestion: wake early and spin. #15

Closed YellowOnion closed 2 years ago

YellowOnion commented 2 years ago

Looking through the code I notice you wait for the OS scheduler to wake on time and then begin the frame.

The problem with this model is the accuracy of the OS thread scheduler.

The documentation for std::this_thread::sleep_for says: "Blocks the execution of the current thread for at least the specified sleep_duration. This function may block for longer than sleep_duration due to scheduling or resource contention delays. The standard recommends that a steady clock is used to measure the duration. If an implementation uses a system clock instead, the wait time may also be sensitive to clock adjustments."

I would suggest we wake early using some sort of bias and then spin checking the clock until we meet the deadline.

I'm just setting up a dev environment now. :)
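
The wake-early-and-spin idea suggested above can be sketched as follows. This is a minimal illustration, not LatencyFleX code: the helper name `sleep_then_spin` and the `spin_margin` parameter (the "bias") are hypothetical, and the right margin would have to be found by measurement.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of LatencyFleX).
// Sleeps until `spin_margin` before `target`, then busy-waits on the clock
// for the remainder. Returns how far past `target` we actually resumed.
inline std::chrono::nanoseconds sleep_then_spin(
    Clock::time_point target, std::chrono::nanoseconds spin_margin) {
  const auto early = target - spin_margin;

  // Coarse wait: let the OS scheduler cover most of the interval.
  if (Clock::now() < early) {
    std::this_thread::sleep_until(early);
  }

  // Fine wait: poll the steady clock until the deadline is reached.
  // This burns CPU for up to `spin_margin` on every call.
  while (Clock::now() < target) {
  }

  return std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() -
                                                              target);
}
```

A caller would pass the frame start deadline as `target` and a small margin such as one millisecond. The trade-off is that the spinning thread occupies a core for the whole margin.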

ishitatsuyuki commented 2 years ago

No, this does not matter on Linux. A typical hrtimer-enabled system has a timer accuracy of 50us (bounded by the default timer slack) which is more than sufficient.

The microstutter comes from the nature of the algorithm itself, not from the implementation.
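
For readers who want to check the timer slack mentioned above on their own system, here is a small Linux-only sketch. It assumes the `prctl` option `PR_GET_TIMERSLACK`, which reports the calling thread's current slack in nanoseconds; the helper name is hypothetical and not part of LatencyFleX.

```cpp
#include <sys/prctl.h>

// Hypothetical helper (not part of LatencyFleX). Linux-only.
// Returns the calling thread's current timer slack in nanoseconds,
// or -1 if the prctl call fails.
inline long current_timer_slack_ns() {
  return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}
```

On a system with the default setting this is expected to report 50000 (50us), but the value is inherited from the parent process, so it may differ.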

YellowOnion commented 2 years ago

You're forgetting that the OS scheduler is under no obligation to schedule the thread. You're conflating a timer with a preemptive multitasking OS on a CPU.

The kernel tunables sched_latency_ns and sched_min_granularity_ns are more applicable here.

ishitatsuyuki commented 2 years ago

Sure, if you want to talk about hypothetical things, the OS scheduler can delay waking up a thread for any reason.

But we're talking about cases where CPU headroom almost always exists. If you're completely CPU bound, LFX will likely provide no benefit. When another CPU is idle, CFS will happily migrate a thread rather than delay its wakeup; sched_latency_ns is by no means relevant, since it only applies when another thread is trying to preempt the current thread.

And I have empirically confirmed this through measurements. I did tons of measurements during the development of LFX to find the algorithm's shortcomings and bottlenecks, and wakeup latency was never large enough to be relevant.

YellowOnion commented 2 years ago

Did you try benchmarking with, say, OBS running in the background using software encoding?

This definitely ruins my frame times.

ishitatsuyuki commented 2 years ago

OBS can ruin frame times for games in general, not only when using LFX. Does the behavior improve if you disable LFX?

ishitatsuyuki commented 2 years ago

Another suggestion would be running OBS at a lower priority, allowing the game's processing (latency sensitive) to preempt encoding (less latency sensitive).

Sleeping and then spinning is not an entirely unreasonable thing to do, but it's a waste of CPU for those who don't need it, so it should probably be opt-in. It also only helps with scheduling of the main thread; worker threads also sleep (through condition variables), and since you can't make them spin, allowing preemption likely does a better job here.

The amount of wakeup latency can be obtained from the difference between wakeup and target here. If you're seeing values significantly larger than 50us, then maybe we can consider adding a spinning-based option.
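
The measurement described above (actual wakeup time minus target time) can be reproduced outside LFX with a sketch like the following. The helper name is hypothetical and the code is not taken from LatencyFleX; it only shows the same subtraction on a plain `sleep_until`.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of LatencyFleX).
// Sleeps until a target `sleep_length` in the future and returns the
// difference between the actual wakeup time and that target.
inline std::chrono::nanoseconds measure_wakeup_latency(
    std::chrono::nanoseconds sleep_length) {
  const auto target = Clock::now() + sleep_length;
  std::this_thread::sleep_until(target);
  const auto wakeup = Clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(wakeup - target);
}
```

Calling this repeatedly under the workload of interest (for example, with an encoder running in the background) and comparing the results against the 50us figure would show whether wakeup latency is a real factor on a given machine.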

YellowOnion commented 2 years ago

Oh, I'm looking at solutions that improve micro-stutter and latency in situations like recording. I'm already running vkcapture and it's helped a lot, but a solution like RTSS scanline sync for my shitty 60 Hz monitor would be great.

I'm still trying to figure out how to compile and use the Vulkan overlay for NixOS.

[17/19] Compiling C++ object liblatencyflex_layer.so.p/latencyflex_layer.cpp.o
FAILED: liblatencyflex_layer.so.p/latencyflex_layer.cpp.o
g++ -Iliblatencyflex_layer.so.p -I. -I.. -I../.. -I../subprojects/funchook/include -Isubprojects/funchook/__CMake_build -I../subprojects/funchook/__CMake_build -I../subprojects/funchook/distorm/include -Isubprojects/funchook -I../subprojects/funchook -I/nix/store/0f1xnp9rph4m88bqr10ag6qhavz2wyhv-vulkan-headers-1.3.211.0/include -fvisibility=hidden -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -std=c++17 -g -fPIC -pthread -MD -MQ liblatencyflex_layer.so.p/latencyflex_layer.cpp.o -MF liblatencyflex_layer.so.p/latencyflex_layer.cpp.o.d -o liblatencyflex_layer.so.p/latencyflex_layer.cpp.o -c ../latencyflex_layer.cpp
../latencyflex_layer.cpp:28:10: fatal error: vulkan/vk_layer_dispatch_table.h: No such file or directory
   28 | #include <vulkan/vk_layer_dispatch_table.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

ishitatsuyuki commented 2 years ago

The file is from the validation layers actually, so see if you can grab the includes from that.

YellowOnion commented 2 years ago

@ishitatsuyuki chur. After a bit of searching to figure out how to get the layer to load, it seems to "work" (I get log messages). Do I need the Wine layer if I'm using it with a non-Reflex-enabled game?

ishitatsuyuki commented 2 years ago

Which game? LFX does not work at all, not even as a frame limiter, if your game isn't supported by one of the hooking methods (Reflex or mods).

YellowOnion commented 2 years ago

Deep Rock Galactic.

proton log shows:

LatencyFleX: module loaded
LatencyFleX: Version v0.1.0-2-g589afdb+
LatencyFleX: setting target frame time to 16666666

But I'm still getting 250fps.

Oh, I guess I need to patch the game? Is there any way to do that with a Windows-only game? The game has official modding support, so I wonder if we can ship a mod for it.

ishitatsuyuki commented 2 years ago

The Wine extension forwards calls to LFX APIs to the Unix side. See the Unity mod for an idea of what you should be hooking in the game.

https://github.com/ishitatsuyuki/LatencyFleX/blob/master/layer/unity/Plugin.cs

YellowOnion commented 2 years ago

Arg, looks like I need to download the UE4 engine to make a mod for the game.

Oh well thanks for the help!