ishitatsuyuki / LatencyFleX

Vendor-agnostic latency reduction middleware. An alternative to NVIDIA Reflex.
Apache License 2.0

Less invasive option that works in all games? #4

[Open] Atemu opened this issue 2 years ago

Atemu commented 2 years ago

AFAICT, the methods LatencyFleX uses are invasive, sometimes require support from the game and need to be adapted for each game individually.

Would it be possible to add a universal, non-invasive method that works for basically all games but is possibly less effective and/or efficient?

I don't know if what I'm describing is feasible at all, but would it be possible to use libstrangle and the like to dynamically limit FPS based on some metric, so that the latency-inducing queues stay short?

The most basic form of this would be manually limiting FPS in a relatively static scene so that the GPU is no longer the bottleneck (e.g. targeting GPU usage below 95%, or a similar metric). If this happened automatically and adapted to changing workloads, it could offer significant latency savings without any special-sauce integration.
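The automatic version of this could, in principle, be a simple feedback loop. Below is a minimal sketch, assuming some external tool can read GPU utilization periodically and apply a new FPS cap; `next_fps_cap`, its step size, and its bounds are hypothetical and are not part of libstrangle or MangoHud.

```python
def next_fps_cap(current_cap: float, gpu_usage: float,
                 target: float = 0.95, step: float = 0.05,
                 min_cap: float = 30.0, max_cap: float = 240.0) -> float:
    """Return an adjusted FPS cap for one control-loop tick (hypothetical).

    gpu_usage is the measured GPU utilization as a fraction in [0, 1].
    At or above the target, the cap is lowered so the GPU stops being the
    bottleneck; below it, the cap is raised slowly to win back throughput.
    """
    if gpu_usage >= target:
        new_cap = current_cap * (1.0 - step)
    else:
        new_cap = current_cap * (1.0 + step / 2)
    # Clamp to a sane range so the controller cannot run away.
    return max(min_cap, min(max_cap, new_cap))


# Example: a GPU-bound scene (usage ~99%) followed by a lighter one.
cap = 144.0
for usage in (0.99, 0.99, 0.90, 0.85):
    cap = next_fps_cap(cap, usage)
    print(f"GPU {usage:.0%} -> cap {cap:.1f} FPS")
```

Dropping the cap faster than it recovers is one simple way to damp oscillation. Whether such a controller actually saves latency depends on where the resulting wait lands in the frame, which is exactly the point disputed in the replies.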

ishitatsuyuki commented 2 years ago

This is a good question. I've received similar questions from others before, so I'll write down some of the arguments I have made and some of the ideas others have suggested.

> AFAICT, the methods LatencyFleX uses are invasive, sometimes require support from the game and need to be adapted for each game individually.

I'm not sure if "invasive" is an accurate word for that (given that the integration step is literally a callback in the game loop), but you are right that LFX cannot be done solely within the driver.

Historically, driver-level limiters have tended to give poor results. Both AMD and NVIDIA have attempted their own solutions (Anti-Lag and Ultra Low Latency mode, respectively), but my impression is that they are typically useless precisely when you need that reduction:

[Image: results from Gamers Nexus]

A point that I need to note is that limiting FPS through any external tool is more or less ineffective in terms of latency. Basically, the frame limiter then just becomes the latency-inducing bottleneck. If you want to see some benchmarks, Battlenonsense has a video on this.
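To make the argument concrete, here is a toy model under deliberately simplified assumptions: a fixed CPU time per frame, a fixed cap, no GPU or display time, and an external limiter whose wait falls between the game's input sample and present. The helper is hypothetical; it only illustrates why the same amount of waiting costs latency in one position and not in the other.

```python
def input_to_present_ms(cpu_ms: float, cap_fps: float,
                        wait_before_input: bool) -> float:
    """Toy model: time from sampling input to handing the frame to the driver.

    cpu_ms: the game's own work per frame, from input sample to present.
    cap_fps: the frame-rate cap; each frame is padded to 1000 / cap_fps ms.
    wait_before_input: True if the limiter waits before input is sampled
    (in-game limiter style); False if the wait sits between the input
    sample and present (the external-limiter case assumed here).
    GPU time, queueing, and scanout are deliberately ignored.
    """
    frame_budget_ms = 1000.0 / cap_fps
    wait_ms = max(0.0, frame_budget_ms - cpu_ms)
    if wait_before_input:
        # The wait ends before the input sample, so it adds no latency.
        return cpu_ms
    # Input was already sampled when the wait started, so the wait adds up.
    return cpu_ms + wait_ms


for placement in (True, False):
    latency = input_to_present_ms(cpu_ms=5.0, cap_fps=60.0,
                                  wait_before_input=placement)
    print(f"wait before input={placement}: {latency:.1f} ms")
```

With 5 ms of work under a 60 FPS cap, the two placements give 5.0 ms and 16.7 ms respectively: the difference is the padding, which approaches a full frame when the game's own work is short.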


Let's move on to potential solutions. One way is to put the delay inside the first graphics API call of the frame: for Vulkan, either delay the next call made after vkQueuePresentKHR (a bit bogus), or delay vkAcquireNextImageKHR (more reliable, but not all engines acquire at the beginning of the frame). Either is more effective than delaying present itself, which in practice makes no difference to latency compared to simply being GPU-bound.
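A minimal sketch of the acquire-delay idea, written in Python for readability rather than as a real Vulkan layer. `FramePacer` and `wrap_first_call` are hypothetical names; the deadline comes from a fixed FPS target only to keep the example short, which is not how LatencyFleX decides how long to wait.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FramePacer:
    """Hypothetical pacer that holds a frame's first API call until a deadline.

    In a real Vulkan layer the wait would sit inside the intercepted
    vkAcquireNextImageKHR; here any Python callable stands in for it.
    A fixed FPS target sets the deadline purely for brevity.
    """

    def __init__(self, target_fps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / target_fps
        self.clock = clock
        self.sleep = sleep
        self.next_deadline: Optional[float] = None

    def wrap_first_call(self, fn: Callable[..., T]) -> Callable[..., T]:
        def wrapped(*args, **kwargs) -> T:
            now = self.clock()
            if self.next_deadline is None:
                self.next_deadline = now
            if now < self.next_deadline:
                # Delay the start of the frame; if the engine acquires before
                # polling input, the input sample is delayed along with it.
                self.sleep(self.next_deadline - now)
            # Schedule the next frame one interval later; if we were already
            # late, schedule from "now" instead of trying to catch up.
            self.next_deadline = max(self.next_deadline, now) + self.interval
            return fn(*args, **kwargs)
        return wrapped


# Usage with a stand-in for the real acquire call.
pacer = FramePacer(target_fps=120.0)
acquire = pacer.wrap_first_call(lambda: "image index 0")
for _ in range(3):
    acquire()  # calls made early wait until their deadline
```

Injecting the clock and sleep functions keeps the pacing logic testable without real timing.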

Another idea to try is to hook the input APIs directly. Remember, we need to put a delay before the input is polled; anything after that just becomes part of the input lag. So putting a sleep right before feeding input to the game basically achieves the same effect as hooking into the main loop. This is, however, highly dependent on how the engine polls input; in particular, there are many cases where input might be polled from places other than the beginning of the game loop. Overall, this is something I haven't investigated yet, and I look forward to digging into it in the future.

Atemu commented 2 years ago

> I'm not sure if "invasive" is an accurate word for that (given that the integration step is literally a callback in the game loop), but you are right that LFX cannot be done solely within the driver.

By "invasive" I mean anything an anti-cheat would deem suspicious. Injecting something into the gameplay loop would definitely fall into that category.

> A point that I need to note is that limiting FPS through any external tool is more or less ineffective in terms of latency. Basically, the frame limiter then just becomes the latency-inducing bottleneck. If you want to see some benchmarks, Battlenonsense has a video on this.

This is true on Windows, but is it on Linux too? Using the frame rate limiter configurable via GOverlay/MangoHud, I do not notice any additional latency.
Even if there were, it'd be a lot better than GPU-bottleneck-induced latency, so it'd still be a win.

There are also other potential ways of limiting FPS, such as restricting the CPU resources the process has access to, i.e. intentionally causing a CPU bottleneck in the game, which is the best way I know of to eliminate this latency.

> ...putting a sleep right before feeding input to the game basically achieves the same effect as hooking into the main loop. This is, however, highly dependent on how the engine polls input; in particular, there are many cases where input might be polled from places other than the beginning of the game loop. Overall, this is something I haven't investigated yet, and I look forward to digging into it in the future.

Given that the other methods would all have to be adjusted on a per-game basis, that doesn't sound too bad, though it's not really what I had in mind with this issue.

ishitatsuyuki commented 2 years ago

> This is true on Windows, but is it on Linux too? Using the frame rate limiter configurable via GOverlay/MangoHud, I do not notice any additional latency.

They work literally the same way. You still get at least one extra frame of delay compared to an in-game frame limiter.

> Even if there were, it'd be a lot better than GPU-bottleneck-induced latency, so it'd still be a win.

Not really. As I've already said, the limiter becomes a bottleneck very similar to a GPU bottleneck, because both of them sit at the end of the pipeline.

ishitatsuyuki commented 2 years ago

I was informed that Special K can inject Reflex support into any game.

There appear to be two modes:

If you have a particular game that needs latency reduction but doesn't support Reflex, I recommend giving it a try. The DXVK-NVAPI integration should be sufficient to support it, but let me know if there are any issues.

ishitatsuyuki commented 2 years ago

I gave Special K a try, and it looks like it has numerous compatibility issues with Wine (notably, NVAPI wouldn't activate unless you patch out certain checks).

I eventually got it to work, but the input hook doesn't seem to work very well, so if I were to implement this I would need to come up with my own heuristics. I suspect hooking PeekMessage would work well: a lot of game engines call it repeatedly until it returns no more messages, and they do so in and only in the main loop.
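The PeekMessage heuristic could be sketched roughly as follows. This is a minimal illustration in Python, not a Win32 hook and not Special K's or LatencyFleX's code: `FirstPollDelay` is a hypothetical wrapper around any poll function that returns `None` when the queue is empty, and how long to wait is left to a separate pacing policy.

```python
import time
from collections import deque
from typing import Callable, Optional


class FirstPollDelay:
    """Hypothetical input-hook heuristic.

    Assumes the engine drains its message queue once per frame from the main
    loop, calling poll() until it returns None (analogous to PeekMessage
    returning zero). The first poll after the queue was seen empty is treated
    as the start of a new frame, and the delay is applied there, before any
    input for that frame is read.
    """

    def __init__(self, poll: Callable[[], Optional[object]],
                 delay_s: Callable[[], float],
                 sleep: Callable[[float], None] = time.sleep):
        self._poll = poll
        self._delay_s = delay_s  # pacing policy: how long to wait this frame
        self._sleep = sleep
        self._at_frame_start = True

    def poll(self) -> Optional[object]:
        if self._at_frame_start:
            delay = self._delay_s()
            if delay > 0:
                self._sleep(delay)
            self._at_frame_start = False
        msg = self._poll()
        if msg is None:
            # Queue drained: assume the engine is done reading input this frame.
            self._at_frame_start = True
        return msg


# Usage with an in-memory queue standing in for the OS message queue.
queue = deque(["key down", "mouse move"])
hook = FirstPollDelay(lambda: queue.popleft() if queue else None,
                      delay_s=lambda: 0.002)
while (msg := hook.poll()) is not None:
    print("got", msg)
```

As noted earlier in the thread, this breaks down for engines that also poll input outside the start of the main loop, since every "drained, then polled again" transition would be treated as a new frame.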

Ph42oN commented 2 years ago

Have you heard of the dxvk.conf tweaks to reduce buffering? I run games with dxgi.maxFrameLatency=1 and dxgi.numBackBuffers=1, and with them it feels like I get lower input lag. This should work in every game that runs on DXVK (for D3D9 games, replace dxgi with d3d9).
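For reference, the same tweak as a dxvk.conf fragment. The option names and values are the ones quoted above; the file is typically placed next to the game's executable (or pointed to with the DXVK_CONFIG_FILE environment variable), and the supported range for each option is worth double-checking against the sample dxvk.conf of the DXVK version in use.

```ini
# dxvk.conf
# D3D10/D3D11 games (DXGI):
dxgi.maxFrameLatency = 1
dxgi.numBackBuffers = 1

# D3D9 games use the d3d9 prefix instead:
# d3d9.maxFrameLatency = 1
# d3d9.numBackBuffers = 1
```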

It would also be nice to know whether, and by how much, LatencyFleX improves on that. I tried using it in Quake Champions but could not feel any difference. The game does have Reflex support, but since it's not listed as supported, I'm not sure it was really working.

Edit: OK, it is working in Quake Champions; I checked the dxvk-nvapi log, though I'm not sure whether it worked during my earlier testing. I noticed some weirdness with the FPS while it's enabled: when the FPS drops, it may climb back up slowly, as if there were some smoothing. That makes it not very good to use in this game. With the settings I normally use I can't feel an improvement, but with heavier settings that make the game GPU-bottlenecked, I think there is a latency improvement.

ishitatsuyuki commented 2 years ago

The DXGI tweak works, but driver-level tweaks have more or less attempted the same thing, with very mixed results (see the GN benchmarks above). I don't really recommend relying on gut feeling to measure latency: it is prone to the placebo effect, and the feeling can easily be affected by your mood.

LatencyFleX typically improves on a driver-level limiter by about one frame's worth of latency. If you have set it up in a Reflex-capable game but didn't get it to work, there could be two reasons:

  1. The main thread's CPU processing is the bottleneck. (In this case there is no delay for LFX to add.)
  2. LFX is not correctly set up. Try enabling DXVK-NVAPI logging (https://github.com/jp7677/dxvk-nvapi#tweaks-debugging-and-troubleshooting), and check if calls to NvAPI_D3D_Sleep and NvAPI_D3D_SetSleepMode are succeeding.
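For the second check, a small helper can pull the relevant lines out of a dxvk-nvapi log captured as described in the README linked above. This is a hypothetical sketch: it assumes only that each function name appears verbatim on the lines that mention it, and makes no assumption about the log's exact format, so the matching lines still need to be read to see whether each call succeeded.

```python
import re
from collections import Counter

# Entry points named in the troubleshooting step above.
SLEEP_CALLS = ("NvAPI_D3D_SetSleepMode", "NvAPI_D3D_Sleep")


def count_sleep_calls(log_text: str) -> Counter:
    """Count log lines that mention each Reflex sleep entry point."""
    counts = Counter({name: 0 for name in SLEEP_CALLS})
    for line in log_text.splitlines():
        for name in SLEEP_CALLS:
            # \b keeps NvAPI_D3D_Sleep from matching longer identifiers.
            if re.search(rf"\b{re.escape(name)}\b", line):
                counts[name] += 1
    return counts
```

Usage would be along the lines of `count_sleep_calls(open(log_path).read())`. A count of zero for either name is a hint to look more closely at the setup, or at the log level used.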