Wishlist: Frame Pacing Subsystem

TylerGlaiel commented 4 months ago

So this is something I've personally spent a lot of time dealing with in my engine that I feel would greatly benefit from having a simple standardized solution baked into SDL just due to the sheer amount of non-obvious edge cases and detail involved, and how this seems to be a problem that every game has to solve on its own despite rarely needing any game specific behavior.

By frame pacing I mean measuring time between frames, keeping "fixed updates" coming at a specified update rate (ex, 60hz), handling anomalies in frame timing (rounding errors / edge cases / accuracy errors when vsync is involved / timing hiccups), calculating interpolation values for rendering (ex if the fixed update rate is 10ms per frame, and a render/vsync occurs at 12.5ms, then the interpolation value here would be 0.25 so you can interpolate or extrapolate your game state by 25% to keep a smooth framerate)
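
To make the interpolation arithmetic concrete, here is a minimal sketch of a fixed-step accumulator, assuming times are tracked as integer nanoseconds. `FixedStepper` and `fixed_stepper_advance` are hypothetical names for illustration and are not part of SDL.

```c
#include <stdint.h>

/* Hypothetical helper, not an SDL type. All times are in nanoseconds. */
typedef struct {
    int64_t step_ns;        /* fixed update period, e.g. 10 ms */
    int64_t accumulator_ns; /* measured time not yet consumed by fixed updates */
} FixedStepper;

/* Adds one measured frame time, returns how many fixed updates are due,
   and writes the interpolation value (0 <= alpha < 1) to use for rendering. */
int fixed_stepper_advance(FixedStepper *fs, int64_t frame_ns, double *alpha)
{
    int updates = 0;
    if (fs->step_ns <= 0) {   /* guard against a misconfigured step */
        *alpha = 0.0;
        return 0;
    }
    fs->accumulator_ns += frame_ns;
    while (fs->accumulator_ns >= fs->step_ns) {
        fs->accumulator_ns -= fs->step_ns;
        updates++;
    }
    *alpha = (double)fs->accumulator_ns / (double)fs->step_ns;
    return updates;
}
```

With a 10ms step and a frame arriving at 12.5ms, this yields one update and an interpolation value of 0.25, matching the example above.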

I am aware of #8247 but as far as I can tell that is just an alternate way to receive SDL events and doesn't actually handle the timing stuff I'm talking about here

I'm not sure exactly what the API for this should look like but what I would want is

some way to specify a "target fixed update rate" (ex 60hz) (possibly more than one?)
some function that you can call to have SDL accumulate time, or bake this into calls to PumpEvents
some way for SDL to report back that the application should do a fixed-rate update ("tick the game state"), render a frame, or present a frame, with delta_times and "frame interpolation values" reported for each. (could do it with SDL events, or by registering callbacks)

It's significantly more complicated than simply measuring the time between frames and reporting a deltaTime, at least if you want a production quality solution here. Handling cases like, ex a monitor being 59.94hz instead of 60hz resulting in timing drift that once in a while causes a double-update if the game expects a 60hz framerate, measuring times on vsynced monitors having +/- a small amount of error that can result in a choppy experience when it randomly decides to double update and then skip an update, and a lot more. I have a lot of notes on this and the ways I've fixed this in my engine, so if this is a feature you're willing to entertain for SDL I'd be happy to share most of that

slouken commented 4 months ago

Sure, why don’t you share it here and at the very least people who are looking for a solution will be able to see how you solved it.

TylerGlaiel commented 4 months ago

Sure, I have this sample code public, though it's a bit outdated (ex I haven't updated it since SDL3 now reports display mode refresh rates as floats instead of ints) https://github.com/TylerGlaiel/FrameTimingControl/blob/master/frame_timer.cpp

the sort of main insight here is that when measuring time between frames, if the time measured is "about the same as the monitor refresh rate, or a multiple of it", to snap the measured time to exactly that amount (ex assume the timing was governed by vsync and so trust that vs SDL_GetPerformanceCounter which can have a bit of error / variance)
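
A minimal sketch of that snapping idea, assuming integer nanoseconds and a caller-chosen tolerance. The function name, the tolerance and the cap on multiples are all assumptions for illustration, not values taken from the linked code.

```c
#include <stdint.h>

/* Snap a measured frame time to the nearest whole multiple of the refresh
   period when it lies within tolerance_ns of that multiple; otherwise
   return the measurement unchanged. */
int64_t snap_to_vsync(int64_t measured_ns, int64_t refresh_period_ns,
                      int64_t tolerance_ns, int max_multiple)
{
    if (refresh_period_ns <= 0) {
        return measured_ns;   /* refresh rate unknown: nothing to snap to */
    }
    for (int m = 1; m <= max_multiple; m++) {
        int64_t target = refresh_period_ns * m;
        int64_t diff = measured_ns - target;
        if (diff < 0) {
            diff = -diff;
        }
        if (diff <= tolerance_ns) {
            return target;    /* assume vsync governed this frame */
        }
    }
    return measured_ns;
}
```

On a 60hz display (period of about 16,666,667 ns) with a 0.2ms tolerance, a measurement of 16.7ms would snap to one refresh period, while 20ms would be passed through as-is.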

other aspects here are somewhat standard, clamp the measured time so it never goes above a certain "minimum framerate" (if the game freezes for 1 second we do not want to do 60 updates at once to catch up), averaging out timing spikes across a few frames, and having the ability to manually "resync" the accumulator after an expected hitch (like a loading screen)
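
Those three pieces (clamp, spike averaging, resync) could look roughly like the sketch below. The history length of 4 and all names are assumptions chosen for illustration; a caller doing a resync would also reset its own update accumulator to 0.

```c
#include <stdint.h>

#define SMOOTH_HISTORY 4  /* assumption: number of frames to average over */

typedef struct {
    int64_t history_ns[SMOOTH_HISTORY];
    int     next;          /* slot that the next sample overwrites */
    int64_t target_ns;     /* desired frame time, used to refill on resync */
    int64_t max_frame_ns;  /* 1 / minimum framerate */
} FrameSmoother;

/* Forget past measurements, e.g. after a loading screen. */
void smoother_resync(FrameSmoother *s)
{
    for (int i = 0; i < SMOOTH_HISTORY; i++) {
        s->history_ns[i] = s->target_ns;
    }
    s->next = 0;
}

/* Clamp one measurement, record it, and return the averaged frame time.
   Integer division drops a few nanoseconds of remainder per frame. */
int64_t smoother_push(FrameSmoother *s, int64_t measured_ns)
{
    if (measured_ns < 0) {
        measured_ns = 0;
    }
    if (measured_ns > s->max_frame_ns) {
        measured_ns = s->max_frame_ns;  /* slow the game down instead of catching up */
    }
    s->history_ns[s->next] = measured_ns;
    s->next = (s->next + 1) % SMOOTH_HISTORY;

    int64_t sum = 0;
    for (int i = 0; i < SMOOTH_HISTORY; i++) {
        sum += s->history_ns[i];
    }
    return sum / SMOOTH_HISTORY;
}
```

With a 10 fps minimum, a 1 second freeze is clamped to 100ms before it enters the average, so the game never tries to run 60 updates at once.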

Oh also the option to specify "update multiplicity" which basically forces updates to come in multiples so you can lock the framerate to a "steady 30" instead of having it be choppy, when vsync is disabled
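
One possible reading of update multiplicity, sketched with hypothetical names: fixed updates are only released once a whole group of them is due.

```c
#include <stdint.h>

/* Returns the number of fixed updates to run now. Updates are released only
   in whole groups of `multiplicity`, so time that does not yet fill a group
   stays in the accumulator for a later frame. */
int updates_due(int64_t *accumulator_ns, int64_t step_ns, int multiplicity)
{
    int updates = 0;
    if (multiplicity < 1) {
        multiplicity = 1;
    }
    if (step_ns <= 0) {
        return 0;
    }
    int64_t group_ns = step_ns * multiplicity;
    while (*accumulator_ns >= group_ns) {
        *accumulator_ns -= group_ns;
        updates += multiplicity;
    }
    return updates;
}
```

With a 10ms step and a multiplicity of 2, 15ms of accumulated time produces no updates yet, and 25ms produces two at once.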

A decent amount of this could be a lot simpler if you could detect whether the game is actually vsynced, unfortunately it seems like graphics driver settings can override this and there doesn't seem to be an easy way to tell if it's doing that or not, aside from measuring times

TylerGlaiel commented 4 months ago

also with SDL3 reporting monitor refresh rates as floats instead of ints now, it might be the case that you would want to pretend a 59.94hz monitor is actually 60hz so you can get a 1 update-per-vsync without any hitches or drifting, at the expense of the game running 0.1% slower. This would be desirable if your game is not multiplayer and if your target update rate matches the monitor refresh rate
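
As a quick check of the size of that trade-off, the sketch below computes the speed change when a game that steps once per vsync is tuned for one rate but displayed at another. The function names are made up for illustration.

```c
/* Fractional speed change when a game that runs one update per vsync,
   tuned for target_hz, is shown on a display refreshing at actual_hz.
   Negative means the game runs slower than real time. */
double speed_error(double actual_hz, double target_hz)
{
    return actual_hz / target_hz - 1.0;
}

/* Game-time drift in seconds accumulated over real_seconds of wall clock. */
double drift_seconds(double error, double real_seconds)
{
    return error * real_seconds;
}
```

For 59.94hz against a 60hz target this comes out to about -0.001, i.e. roughly 0.1% slower, or about 3.6 seconds of drift per hour of play.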

Rendering interpolated game states also results in the rendered state having up to 1 frame of latency from the current game state. This is only necessary if you have a mismatch between your update rate and monitor refresh rate (or if vsync is off, or if gsync/freesync is on). In the case that your monitor matches your game's update rate, you can do a simple update/render/display loop without interpolation (and reclaim that small amount of latency).

It's a lot of pedantic detail that every game has to deal with at some point, hence why it would be nice to have some actually standardized solution here

thatcosmonaut commented 4 months ago

The refresh rate stuff in particular keeps me up at night and I would love to have a standardized sane implementation of frame pacing. As mentioned, literally every game needs this and it's very easy to get wrong.

kg commented 4 months ago

the sort of main insight here is that when measuring time between frames, if the time measured is "about the same as the monitor refresh rate, or a multiple of it", to snap the measured time to exactly that amount (ex assume the timing was governed by vsync and so trust that vs SDL_GetPerformanceCounter which can have a bit of error / variance)

This "snap to vsync if possible" behavior is what we shipped in Escape Goat 2, with some guardrails (if the actual measured framerate goes too high or too low, indicating that for some reason either vsync is broken or we're lagging) to turn snapping off, and we never got any complaints about it (except from speedrunners who noticed that IGT and wall clock time would vary slightly depending on their hardware - we made it optional). It was complex enough to implement that it is definitely something that is best done at the SDL level where you already know what display the window/swapchain are presenting to, what its refresh rate is, etc.

It may also be worth thinking about how this would eventually impact the emscripten port of SDL though, since the browser exposes way less information and control over things like frame pacing. The only primitive you really have is 'request animation frame', which will give you a callback Eventually, and if you call it repeatedly you will ideally get something close to 1 callback per vsync. But you don't have any guarantee that it won't skip frames, and you can't query what the actual display refresh is, and the tab containing your game might get dragged from 120hz-monitor-A to 60hz-monitor-B while running without you finding out about it.

slouken commented 4 months ago

Makes sense. So what would you think a good API would look like here?

TylerGlaiel commented 4 months ago

I was just getting ready to post this as a desired API, heh. rough draft. I am unsure about how this interacts with the SDL event system (if you do game updates or renders from within the event loop, does that potentially conflict with anything?), but this would make the most sense to me, if we wanted it to fit in with existing SDL features instead of requiring a second event-loop style thing for just frame pacing. I put all the flags and features I currently use, there might be more configurable values that people would want, and a number of these could be made more specific with hints

//Functions:
SDL_Init(SDL_INIT_FRAMEPACING);
SDL_FramePacing_SetUpdateRate(float update_rate, SDL_FramePacingFlags flags);
SDL_FramePacing_SetTimescale(float timescale); //measured time each frame gets multiplied by timescale, after snapping/smoothing adjustments (this is useful for debug purposes). If timescale is set to 0, you can basically enable "frame stepping" where each call to SDL_FramePacing_Sync() will advance the state by one update. 
SDL_FramePacing_Sync(); //the next frame should be updated/rendered with the target delta time, regardless of whatever time was measured. reset accumulator to 0. call this after a load that might hitch the game, to prevent the frame pacing subsystem from trying to "catch up"
SDL_FramePacing_SetMinimumFramerate(float min_framerate); //default = 10? if the measured framerate drops below the minimum framerate, clamp the measured frametime to that minimum. Basically if the game is rendering slower than this, then actually slow down the game instead of trying to catch up.
SDL_FramePacing_SetMaximumFramerate(float max_framerate); //default = 0. if non-zero, automatically Sleep during calls to SDL_GL_SwapWindow/SDL_RenderPresent to prevent the game from rendering faster than the framerate cap. this is unrelated to "MinimumFramerate"
SDL_FramePacing_SetUpdateMultiplicity(int update_multiplicity); //default = 1. fixed updates should be issued in groups of update_multiplicity, ex if UpdateRate is set to 60, and multiplicity is set to 2, then the game should issue 2 updates x 30 times per second, instead of 1x60. this basically lets a user configure "give me a smooth 30 instead of a choppy 50"

//if the frame pacing subsystem is initialized then time should be accumulated in the call to SDL_GL_SwapWindow/SDL_RenderPresent, immediately after the vsync. Events should be issued at the end of the normal event queue, however (all keyboard / gamepad / window events should happen before updates / renders are initiated)

//Events
SDL_FramePacingEvent {
    //common stuff

    //SDL_EVENT_FRAMEPACING_FIXED_UPDATE      //delta_time should be the same every time this one is issued, exactly what is specified by SetUpdateRate. frame_percent is 0/unused
    //SDL_EVENT_FRAMEPACING_VARIABLE_UPDATE   //issued once per cycle (unless interlacing is enabled). delta_time is the adjusted measured time between frames. frame_percent is same as render
    //SDL_EVENT_FRAMEPACING_RENDER            //issued once per cycle (unless interlacing is enabled). delta_time is the adjusted measured time between frames. frame_percent is the t between the previous update and the next update, to be used as a value to render interpolated/extrapolated frame states

    double delta_time; //these should be tracked as int64 internally, so maybe they should be reported as int64 here as well (possibly to keep it consistent with timestamp). double seems like the most common case tho
    double frame_percent;
};

//SDL_FramePacingFlags:
SDL_FRAMEPACING_DEFAULT //a set of reasonable defaults appropriate for most games
SDL_FRAMEPACING_DRIFT_ALLOWANCE_LOOSE //Default, enables vsync snapping within a threshold, at the expense of the game timer being allowed to drift from real time a very small amount
SDL_FRAMEPACING_DRIFT_ALLOWANCE_STRICT //disables vsync snapping, in game time should exactly match real time. *some* care on the implementation side still needs to be taken to avoid measurement errors causing stuttering
SDL_FRAMEPACING_UNLOCKED_FRAMERATE //Default, decouples render/variable update from update, Render is issued with a "frame_percent" between 0 and 1 to allow for interpolation / extrapolation of game states. If vsynced at the target UpdateRate, this can behave like SDL_FRAMEPACING_LOCKED_FRAMERATE instead
SDL_FRAMEPACING_LOCKED_FRAMERATE //fixed and variable update are coupled together, Render is issued with a frame_percent of 1 to indicate display the most recently rendered frame. possible to avoid rendering at all in this case, if you can "repeat frame" in SDL_GL_SwapWindow/SDL_RenderPresent
SDL_FRAMEPACING_INTERLACE_VARIABLE_UPDATES //If enabled, each Fixed Update will be followed by a Variable Update of the same delta_time. Before render, one additional Variable Update is issued with the remaining delta_time and frame percent reported. If disabled, one variable update per frame will be issued, before Render, with the total delta time. This flag might not be necessary, as it's not that hard to implement this in user space if wanted, though it's a nice convenience to have taken care of by the frame pacing subsystem instead

there's a lot of neat stuff you could do with this system in place, ex you could launch an SDL game in a "headless mode" by not issuing rendering events, and you should actually be able to do "replays" in a much easier / trivial way by saving every event and just re-issuing them in the same order to play a replay.

slouken commented 4 months ago

Could you put together a little test case to demonstrate how this would be used?

kg commented 4 months ago

I think it's worth specifically calling out that the event ordering is important for a system like this to work well - you want to dispatch all input related events before dispatching any update events, and you want to dispatch any update events before render events.

Timing measurement is also nuanced - you want to correctly handle the following scenarios:

Updates are too slow to hit consistent 60, but rendering is fast. In this case trying to "catch up" by updating multiple times will just put you in a hole
Updates are very fast, but rendering can't hit consistent 60. In this case you want to "catch up" with multiple updates.
Responses to some other event are very slow, but updates and rendering are fast. In this case you might want to coalesce input events like mousemoves to reduce the amount of event processing overhead, but critically you wouldn't want to treat this as "rendering is too slow" OR "updates are too slow", since throttling rendering or updates won't fix this scenario. (I don't think SDL should fix this)
Everything is happening too slowly, because the system is under load. In this case you'd probably want to behave as if rendering is too slow, and try to at least keep updating at the target rate. But you may fall behind, in which case you have to act like in the first scenario - catching up with multiple updates will put you in a hole here too.

slouken commented 4 months ago

I don't know if using events is the right model here, especially since you might get input events interleaved with the update and render events, and the update and render events could sit in the queue unprocessed, which would throw off all the timing you're trying to do.

slouken commented 4 months ago

Maybe this makes sense as part of the new main callback model in SDL3?

TylerGlaiel commented 4 months ago

maybe, or maybe instead of events you just do

SDL_FramePacing_DoFrame (fixed_update_callback, variable_update_callback, render_callback);

and call that with some appropriate function pointers after you process events

I'm not sure I would want it to be exclusively to the main callback model since the docs say that should be optional, but it might be appropriate to have a way to do this from within that system too

kg commented 4 months ago

maybe, or maybe instead of events you just do

SDL_FramePacing_DoFrame (fixed_update_callback, variable_update_callback, render_callback);

and call that with some appropriate function pointers after you process events

This would probably be more compatible with the browser model, where you ask to render and get a render callback At Some Point, though it poses some safety issues since the developer now has to be able to handle the render callback getting fired at any point in the future. SDL would need to behave consistently (probably assert or ignore the call) in the scenario where DoFrame is called re-entrantly, etc.

An ideal frame pacing loop will sleep when there's a long time before the next update/render operation, and then wake for either the next update or the next waiting input event. In FNA we have a carefully constructed main loop that looks at how long we have left and does a syscall sleep calibrated to never sleep Too Long - if memory serves we go 'okay, observed sleep precision is 3ms, so perform an alertable sleep for timeleft-3 ms, to ensure we don't wake up too late, then spin'. This is something SDL might not be able to provide for the user but it would be cool if SDL could somehow provide a primitive for this kind of 'smart sleep' as well.
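
The 'sleep for timeleft minus precision, then spin' split can be sketched as a small pure function. This is only an illustration of the arithmetic described above, not FNA's actual loop; `WaitPlan` and `plan_wait` are hypothetical names, and the real sleep and clock calls are left to the caller.

```c
#include <stdint.h>

/* How to spend the time remaining before the next update/render. */
typedef struct {
    int64_t sleep_ns; /* portion to hand to an OS sleep call */
    int64_t spin_ns;  /* portion to busy-wait so we never wake up late */
} WaitPlan;

/* sleep_precision_ns is the observed worst-case oversleep (e.g. 3 ms). */
WaitPlan plan_wait(int64_t time_left_ns, int64_t sleep_precision_ns)
{
    WaitPlan plan = { 0, 0 };
    if (time_left_ns <= 0) {
        return plan;                       /* already at or past the deadline */
    }
    if (sleep_precision_ns < 0) {
        sleep_precision_ns = 0;
    }
    if (time_left_ns > sleep_precision_ns) {
        plan.sleep_ns = time_left_ns - sleep_precision_ns;
    }
    plan.spin_ns = time_left_ns - plan.sleep_ns;
    return plan;
}
```

With 10ms left and a 3ms observed precision this gives a 7ms sleep followed by a 3ms spin. If only 2ms remain, the whole wait is spun.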

TylerGlaiel commented 4 months ago

An ideal frame pacing loop will sleep when there's a long time before the next update/render operation, and then wake for either the next update or the next waiting input event.

are you doing this after render / before present or are you measuring how long update/render takes and sleeping before update based on how long it "usually takes"?

kg commented 4 months ago

An ideal frame pacing loop will sleep when there's a long time before the next update/render operation, and then wake for either the next update or the next waiting input event.

are you doing this after render / before present or are you measuring how long update/render takes and sleeping before update based on how long it "usually takes"?

The correct time is usually after present, if it finished too early. (If vsync is on, it won't finish "too early"). The NVIDIA reflex "low latency" model is instead to sleep longer based on how long the last update+render pair took in order to reduce latency, but that feels way out of scope for SDL in all possible worlds.

Personally, my game starts its next update while rendering of the previous frame is in process on a worker thread, for higher throughput (I don't care about the extra ~16ms of input latency from this). So SDL doing this smart sleep wouldn't do a ton for me personally, but the 'spin after present until it's time for the next frame' model is extremely common in games, so SDL doing it properly with a sleep syscall would reduce power usage for people on laptops and steam decks.

TylerGlaiel commented 4 months ago

ok so in my wishlist API that would be how SDL_FramePacing_SetMaximumFramerate is implemented then, if that behavior is desired set the max framerate to the target framerate

TylerGlaiel commented 4 months ago

Could you put together a little test case to demonstrate how this would be used?

I can probably put together a sample app this weekend, or earlier if I get done with workstuff early

TylerGlaiel commented 4 months ago

I got done with workstuff early, here's a sample app demonstrating frame pacing

most frame pacing code here was copy pasted from my game engine and just edited slightly to fit the proposed API, there would be a bunch more needed internally here than I have

in this app click to toggle vsync to see how it looks the same regardless of vsync on or off. I set update rate to 17 here to show it should look smooth regardless of update rate, when done this way

https://gist.github.com/TylerGlaiel/b1d424b0ad90fd374a3402b2873983da

TylerGlaiel commented 4 months ago

even testing this more thoroughly it seems like calls to SDL_GetPerformanceCounter immediately after a vsynced SDL_GL_SwapWindow can be off by up to ~1ms on my machine, presumably because of OS scheduling stuff, which is a bit beyond the threshold I was using for vsync snapping so the time sometimes drifts. it's a continuously difficult problem. I think I need to average out delta times before doing vsync snapping just to smooth out that measurement error. But also if there's an anomalous frame it might not be desirable to count that in the steady-state average. I probably need 2 separate averages, one to smooth out measurement/scheduling error (before vsync snapping) and one to smooth out spikes (after vsync snapping) ... which makes me even more want this to be done at the SDL level considering how every single time I look at this problem there's more to it...

There's a secondary issue here where my monitor is 143.963hz according to windows, but SDL3 is still reporting that as 144hz in its display info (even though SDL3 reports this as a float now)

Lzard commented 4 months ago

I've experimented with frame pacing in several contexts (SDL, Godot, web, Löve2D), and I've found it useful to make a clear distinction between:

the frame rate and frame period, measured by PerformanceCounter,
the refresh rate and refresh period, given by the monitor's info,
and the game update rate and game update period, defined by the user.

All three of them give a different view of the elapsed time:

The frame rate gives the real, computed time, however not strictly in sync with the display time.
The refresh rate gives the time as it is displayed to and perceived by the user's eyes.
The game rate gives the time as the game would like it to be.

This is what I end up doing most of the time:

First of all, get the average frame period over the last X frames:

Measure the time elapsed between the end of two RenderPresent calls.
Store those measures in a ring buffer.
Once the ring buffer is filled, get the average of all the stored times.

Once the buffer is full, the logic to add new values changes a little:

When the new time is significantly longer than the current average period, it might be that one or more frames were missed; this can be verified by dividing that time until it's either approximately equal to the average, or until it's significantly shorter than the current average period, and store that time as many times as it's been divided (e.g. if the measured time is 50 ms, it would result in three ~16.67 ms values stored).
When the new time is significantly shorter than the current average, or when it is significantly longer and dividing it did not result in a time approximately equal, there are three possibilities:
- The window is not v-synced anyway.
- The window is v-synced and late frame swap is enabled; this time will hopefully be lost among the others and have insignificant impact on the end result.
- There is a random latency spike, which may or may not be compensated in the following frames, and hopefully lost with insignificant impact on the end result.
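
The ring buffer steps above can be sketched as follows. The ring size and the 10% threshold for "approximately equal" are assumptions, since the post leaves both open, and all names are hypothetical.

```c
#include <stdint.h>

#define RING_SIZE 8           /* assumption: "X frames" is left open above */
#define TOLERANCE_PERCENT 10  /* assumption: what counts as "approximately equal" */

typedef struct {
    int64_t samples_ns[RING_SIZE];
    int     count; /* number of valid samples, up to RING_SIZE */
    int     next;  /* slot that the next sample overwrites */
} FrameRing;

static int approx_equal(int64_t value, int64_t reference)
{
    int64_t diff = value - reference;
    if (diff < 0) {
        diff = -diff;
    }
    return diff * 100 <= reference * TOLERANCE_PERCENT;
}

void ring_store(FrameRing *r, int64_t ns)
{
    r->samples_ns[r->next] = ns;
    r->next = (r->next + 1) % RING_SIZE;
    if (r->count < RING_SIZE) {
        r->count++;
    }
}

int64_t ring_average(const FrameRing *r)
{
    int64_t sum = 0;
    if (r->count == 0) {
        return 0;
    }
    for (int i = 0; i < r->count; i++) {
        sum += r->samples_ns[i];
    }
    return sum / r->count;
}

/* Add one measured frame time. Once the ring is full, a time that looks like
   N missed frames is stored as N samples of (time / N); anything else is
   stored as-is and left to wash out in the average. */
void ring_add_frame(FrameRing *r, int64_t measured_ns)
{
    if (r->count == RING_SIZE) {
        int64_t avg = ring_average(r);
        if (measured_ns > avg && !approx_equal(measured_ns, avg)) {
            for (int parts = 2; parts <= RING_SIZE; parts++) {
                int64_t part = measured_ns / parts;
                if (approx_equal(part, avg)) {
                    for (int i = 0; i < parts; i++) {
                        ring_store(r, part);
                    }
                    return;
                }
                if (part < avg) {
                    break; /* now significantly shorter: not a clean multiple */
                }
            }
        }
    }
    ring_store(r, measured_ns);
}
```

With a full ring of ~16.67ms samples, a 50ms measurement is stored as three ~16.67ms samples, as in the example above, while a 40ms measurement does not divide cleanly and is stored unchanged.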

Regarding the monitor refresh rate:

In the best and most likely case on most platforms other than Emscripten, the refresh rate is known.
When it is not, either simply use the average period time found, or assume the refresh rate is 240: it will give acceptable results for 240 Hz, 120 Hz and 60 Hz displays, which are fairly common.

With the refresh rate known (or assumed), and with the ring buffer full, there are multiple possibilities:

The refresh period and the frame period are approximately equal; it is the best case scenario, in which it should be assumed that the time between each RenderPresent will be exactly equal to the refresh period, allowing for both the most precise interpolation and easy extrapolation of how much time will elapse before the next render.
The frame period is either significantly shorter or significantly longer than the refresh period; in this situation the logic is the same as for individual frames, with the difference that it cannot be a one frame spike; so when dividing the frame period does not give a result approximately equal to the refresh period, it is certain that the window does not have v-sync (or that every frame is late, which is about the same thing).

With this information, the game update time can now be estimated, using an accumulator variable:

Increment by the desired game update rate on every frame.
When the accumulated value is strictly greater than the refresh rate, do a game tick and subtract the refresh rate; repeat until the accumulator is less than or equal to the refresh rate.
- When v-sync is on, use the actual refresh rate as returned by the monitor's info.
- Otherwise, consider the frame rate to be the refresh rate.
To interpolate, divide the accumulator's value by the refresh rate; this will give a value 0 < x <= 1, 1 being the completed state of the tick. This assumes the game's logic is always ahead of the rendered frame, unless the interpolation value is 1.
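
Read literally, the accumulator works in rate units rather than time units. A minimal sketch under that reading, with a hypothetical function name and integer rates to keep the arithmetic exact:

```c
#include <stdint.h>

/* Call once per rendered frame. Adds the game update rate to the accumulator,
   returns how many game ticks to run, and writes the interpolation value
   (0 < x <= 1). Rates can be in any common unit, e.g. hertz or millihertz. */
int rate_accumulator_frame(int64_t *accumulator, int64_t game_rate,
                           int64_t refresh_rate, double *interpolation)
{
    int ticks = 0;
    if (refresh_rate <= 0) {
        *interpolation = 1.0;
        return 0;
    }
    *accumulator += game_rate;
    while (*accumulator > refresh_rate) {   /* strictly greater, as described */
        *accumulator -= refresh_rate;
        ticks++;
    }
    *interpolation = (double)*accumulator / (double)refresh_rate;
    return ticks;
}
```

For a game rate of 60 on a 150hz display, the accumulator cycles through 60, 120, 30, 90 and 150, so the interpolation value cycles through 2/5, 4/5, 1/5, 3/5 and 1.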

With this, when the refresh rate and the game rate are equal, the interpolation will always be 1 and both times will be in sync, letting the user enjoy the smoothest experience with the lowest latency; when they differ, the interpolation will stay consistent (e.g. be 0.4, 0.8, 0.2, 0.6 and 1.0, then repeat).

This also allows having multiple update rates running in parallel and be synchronized together, which is sometimes useful for running logic or rendering parts of the screen at different rates.

Resynchronization is done by invalidating part or all of the ring buffer, and resuming the logic once it is full of valid values again. It is needed in the following cases:

When too many values too different from the average have been added to it.
When too many frames have been missed.
When too many updates are needed at once.
When the refresh rate has changed.
When the window starts/resumes rendering, e.g. after the user hid it.

I believe it may be valuable to have the option to "cheat" a little and round up the refresh rates that are 0.1% lower than needed (e.g. 59.94 Hz).

Time drift will also happen (especially with monitors that actually have a refresh rate 0.1% lower), and there are three possibilities about it:

Ignore it. This will lead to the best experience for most players as the game speed will be almost imperceptibly faster or slower; speedrunners however will definitely notice it.
Compare the game time and the real time, and add or skip an update when the difference is too large. This fixes the time drift, but the added or removed frames are definitely noticeable.
Let the library users know and handle it themselves, bringing the advantages of the first situation while giving the opportunity to fix the drift of the in-game times, though the player would still have more or fewer frames than intended.

I hope my experiments give some useful info!

TylerGlaiel commented 4 months ago

yeah, that's all very useful, though I think a significant amount of that can basically be recharacterized as "detect if the window is vsynced, while the game is running" and then switch the timing method depending on if it is or not. I'm playing around with it again to see what else comes up.

The thought occurred that the "Frame Pacing" system can basically be broken down into 2 separate subsystems,

Accurate Frame Time Measurements
Update / Render Pacing

The bulk of the complexity here seems to be in "accurate frame timing measurements", given a magic function that does that correctly, the update/render pacing is not too difficult in comparison. So a starting point for SDL might actually just be a simple function, similar to QueryPerformanceCounter, that reports an accurate "time between presents", ex

Uint64 SDL_GetFrameTime(SDL_Window* window);

Which should report an accurate time between the last 2 Presents on the specific window.

If the window is vsynced, it should always report an exact multiple of the monitor's refresh period
If the window is not vsynced, it should report a smoothed out average of the last few measured times (smoothed out enough to account for timer measurement error specifically, NOT to average out spiky frames)
Over a long period of time, this should not drift from real-time. (this is a desire, not something this timing function should account for, though it may be useful to have a separate function SDL_GetFrameTimeDrift() if it ends up being infeasible to avoid for whatever reason, so that frame pacing can choose if/how to compensate later)
When and how often this function is called should not matter, all relevant timing info should be gathered in the call to Present
In the case where a console or platform actually has a way to accurately get frame times without needing all these measurements and heuristics, the implementation of this function can use that

Additionally, SDL should probably make sure that it's actually detecting the correct refresh rate here, and not round to int for the case where a monitor is 59.94 or 143.963 (right now SDL is reporting 144.0 on the display mode for me, when the monitor is 143.963). Time drift will occur if there's a mismatch here. IMO adjustments for this should not be handled in SDL_GetFrameTime, and should instead be handled in update/render pacing (which is where considerations for "allowable frame drift" should be handled)

SDL_GetPerformanceFrequency seems to give a power of 10 for me, so there is inevitable error here when using this to represent frame times. For this subsystem, it might make sense to have a different frequency if reporting times as int64 instead of doubles. (some multiple of common monitor refresh rates, like 1000*240*144). Or SDL_GetFrameTime could report the integer number of vsyncs & the (float) refresh rate in the case where it's vsync snapped. That might make the thing too complicated, so maybe we just report times as doubles.
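
To illustrate why a frequency like 1000*240*144 would help, here is a small sketch that checks whether one refresh period is a whole number of ticks. The macro and function names are hypothetical, and only whole-number refresh rates are considered.

```c
#include <stdint.h>

/* The tick frequency floated above: 1000 * 240 * 144 = 34,560,000 ticks/s. */
#define FRAME_TICKS_PER_SECOND (1000LL * 240 * 144)

/* Returns 1 if one refresh period at rate_hz is a whole number of ticks. */
int period_is_exact(int64_t ticks_per_second, int64_t rate_hz)
{
    return rate_hz > 0 && ticks_per_second % rate_hz == 0;
}

/* Length of one refresh period in ticks (truncated if it is not exact). */
int64_t period_ticks(int64_t ticks_per_second, int64_t rate_hz)
{
    return ticks_per_second / rate_hz;
}
```

At this frequency the 60, 120, 144 and 240hz periods are all exact (a 144hz period is 240,000 ticks), whereas with a power-of-10 frequency such as 10,000,000 a 60hz period is 166,666.67 ticks and has to be rounded.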

TylerGlaiel commented 4 months ago

Given Uint64 SDL_GetFrameTime(SDL_Window* window); as described above, the "frame pacing" portion of the API could be simplified a ton, as you can then handle a lot of the more user-specific stuff like timescale in user space code instead, and not need that stuff baked into SDL.

So a sample use case for pacing a frame might look like

//in the main loop
while(running){
    ProcessEvents();

    Uint64 delta_time = SDL_GetFrameTime(window);
    SDL_PaceFrame(delta_time, MyFramePacingInfo);
}

//with frame pacing info being a struct with all the information needed to pace a frame
typedef struct SDL_FramePacingInfo {
    float update_rate;
    SDL_FramePacing_FixedUpdateCallback fixed_update_callback;
    SDL_FramePacing_VariableUpdateCallback variable_update_callback;
    SDL_FramePacing_RenderCallback render_callback;
    void* userdata;
    //whatever other configurable params are needed here, allowable drift, update multiplicity, minimum/maximum framerate etc
} SDL_FramePacingInfo;

Splitting it up like this also makes it a lot less error prone if you wanted to handle frame pacing yourself, and only rely on SDL for the accurate timing info instead. In that case, it might be desirable to have SDL_PaceFrame be a simpler "reasonable default" that doesn't try to handle the more complex cases (interlacing variable updates and such), and expose a couple of other utility functions to help manually pace a frame (like the cycle-accurate sleep @kg mentioned)
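
For illustration, a "reasonable default" pacing function could be as small as the sketch below: run the fixed updates that are due, then one variable update, then one render. Every name here is hypothetical and stands in for the proposed SDL_PaceFrame and its callbacks. None of this exists in SDL, and the counting callbacks are only there to make the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*FixedUpdateFn)(void *userdata, int64_t step_ns);
typedef void (*FrameFn)(void *userdata, int64_t delta_ns, double frame_percent);

typedef struct {
    int64_t       step_ns;         /* 1 / update rate */
    int64_t       accumulator_ns;  /* time not yet consumed by fixed updates */
    FixedUpdateFn fixed_update;
    FrameFn       variable_update;
    FrameFn       render;
    void         *userdata;
} PacingInfo;

/* Dispatch order: all due fixed updates, then variable update, then render. */
void pace_frame(int64_t delta_ns, PacingInfo *info)
{
    if (info->step_ns <= 0) {
        return;
    }
    info->accumulator_ns += delta_ns;
    while (info->accumulator_ns >= info->step_ns) {
        if (info->fixed_update) {
            info->fixed_update(info->userdata, info->step_ns);
        }
        info->accumulator_ns -= info->step_ns;
    }
    double frame_percent = (double)info->accumulator_ns / (double)info->step_ns;
    if (info->variable_update) {
        info->variable_update(info->userdata, delta_ns, frame_percent);
    }
    if (info->render) {
        info->render(info->userdata, delta_ns, frame_percent);
    }
}

/* Example callbacks that only count how often they are called. */
typedef struct {
    int    fixed_updates;
    int    variable_updates;
    int    renders;
    double last_percent;
} Counters;

void count_fixed(void *userdata, int64_t step_ns)
{
    (void)step_ns;
    ((Counters *)userdata)->fixed_updates++;
}

void count_variable(void *userdata, int64_t delta_ns, double frame_percent)
{
    (void)delta_ns;
    (void)frame_percent;
    ((Counters *)userdata)->variable_updates++;
}

void count_render(void *userdata, int64_t delta_ns, double frame_percent)
{
    Counters *c = (Counters *)userdata;
    (void)delta_ns;
    c->renders++;
    c->last_percent = frame_percent;
}
```

Feeding a 25ms frame into a 10ms step runs two fixed updates, one variable update and one render with a frame_percent of 0.5. Things like timescale, clamping or multiplicity would then be applied by the caller to delta_ns before the call.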

And also an additional benefit of actually working with multiple windows. Not a use case I have, but since SDL supports it, the frame pacing stuff probably should too

TylerGlaiel commented 4 months ago

Here's an updated frame pacing API sample. I took out all the complexity of the timing and put the dumb/wrong "just call SDL_GetPerformanceCounter each frame and take the difference" version in there, so there's a baseline to compare a good solution to

https://gist.github.com/TylerGlaiel/7b9ccd6f6402e2663383716c9d4b8fbe

TylerGlaiel commented 4 months ago

While trying to even get an ok sample implementation ready I'm running into issues with #5797. Pacing is smooth in the demo app that uses SDL_Renderer, but if I port it into my actual (OpenGL) project then I have to deal with some weird random pacing issues that seem to result from the compositor (randomly having SwapBuffers wait for 2-3 frames then try to make up for that with some much shorter paced frames, almost like it disables vsync for a few frames to catch up again). I think this has something to do with how the DWM wants to buffer a few frames at a time. It's somewhat unclear how to handle this.

I have a second issue which is that SDL is not reporting my monitor refresh rate accurately. I filed this as a bug #10185. The method I'm currently pursuing here for frame timing is to constantly measure drift between real-time and reported time, if the refresh rate SDL reports is not the actual refresh rate then vsync-snapped times will necessarily drift (as they're being snapped to the wrong value).
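
A minimal sketch of that kind of drift tracking, assuming integer nanoseconds. `DriftTracker` and its functions are hypothetical names for illustration, and what to do with the drift value is left to the pacing code.

```c
#include <stdint.h>

/* Running totals of wall-clock time and of the (possibly snapped) time
   that was actually reported to the game. */
typedef struct {
    int64_t real_ns;
    int64_t reported_ns;
} DriftTracker;

void drift_record(DriftTracker *d, int64_t measured_ns, int64_t reported_ns)
{
    d->real_ns += measured_ns;
    d->reported_ns += reported_ns;
}

/* Positive: game time is ahead of real time. Negative: it is behind. */
int64_t drift_ns(const DriftTracker *d)
{
    return d->reported_ns - d->real_ns;
}
```

If frames really take about 16,683,350 ns (59.94hz) but are snapped to 16,666,667 ns (60hz), the tracker reports the game falling about 16.7ms behind real time after 1000 frames, which is the signature of snapping to the wrong refresh rate.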

slime73 commented 4 months ago

Is the proposed API planned to use the actual time between presents as reported by platform/GPU APIs (when possible), similar to https://unity.com/blog/engine-platform/fixing-time-deltatime-in-unity-2020-2-for-smoother-gameplay ? It may be more reliable than reported refresh rates in general, although there's also latency to think about.

On the one hand that information is hard to get outside of SDL's internals on some platforms, on the other hand SDL might not know enough about the graphics API currently being used to get that information itself on other platforms. Maybe it'd need an extra initialization API with parameters, for the latter...

Different present modes (like adaptive vsync) and VRR displays probably aren't very compatible with a basic 'log a multiple of the reported static refresh rate' approach.

kg commented 4 months ago

While trying to even get an ok sample implementation ready I'm running into issues with #5797. Pacing is smooth in the demo app that uses SDL_Renderer, but if I port it into my actual (OpenGL) project then I have to deal with some weird random pacing issues that seem to result from the compositor (randomly having SwapBuffers wait for 2-3 frames then try to make up for that with some much shorter paced frames, almost like it disables vsync for a few frames to catch up again, though I think this has something to do with how the DWM wants to buffer a few frames at a time). It's somewhat unclear how to handle this.

If you have access to the DXGI swapchain or the vulkan device, you can configure the queue depth which may be helpful for this.

I have a second issue which is that SDL is not reporting my monitor refresh rate accurately. I filed this as bug #10185. The method I'm currently pursuing here for frame timing is to constantly measure drift between real-time and reported time: if the refresh rate SDL reports is not the actual refresh rate, then vsync-snapped times will necessarily drift (as they're being snapped to the wrong value).

Keep in mind that if you're relying on knowing the exact refresh rate of the monitor, it can drift a little bit (I forget how you monitor this, but I've seen it before), and G-Sync/FreeSync could cause your presents to not match the refresh rate anyway. So whatever frame pacing algorithm you end up with needs to handle both of those scenarios, though the former one is not terribly catastrophic (I think the most I've seen is +/- 0.1hz).

TylerGlaiel commented 4 months ago

Is the proposed API planned to use the actual time between presents as reported by platform/GPU APIs (when possible), similar to https://unity.com/blog/engine-platform/fixing-time-deltatime-in-unity-2020-2-for-smoother-gameplay ? It may be more reliable than reported refresh rates in general, although there's also latency to think about.

The proposed api is "SDL handles this internally however is best so we don't have to think about it". Seeing as the proposed solution in that blog post is to use features available at the platform (DXGI) level, it seems like getting that actual info would only be possible if implemented in SDL internals (and might require layering opengl on top of DXGI with something like this when possible, from what I can gather. Is this something SDL would be willing to implement with something like SDL_GL_SetAttribute(SDL_GL_LAYER_ON_DXGI, 1); ? )

Different present modes (like adaptive vsync) and VRR displays probably aren't very compatible with a basic 'log a multiple of the reported static refresh rate' approach.

Ideally those cases would just be detected as "we aren't vsynced anymore, so switch to the not-vsynced timing method".

If we have access to DXGI timing information then this whole thing probably simplifies down a ton just cause of this https://learn.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgiswapchain-getframestatistics

========

If you have access to the DXGI swapchain or the vulkan device, you can configure the queue depth which may be helpful for this.

The issue seems to be that if a latency spike causes it to miss a few frames, it will "accept new frames until the queue is full", and the measured times between calls to SwapBuffers will be a lot shorter than the intervals at which those frames are actually displayed on screen. "Correct" behavior in this case would be to treat each of those frames as 1 vsync worth of time and ignore the measured time. Over a few frames the measured timing should return to what is expected.

========

I don't have a good handle on the platform level stuff here (that's why I'm using SDL lol) so I can't really provide a good implementation here.

TylerGlaiel commented 4 months ago

ok I have this as a proof of concept now after staying up way too late last night

https://github.com/TylerGlaiel/SDL-Frame-Pacing-Sample

in order to get accurate frame time measurements, we need to use DXGI. I found some outdated sample code for how to do OpenGL on DXGI, the sample code did not work with the newer FLIP swapchains, but I found a way to work around it (instead of telling GL to render to the DXGI back buffers, render to a directX texture first, then copy that to the back buffer. this seems to work). It would be very nice to have SDL support this as a window flag, since it seems flat out superior to the opengl backend that is currently in place. Consider this a proof of concept that this is indeed something feasible for SDL to do.

With DXGI, swapChain->GetFrameStatistics returns extremely accurate values (within a few ticks of each other), and DXGI is kind of just smoother in general. GetFrameStatistics requires DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL, which is why the previous work was necessary.

GetFrameStatistics returns times accurate to when the DXGI swapchain presents images to the monitor. Notably this is not the same as the delta between calls to Present: if vsync is off, this will repeatedly report the same time until it pushes a new frame to the monitor (0 delta). In this case, we fall back to QueryPerformanceCounter instead.
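The fallback rule described here can be sketched as a small selection function. This is an illustration, not the code from the sample repository: `frame_delta_ticks` and its parameters are hypothetical, with the `sync_*` values standing in for the present time reported by the swapchain (presumably the SyncQPCTime field of DXGI_FRAME_STATISTICS) and the `qpc_*` values for QueryPerformanceCounter readings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prefer the delta between swapchain-reported present times. If that
 * delta is 0 (no new frame reached the monitor, e.g. vsync is off),
 * fall back to the delta between our own counter readings. */
uint64_t frame_delta_ticks(uint64_t sync_prev, uint64_t sync_now,
                           uint64_t qpc_prev, uint64_t qpc_now,
                           bool *used_fallback)
{
    uint64_t sync_delta = sync_now - sync_prev;
    if (sync_delta == 0) {
        *used_fallback = true;
        return qpc_now - qpc_prev;
    }
    *used_fallback = false;
    return sync_delta;
}
```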

This is not a full solution for accurate frame time, but since the error is basically near-zero now when vsynced, a lot less guesswork is needed on the frame pacing side of things. There's still occasional latency spikes, but the steady state results in times all within about ~100 ticks (0.15%) of each other (vs ~5000 (7.15%) with QueryPerformanceCounter instead). This means the snap-to-vsync frame timing method doesn't need to fuck about with averaging a million times together to try and even out error, you can just check the time on a single frame. I have not put much effort into that side of things yet, I just wanted to get some actual times first.
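A single-frame snap-to-vsync check could look roughly like the following sketch. `snap_to_vsync` is a hypothetical helper, and the `tolerance` parameter (a fraction of one vsync period) is an assumption; the thread does not specify a threshold.

```c
/* Snap a measured frame delta to the nearest whole number of vsync
 * periods when it lands within `tolerance` (a fraction of one period,
 * e.g. 0.05) of that multiple; otherwise return it unchanged. */
double snap_to_vsync(double measured_dt, double vsync_period, double tolerance)
{
    long multiple = (long)(measured_dt / vsync_period + 0.5); /* round to nearest */
    if (multiple < 1) {
        multiple = 1; /* never snap to zero elapsed time */
    }
    double snapped = (double)multiple * vsync_period;
    double error = measured_dt - snapped;
    if (error < 0.0) {
        error = -error;
    }
    return (error <= tolerance * vsync_period) ? snapped : measured_dt;
}
```

With timing error around 0.15% of a frame, a tight tolerance can be applied to each frame on its own; with error around 7%, the tolerance would have to be much looser or the input averaged first.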

past-due commented 4 months ago

ok I have this as a proof of concept now after staying up way too late last night

https://github.com/TylerGlaiel/SDL-Frame-Pacing-Sample

Very interesting. It looks like WGL_NV_DX_interop is reasonably well-supported across not just Nvidia cards but some AMD and Intel as well (although barely exceeds 50% overall coverage of reports on https://opengl.gpuinfo.org/listextensions.php)

flibitijibibo commented 4 months ago

This may be something we can support in #9312 as well, would just have to make KHR_swapchain optional in that case - I have no idea what that looks like though, if anyone does know and is willing to futz with this file then we could probably make something usable with SDL_vulkan in addition to OpenGL windows.

TylerGlaiel commented 4 months ago

ok I have this as a proof of concept now after staying up way too late last night https://github.com/TylerGlaiel/SDL-Frame-Pacing-Sample

Very interesting. It looks like WGL_NV_DX_interop is reasonably well-supported across not just Nvidia cards but some AMD and Intel as well (although barely exceeds 50% overall coverage of reports on https://opengl.gpuinfo.org/listextensions.php)

The coverage there seems irrelevant as the majority of the drivers without it are non-windows and I wouldn't expect this to be relevant on non-windows platforms anyway. Filtering for just windows the coverage for WGL_NV_DX_interop2 is 73% which seems just about as well supported as anything else there.

thatcosmonaut commented 4 months ago

I have been wondering, how do we replicate this timing check on non-Windows platforms? Right now the way we create a swapchain in the GPU proposal is by calling ClaimWindow, which internally sets up a swapchain structure but does not expose any internal handle. If frame pacing in general depends on having a swapchain handle that might change how we want to structure things. This seems hairy in general because swapchains are dependent on both the graphics API in use and the operating system's window management.

thatcosmonaut commented 4 months ago

I did some investigation and it appears that support for presentation timing queries basically only exists with Windows + DXGI. While it's great to have accurate timings for that, it does strike me as awkward that any generalized graphics implementation we do would have to expose structures to the frame pacer that can only actually be used in a Windows + DXGI context.

flibitijibibo commented 4 months ago

On the Wayland side we have a protocol for present timing feedback, and Vulkan should have GOOGLE_present_timing, not sure about other targets.

TylerGlaiel commented 4 months ago

On the Wayland side we have a protocol for present timing feedback, and Vulkan should have GOOGLE_present_timing, not sure about other targets.

on apple there is https://developer.apple.com/documentation/corevideo/cvdisplaylink-k0k

on any platforms where that timing isn't available you can just fall back to using SDL_GetPerformanceCounter and averaging it out over a few frames, which is what all sdl games currently have to do

thatcosmonaut commented 4 months ago

Alright, I think what we could do from GPU side is implement a function like SDL_GpuGetPresentTiming(SDL_Window* window) that returns the timing values from the appropriate backend feature (or error if it's unsupported). Then we could pass that in to the appropriate function on the frame pacer API.

TylerGlaiel commented 4 months ago

Alright, I think what we could do from GPU side is implement a function like SDL_GpuGetPresentTiming(SDL_Window* window) that returns the timing values from the appropriate backend feature (or error if it's unsupported). Then we could pass that in to the appropriate function on the frame pacer API.

Would this be exclusive to SDL_Gpu or could this be done on SDL_Window or SDL_GLContext so existing apps can make use of it?

Also, Is this feature different enough from frame pacing that it would be worth opening a separate issue about here (in the main SDL repository)?

thatcosmonaut commented 4 months ago

Would this be exclusive to SDL_Gpu or could this be done on SDL_Window or SDL_GLContext so existing apps can make use of it?

Swapchains need access to the graphics context so I don't think Window alone would be enough. I think any API that uses its own context would probably have to implement a similar function.

Also, Is this feature different enough from frame pacing that it would be worth opening a separate issue about here (in the main SDL repository)?

I can't personally think of any reason why I would want granular access to present timings outside of frame pacing, but maybe there's a use case I'm missing.

TylerGlaiel commented 4 months ago

I can't personally think of any reason why I would want granular access to present timings outside of frame pacing, but maybe there's a use case I'm missing.

Oh I meant opening "OpenGL-On-DXGI" as a separate issue, since it seems like that's a prerequisite to get these frame timings + allows fixing #10185 as well, and I'm not sure that can be transparently added under-the-hood because it requires changes on the client side OpenGL code in 2 small ways if it's enabled: the default framebuffer is no longer 0, and the Y axis is flipped. It might be possible to adjust for that by drawing a flipped quad instead of using CopyResource to transfer the gl framebuffer to the DXGI backbuffer, and it might be possible to adjust for the default framebuffer with wgl stuff I don't know about or by (jankily) overriding glBindFramebuffer with a macro and mapping 0 to the correct framebuffer.

====

Related to that I've updated the sample slightly, turns out I can use the opengl context SDL creates just fine, and just "staple a DXGI swapchain onto the window and set up interop stuff". I've updated the sample to reflect that (which should hopefully show that it would not actually be all that much work to implement it as a SDL_GL_SetAttribute flag, since it doesn't require changing any other initialization code) https://github.com/TylerGlaiel/SDL-Frame-Pacing-Sample

I also moved when I measure timing to right after waiting on the LatencyWaitableObject, which I think is the correct place to measure it from what I can tell. I don't get random 0 dt frames anymore when I do that.

I've been experimenting with how GetFrameStatistics behaves with certain vsync / gsync modes and driver settings.

Best case scenario is vsync is on and you know it's on, in which case you can just rely on its timings. If vsync is off, you have to fall back to measuring timings, as GetFrameStatistics still just reports times synced with the monitor. I can't quite figure out how to get this to work with Gsync, as it seems like using DXGI makes gsync not actually work. Maybe I'm doing something wrong here (it works if I revert back to the non-dxgi version).

Detecting if vsync is on or off is still necessary, as people can force it on or off in the driver settings. In this case the best way to tell seems to be to just measure the difference between GetFrameStatistics's reported time and the measured time immediately after waiting on the LatencyWaitableObject. If vsync is on, these times are almost identical outside of hitches/latency spikes. If vsync is off, these times will diverge. Check the median divergence over a few frames to ignore hiccups. Also if the delta between 2 calls to GetFrameStatistics is ever 0, then vsync is definitely off (though in this case the divergence should also be high, so it's probably not necessary to check manually). This seems a lot more reliable than "guessing if measured times are vsync-ish" at least.
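The median-divergence check could be sketched like this. The names, the window size, and the threshold are all assumptions for illustration; each sample is taken to be the absolute difference, in seconds, between the reported present time and the time measured right after the wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIVERGENCE_WINDOW 9 /* hypothetical: frames considered at most */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of up to DIVERGENCE_WINDOW samples (upper median for even n).
 * Using the median means a single hitch does not flip the result. */
double median_divergence(const double *samples, size_t n)
{
    double sorted[DIVERGENCE_WINDOW];
    if (n == 0) {
        return 0.0;
    }
    if (n > DIVERGENCE_WINDOW) {
        n = DIVERGENCE_WINDOW;
    }
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);
    return sorted[n / 2];
}

/* Vsync is assumed on when the typical divergence stays under threshold. */
bool vsync_seems_on(const double *samples, size_t n, double threshold)
{
    return median_divergence(samples, n) < threshold;
}
```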

slime73 commented 4 months ago

Swapchains need access to the graphics context so I don't think Window alone would be enough. I think any API that uses its own context would probably have to implement a similar function.

On Apple platforms you'd need to either modify SDL's video subsystem internals or do what it already does, in order to use CA/CVDisplayLink for timings. SDL's OpenGL code on macOS already creates a CVDisplayLink, for example. A SDL_gpu implementation would ideally rely on the video subsystem being updated for that where possible. There are other platforms where the idea is similar too, and it's why I suggested an initialization API for frame timings.

I think the video subsystem doing everything it can to expose accurate frame timing makes sense, and on platforms and backends where it can't do anything that's where code using a graphics API can take over (or SDL can provide an abstraction function to help with that as well, separate from a full GPU API).

Personally I'd like to avoid artificially limiting this to a SDL_gpu API, since plenty of code that uses SDL won't use a SDL_gpu but would still like to benefit from accurate frame timings.

TylerGlaiel commented 4 months ago

Ok weirdly enough I found out that Gsync works fine if I move the window to my non-primary monitor (lol?). According to this, to use gsync you need to basically just turn vsync off and "let windows do magic". GetFrameStatistics in this case behaves identically to the "no vsync" case, meaning we have to fall back to measured timings instead of synced ones. (Why it behaves like that is a little puzzling to me, maybe a side effect of being in windowed mode)

bartwe commented 4 months ago

I'd also recommend having a look at how this was done in openvr https://github.com/ValveSoftware/openvr/wiki/Compositor_FrameTiming

Additionally, for smooth animations ideally we'd have an api for the precise timing of the predicted next buffer flip and monitor refresh present.

TylerGlaiel commented 4 months ago

Last update on my sample https://github.com/TylerGlaiel/SDL-Frame-Pacing-Sample

Added a bool to configure whether or not DXGI is used, and implemented a non-DXGI frame timing sample. This is kind of about as far as I want to take that sample, all of this is before even getting into the complexities of frame pacing itself. The most annoying part seems to just be detecting if vsync is actually on or not.

This is as far as I wanna take this sample for now, since I gotta get back to my actual gamedev work again.

wishlist:

SDL_GetFrameStatistics(...) for getting accurate frame timing measurements from the various OS layers that allow this. also layer opengl on dxgi so we actually have this info in opengl apps
SDL_GetFrameTime(...) for getting snapped/filtered frame deltas based on GetPerformanceCounter & GetFrameStatistics (if available)
SDL_AccurateDelay(...) for delaying a specific number of ticks (ex sleep for as long as you can based on the resolution of sleep, then wake up and spin until the correct time)
SDL_PaceFrame(...) for taking a filtered delta time and dispatching appropriate fixed_update/variable_update/render/present events based on that time
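The sleep-then-spin idea behind the SDL_AccurateDelay wishlist item can be sketched as below. Nothing here is an SDL function: `accurate_delay_until`, the callback types, and `sleep_slack_ns` (how far the OS sleep is assumed to overshoot at worst) are hypothetical. The clock and sleep are passed in as callbacks so the logic can be tried against a simulated clock, which is included.

```c
#include <stdint.h>

typedef uint64_t (*NowFn)(void *ctx);            /* current time, ns */
typedef void (*SleepFn)(void *ctx, uint64_t ns); /* coarse OS sleep */

void accurate_delay_until(uint64_t deadline_ns, uint64_t sleep_slack_ns,
                          NowFn now, SleepFn sleep_for, void *ctx)
{
    uint64_t t = now(ctx);
    /* Coarse phase: sleep while the deadline is further away than the
     * slack, always leaving the slack unslept. */
    while (t < deadline_ns && deadline_ns - t > sleep_slack_ns) {
        sleep_for(ctx, deadline_ns - t - sleep_slack_ns);
        t = now(ctx);
    }
    /* Fine phase: spin on the clock for the remainder. */
    while (now(ctx) < deadline_ns) {
    }
}

/* Simulated clock for experimentation: every read costs 100 ns and
 * every sleep overshoots by a fixed amount. */
typedef struct {
    uint64_t now_ns;
    uint64_t oversleep_ns;
    unsigned sleeps;
} SimClock;

uint64_t sim_now(void *ctx)
{
    SimClock *c = ctx;
    c->now_ns += 100;
    return c->now_ns;
}

void sim_sleep(void *ctx, uint64_t ns)
{
    SimClock *c = ctx;
    c->now_ns += ns + c->oversleep_ns;
    c->sleeps++;
}
```

The slack has to be at least as large as the worst oversleep the platform produces, otherwise the coarse phase can run past the deadline.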

Somewhere in the pipeline this specific type of error needs to be compensated for, as I've seen it happen even with the high-accuracy GetFrameStatistics. Snapping to vsync works ok, though sometimes this error can even be more than 50% of a frame if you're on a high refresh rate monitor, and so snapping ends up also showing this same error

Additionally, for smooth animations ideally we'd have an api for the precise timing of the predicted next buffer flip and monitor refresh present.

if SDL_PaceFrame takes control of issuing Presents, the best way this can probably be done is by measuring how long previous frames have taken on average, using that as a prediction, then SDL_AccurateDelay() after render to try and keep the processing time for the frames in line with the prediction (with some care to make sure we don't miss vsyncs from delaying too long). Ex if you "predict" 10ms and the frame only takes 2ms to render, wait the extra 8ms before presenting (and then adjust the prediction for the next frame). This can be part of the higher level frame pacing system instead of the lower level timing system.
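A rough sketch of the predict-then-wait step. The names are hypothetical, an exponential moving average is swapped in for "measuring how long previous frames have taken on average", and `safety_margin` is an assumed guard against missing the vsync.

```c
/* Hypothetical predictor for how long a frame's processing takes. */
typedef struct {
    double predicted; /* seconds */
} FramePredictor;

/* How long to wait after rendering before presenting, so total frame
 * time matches the prediction without running into the next vsync. */
double pacing_wait(const FramePredictor *p, double render_time,
                   double vsync_period, double safety_margin)
{
    double wait = p->predicted - render_time;
    double max_wait = vsync_period - safety_margin - render_time;
    if (wait > max_wait) {
        wait = max_wait;
    }
    return wait > 0.0 ? wait : 0.0;
}

/* Move the prediction toward the latest render time.
 * smoothing is in (0, 1]; larger values react faster. */
void predictor_update(FramePredictor *p, double render_time, double smoothing)
{
    p->predicted += (render_time - p->predicted) * smoothing;
}
```

With a 10ms prediction and a 2ms render on a 60hz display, this gives the 8ms wait from the example above; a frame that takes longer than predicted gets no wait at all.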

thatcosmonaut commented 4 months ago

Just want to note that on newer command buffer APIs the acquire-draw-present loop is much more abstract and asynchronous than OpenGL, for example on Vulkan swapchain acquisition and presentation are synchronized on the GPU and the only real control you have over the actual timing is the presentation strategy you request (immediate, mailbox, or FIFO). You can submit multiple acquisitions and presentations before any work is finished to increase GPU utilization in GPU-bound scenarios (obviously this is at the cost of input latency).

ewichuu commented 4 months ago

any clues on how this can be done on Linux? AFAIK the x11 monitor api returns bogus data a lot of the time and the user has to manually edit a config file to make it return the correct value

wayland is probably better about this tho

Lzard commented 4 months ago

Maybe the functions of this API should take an argument specifying which method to use?

SDL_FRAMETIME_RAW for the performance counter value difference between the last two frames
SDL_FRAMETIME_AVERAGE for an average over the last X frames
SDL_FRAMETIME_VIDEO_* for OS provided methods
SDL_FRAMETIME_RENDER_* for GPU provided methods
SDL_FRAMETIME_MOST_ACCURATE for the most accurate method available

The functions would return an error code/NULL when the specified time source is unavailable for the given window.

TylerGlaiel commented 4 months ago

Maybe the functions of this API should take an argument specifying which method to use?

I don't think the client side should have to care about where the timestamps come from, especially when which method to use depends on what's available on the platform level, and also on what mode the monitor is actually in (vsync, non vsync, gsync/freesync, etc). If it ever switches which mode it's using (ex going from "gsynced but fast enough to push frames" to "gsynced but not hitting the framerate") then it needs to compensate for the difference between the timing methods when it switches which one it's using.

bartwe commented 4 months ago

Reference to Android Frame Pacing Library: https://developer.android.com/games/sdk/frame-pacing

And for better sleeping: https://blog.bearcats.nl/accurate-sleep-function/

slouken commented 4 months ago

FYI, SDL already uses the better sleeping solution in the newer post by computerBear. You can double check that you're using it by seeing if the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION code is being compiled in, in src/timer/windows/SDL_systimer.c

slouken commented 4 months ago

Also, this might be relevant for your interests: https://github.com/libsdl-org/SDL/commit/730d5cf2f889b553852bd02b2d56dedf8690872a

Wishlist: Frame Pacing Subsystem #10160