Closed TylerGlaiel closed 2 months ago
I understand why you want this, but this is loaded with footguns. It makes more sense in the specific context of frame pacing than a general function that people might use everywhere and wonder why their application is using so much CPU time.
@icculus, thoughts?
I mean, I also kind of think it's a footgun that SDL_Delay is very inaccurate (I've seen a lot of bad sample code out there that uses it incorrectly as a result).
It seems there's a processor intrinsic / asm that can help with the CPU usage here (_mm_pause()), but I don't really know quite how those work. (Edit: on Windows, shove YieldProcessor(); in the spin loop.)
We implemented a similar thing for FNA which tries to factor in scheduler precision:
https://github.com/FNA-XNA/FNA/commit/46216d6cd1ff832eaafa2aef96a088b13a474b25
It does work but it's also C# so it's probably not as good as it could be. I dunno what an SDL variant would look like but it's at least another example where such a function would probably simplify things a lot and benefit other frameworks too.
Fair enough, we'll consider this for SDL3
FYI, there's an interesting discussion of this topic at https://blog.bearcats.nl/perfect-sleep-function/
Can you verify that this is actually needed on Windows?
Yeah, SDL_Delay forwards to SDL_DelayNS (which I do believe was compiled with that flag on) and it was regularly oversleeping, anywhere from 0 to 1ms on my machine
Also, you don't actually want to yield the processor, because you might be rescheduled much later than you want.
YieldProcessor(); on Windows expands to the _mm_pause() intrinsic, which is documented as "an instruction that makes spin loops use less energy" / "a 140-cycle noop" https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_pause&ig_expand=4897 I'm not an expert on this, but I believe that's a different thing than yielding the thread back to the scheduler? I appear to get accurate sleeps when I do that.
With that in there it seems to be sleeping the exact amounts requested, since 140 cycles is well under the precision of GetPerformanceCounter anyway
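To make the pause-versus-yield distinction concrete, here is a minimal sketch of a spin loop that issues the pause hint and never hands the thread back to the scheduler. The names `spin_until` and `CPU_RELAX` are made up for this example, and `std::chrono::steady_clock` stands in for QueryPerformanceCounter so that it builds outside Windows. On non-x86 targets the hint is simply a no-op in this sketch.

```cpp
#include <chrono>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#include <immintrin.h>            // declares _mm_pause on MSVC, GCC and Clang
#define CPU_RELAX() _mm_pause()   // hint to the core; the thread stays scheduled
#else
#define CPU_RELAX() ((void)0)     // assumption: no equivalent hint wired up here
#endif

// Hypothetical helper: busy-wait until `deadline` without giving up the time slice.
inline void spin_until(std::chrono::steady_clock::time_point deadline)
{
    while (std::chrono::steady_clock::now() < deadline) {
        CPU_RELAX();
    }
}
```

Because the loop only exits once the clock has reached the deadline, it cannot return early. How late it returns depends on the clock's resolution and on whether the OS preempts the thread, which this sketch does nothing to prevent.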
also an alternative formulation of the same thing would be SDL_WaitUntil(int64 QPCTime);, unsure whether that or AccurateDelay is nicer
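For comparison, the two formulations can be sketched side by side. This is only an illustration: `wait_until`, `accurate_delay` and `run_frames` are hypothetical names, not SDL functions, and `std::this_thread::sleep_until` is a stand-in that makes no precision claim. The point is that the relative form can be written in terms of the absolute one, and that a frame loop using absolute deadlines does not accumulate oversleep from frame to frame.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the proposed SDL_WaitUntil(): block until an absolute time.
inline void wait_until(Clock::time_point deadline)
{
    std::this_thread::sleep_until(deadline);  // may wake late; precision is not the point here
}

// Hypothetical stand-in for the relative form, expressed via the absolute one.
inline void accurate_delay(Clock::duration d)
{
    wait_until(Clock::now() + d);
}

// Frame loop driven by absolute deadlines: if one wait oversleeps, the next
// deadline is still computed from the previous deadline, not from "now",
// so the error does not add up. Returns the final deadline.
inline Clock::time_point run_frames(int frames, Clock::duration frame_time)
{
    Clock::time_point next = Clock::now();
    for (int i = 0; i < frames; ++i) {
        // ... per-frame work would go here ...
        next += frame_time;
        wait_until(next);
    }
    return next;
}
```

With the relative form, the caller has to subtract the time already spent in the frame before each call. With the absolute form, that bookkeeping collapses into advancing one deadline.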
Okay, I'm making SDL_DelayNS() the precise sleeping function in SDL.
Here's an adaptation of computerBear's sleep function to compare with SDL's method:
// The PERFECT sleeping function for Windows.
// - Sleep times accurate to 1 microsecond
// - Low CPU usage
// - Runs on Windows Vista and up
#include <Windows.h>
#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>
#pragma comment(lib, "Winmm.lib") // timeGetDevCaps, timeBeginPeriod
HANDLE Timer;
int SchedulerPeriodMs;
INT64 QpcPerSecond;
void PreciseSleep(double seconds)
{
    LARGE_INTEGER qpc;
    QueryPerformanceCounter(&qpc);
    INT64 targetQpc = (INT64)(qpc.QuadPart + seconds * QpcPerSecond);

    if (Timer) // Try using a high resolution timer first.
    {
        const double TOLERANCE = 0.001'02;
        INT64 maxTicks = (INT64)SchedulerPeriodMs * 9'500;
        for (;;) // Break sleep up into parts that are lower than scheduler period.
        {
            double remainingSeconds = (targetQpc - qpc.QuadPart) / (double)QpcPerSecond;
            INT64 sleepTicks = (INT64)((remainingSeconds - TOLERANCE) * 10'000'000);
            if (sleepTicks <= 0)
                break;

            LARGE_INTEGER due;
            due.QuadPart = -(sleepTicks > maxTicks ? maxTicks : sleepTicks);
            SetWaitableTimerEx(Timer, &due, 0, NULL, NULL, NULL, 0);
            WaitForSingleObject(Timer, INFINITE);
            QueryPerformanceCounter(&qpc);
        }
    } else // Fallback to Sleep.
    {
        const double TOLERANCE = 0.000'02;
        double sleepMs = (seconds - TOLERANCE) * 1000 - SchedulerPeriodMs; // Sleep for 1 scheduler period less than requested.
        int sleepSlices = (int)(sleepMs / SchedulerPeriodMs);
        if (sleepSlices > 0)
            Sleep((DWORD)sleepSlices * SchedulerPeriodMs);
        QueryPerformanceCounter(&qpc);
    }

    while (qpc.QuadPart < targetQpc) // Spin for any remaining time.
    {
        YieldProcessor();
        QueryPerformanceCounter(&qpc);
    }
}
int main(int argc, char *argv[])
{
    // Initialization
    double target_delay = 1 / 60.0;
    double total_oversleep = 0.0;
    SDL_bool use_SDL = (argc == 2 && SDL_strcmp(argv[1], "--SDL") == 0);

    Timer = CreateWaitableTimerExW(NULL, NULL, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);
    TIMECAPS caps;
    timeGetDevCaps(&caps, sizeof caps);
    timeBeginPeriod(caps.wPeriodMin);
    SchedulerPeriodMs = (int)caps.wPeriodMin;
    LARGE_INTEGER qpf;
    QueryPerformanceFrequency(&qpf);
    QpcPerSecond = qpf.QuadPart;

    // Game loop
    for (int i = 0; i < 100; ++i) {
        LARGE_INTEGER qpc0, qpc1;
        QueryPerformanceCounter(&qpc0);
        if (use_SDL) {
            SDL_DelayNS(SDL_NS_PER_SECOND / 60);
        } else {
            PreciseSleep(1 / 60.0);
        }
        QueryPerformanceCounter(&qpc1);
        double dt = (qpc1.QuadPart - qpc0.QuadPart) / (double)QpcPerSecond;
        double oversleep = (dt - target_delay);
        total_oversleep += oversleep;
        SDL_Log("Slept for %.2f ms (overslept %.2f ms)\n", 1000 * dt, 1000 * oversleep);
    }

    if (use_SDL) {
        SDL_Log("Used SDL_DelayNS() method, total overslept: %.2f ms\n", 1000 * total_oversleep);
    } else {
        SDL_Log("Used PreciseSleep() method, total overslept: %.2f ms\n", 1000 * total_oversleep);
    }
    return 0;
}
Here's the output on my machine over 10 runs, discarding scheduling outliers:
INFO: Used PreciseSleep() method, total overslept: 1.67 ms
INFO: Used PreciseSleep() method, total overslept: 3.10 ms
INFO: Used PreciseSleep() method, total overslept: 6.38 ms
INFO: Used PreciseSleep() method, total overslept: 4.75 ms
INFO: Used PreciseSleep() method, total overslept: 2.18 ms
INFO: Used PreciseSleep() method, total overslept: 2.83 ms
INFO: Used PreciseSleep() method, total overslept: 2.91 ms
INFO: Used PreciseSleep() method, total overslept: 6.61 ms
INFO: Used PreciseSleep() method, total overslept: 6.97 ms
INFO: Used PreciseSleep() method, total overslept: 10.62 ms
INFO: Used SDL_DelayNS() method, total overslept: 0.52 ms
INFO: Used SDL_DelayNS() method, total overslept: 1.60 ms
INFO: Used SDL_DelayNS() method, total overslept: 1.08 ms
INFO: Used SDL_DelayNS() method, total overslept: 2.87 ms
INFO: Used SDL_DelayNS() method, total overslept: 3.05 ms
INFO: Used SDL_DelayNS() method, total overslept: 1.24 ms
INFO: Used SDL_DelayNS() method, total overslept: 1.36 ms
INFO: Used SDL_DelayNS() method, total overslept: 8.88 ms
INFO: Used SDL_DelayNS() method, total overslept: 2.99 ms
INFO: Used SDL_DelayNS() method, total overslept: 1.84 ms
Note that both of these methods had occasional scheduling hiccoughs which caused "total overslept" to jump to over 100 ms.
Done! Thanks for the suggestion! :)
Sample implementation, requires no platform-specific code: call SDL_Delay, undershooting by a set amount, then wake up and spin until the accurate amount of time has passed. Adjust the undershoot amount if it detects that it overslept.
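The description above could look roughly like the following. This is a sketch under stated assumptions, not the code from the original post and not what SDL ended up shipping: `AccurateDelay` is a hypothetical helper, `std::this_thread::sleep_for` stands in for SDL_Delay, and the starting margin (1 ms), the cap (5 ms) and the 25% headroom are arbitrary illustrative values.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using std::chrono::nanoseconds;

// Hypothetical helper, not an SDL API.
class AccurateDelay {
public:
    void delay(nanoseconds requested)
    {
        const nanoseconds max_undershoot = std::chrono::milliseconds(5); // assumed cap
        const Clock::time_point start = Clock::now();
        const auto target = start + requested;

        // 1. Coarse sleep, deliberately short by the current undershoot margin.
        if (requested > undershoot_) {
            const nanoseconds coarse = requested - undershoot_;
            std::this_thread::sleep_for(coarse);   // stand-in for SDL_Delay

            // 2. If the coarse sleep ran late by more than the margin, it
            //    overshot the target: widen the margin for future calls.
            const nanoseconds late =
                std::chrono::duration_cast<nanoseconds>(Clock::now() - start) - coarse;
            if (late > undershoot_) {
                undershoot_ = std::min(late + late / 4, max_undershoot);
            }
        }

        // 3. Spin until the requested time has really elapsed.
        while (Clock::now() < target) {
        }
    }

    nanoseconds undershoot() const { return undershoot_; }

private:
    nanoseconds undershoot_ = std::chrono::milliseconds(1); // assumed starting margin
};
```

Requests shorter than the margin skip the coarse sleep and are spent entirely in the spin loop, which is where the CPU cost discussed earlier in the thread comes from. This sketch only ever widens the margin; shrinking it again after a run of accurate sleeps is left out.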