time.monotonic() should use a different clock source on Windows

python / cpython

The Python programming language

https://www.python.org

Other

63.36k stars 30.34k forks source link

time.monotonic() should use a different clock source on Windows #88494

Closed a307e466-d652-4777-89ea-b4973d3d4fb3 closed 7 months ago

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

BPO	44328
Nosy	@pfmoore, @abalkin, @vstinner, @tjguk, @zware, @eryksun, @zooba, @pganssle, @lunixbochs

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.11', 'library', 'OS-windows', 'performance'] title = 'time.monotonic() should use a different clock source on Windows' updated_at = user = 'https://github.com/lunixbochs' ``` bugs.python.org fields: ```python activity = actor = 'eryksun' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)', 'Windows'] creation = creator = 'lunixbochs2' dependencies = [] files = [] hgrepos = [] issue_num = 44328 keywords = [] message_count = 12.0 messages = ['395221', '395238', '395490', '395493', '395681', '395683', '395719', '395769', '395771', '395782', '395784', '395849'] nosy_count = 9.0 nosy_names = ['paul.moore', 'belopolsky', 'vstinner', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'p-ganssle', 'lunixbochs2'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'performance' url = 'https://bugs.python.org/issue44328' versions = ['Python 3.11'] ```

Linked PRs

gh-116781
gh-118395

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

Presumably time.monotonic() on Windows historically used GetTickCount64() because QueryPerformanceCounter() could fail. However, that hasn't been the case since Windows XP: https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter

On systems that run Windows XP or later, the function will always succeed and will thus never return zero

I've run into issues with this when porting python-based applications to Windows. On other platforms, time.monotonic() was a decent precision so I used it. When I ported to Windows, I had to replace all of my time.monotonic() calls with time.perf_counter(). I would pretty much never knowingly call time.monotonic() if I knew ahead of time it could be quantized to 16ms.

My opinion is that the GetTickCount64() monotonic time code in CPython should be removed entirely and only the QueryPerformanceCounter() path should be used.

I also think some of the failure checks could be removed from QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented to never fail in modern Windows and CPython has been dropping support for older versions of Windows, but that's less of a firm opinion.

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

I found these two references:

Which suggest QueryPerformanceCounter() may be bad because it can drift. However, these posts are fairly old and the StackOverflow post also says the drift is small on newer hardware / Windows.

Microsoft's current stance is that QueryPerformanceCounter() is good: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

Guidance for acquiring time stamps Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter

I looked into how a few other languages provide monotonic time on Windows:

Golang seems to read the interrupt time (presumably equivalent to QueryInterruptTime) directly by address. https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499

Rust uses QueryPerformanceCounter: https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91

V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

Ruby uses QueryPerformanceCounter: https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712

C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, indicating that they should be roughly equivalent: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175

Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep: https://github.com/apple/swift-corelibs-libdispatch/commit/766d64719cfdd07f97841092bec596669261a16f

------

Note that none of these languages use GetTickCount64(). Swift is an interesting counter point, and I noticed QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep.

I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable. Some kinds of timers using monotonic time should definitely freeze during sleep so they don't cause a flurry of activity on wake, but others definitely need to roughly track wall clock time, even during sleep.

Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python, and possibly deprecate perf_counter() if it's redundant after this (as I think it's aliased to monotonic() on non-Windows platforms anyway).

eryksun commented 3 years ago

You resolved bpo-41299 using QueryPerformanceCounter(), so we're already a step toward making it the default monotonic clock. Personally, I've only relied on QPC for short intervals, but, as you've highlighted above, other language runtimes use it for their monotonic clock. Since Vista, it's apparently more reliable in terms of calibration and ensuring that a processor TSC is only used if it's known to be invariant and constant.

That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress().

QueryInterruptTimePrecise() is about 1.38 times the cost of QPC (on average across 100 million calls). Both functions are significantly more expensive than QueryInterruptTime() and GetTickCount64(), which simply return a value that's read from shared memory (i.e. the KUSER_SHARED_DATA structure).

QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep.

QueryInterruptTime() and QueryUnbiasedInterruptTime() don't provide high-resolution timestamps. They're updated by the system timer interrupt service routine, which defaults to 64 interrupts/second. The time increment depends on when the counter is read by the ISR, but it averages out to approximately the interrupt period (e.g. 15.625 ms).

I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable.

POSIX doesn't specify whether CLOCK_MONOTONIC [1] should include the time that elapses while the system is in standby mode. In Linux, CLOCK_BOOTTIME includes this time, and CLOCK_MONOTONIC excludes it. Windows QueryUnbiasedInterruptTime[Precise]() excludes it.

Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python

Both may not be supportable on all platforms, but they're supported in Linux, Windows 10, and macOS. The latter has mach_continuous_time(), which includes the time in standby mode, and mach_absolute_time(), which excludes it.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_gettime.html

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

Great information, thanks!

Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()

My personal vote is to use the currently most common clock source (QPC) for now for monotonic(), because it's the same across Windows versions and the most likely to produce portable monotonic timestamps between apps/languages on the same system. It's also the easiest patch, as there's already a code path for QPC.

(As someone building multi-app experiences around Python, I don't want to check the Windows version to see which time base Python is using. I'd feel better about switching to QITP() if/when Python drops Windows 8 support.)

A later extension of this idea (maybe behind a PEP) could be to survey the existing timers available on each platform and consider whether it's worth extending time to expose them all, and unify cross-platform the ones that are exposed (e.g. better formalize/document which clocks will advance while the machine is asleep on each platform).

vstinner commented 3 years ago

Changing is clock is a tricky. There are many things to consider:

Is it really monotonic in all cases?
Does it have a better resolution than the previous clock?
Corner cases: does it include time spent in time.sleep() and while the system is suspended?
etc.

When I designed PEP-418 (in 2012), QueryPerformanceCounter() was not reliable:

"It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks." https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter

And there were a few bugs like: "The performance counter value may unexpectedly leap forward because of a hardware bug".

A Microsoft blog article explains that users wanting a steady clock with precision higher than GetTickCount() should interpolate GetTickCount() using QueryPerformanceCounter(). If I recall correctly, this is what Firefox did for instance.

Eryk: "That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()."

Oh, good that they provided an implementation for that :-)

V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

It uses CPUID to check for "non stoppable time stamp counter": https://github.com/v8/v8/blob/master/src/base/cpu.cc

  // Check if CPU has non stoppable time stamp counter.
  const unsigned parameter_containing_non_stop_time_stamp_counter = 0x80000007;
  if (num_ext_ids >= parameter_containing_non_stop_time_stamp_counter) {
    __cpuid(cpu_info, parameter_containing_non_stop_time_stamp_counter);
    has_non_stop_time_stamp_counter_ = (cpu_info[3] & (1 << 8)) != 0;
  }

Maybe we use such check in Python: use GetTickCount() on old CPUs, or QueryPerformanceCounter() otherwise. MSVC provides the __cpuid() function: https://docs.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-160

Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep

Oh, I recall that it was a tricky question. The PEP-418 simply says: "The behaviour of clocks after a system suspend is not defined in the documentation of new functions."

See "Include Sleep" and "Include Suspend" columns of my table: https://www.python.org/dev/peps/pep-0418/#monotonic-clocks

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

I think a lot of that is based on very outdated information. It's worth reading this article: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

I will repeat Microsoft's current recommendation (from that article):

Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter, KeQueryPerformanceCounter, or KeQueryInterruptTimePrecise. When you need UTC-synchronized time stamps with a resolution of 1 microsecond or better, choose GetSystemTimePreciseAsFileTime or KeQuerySystemTimePrecise.

(Based on that, it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as GetSystemTimePreciseAsFileTime is available in Windows 8 and newer)

PEP-418:

It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks.

Microsoft on drift (from the article above):

To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Modern Windows also automatically detects and works around stoppable TSC, as well as several other issues:

Some processors can vary the frequency of the TSC clock or stop the advancement of the TSC register, which makes the TSC unsuitable for timing purposes on these processors. These processors are said to have non-invariant TSC registers. (Windows will automatically detect this, and select an alternative time source for QPC).

It seems like Microsoft considers QPC to be a significantly better time source now, than when PEP-418 was written.

Another related conversation is whether Python can just expose all of the Windows clocks directly (through clock_gettime enums?), as that gives anyone who really wants full control over their timestamps a good escape hatch.

vstinner commented 3 years ago

To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Technically, it remains possible to install Python on Windows 7, see: bpo-32592.

eryksun commented 3 years ago

On second thought, starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended. For consistency, an external deadline (e.g. for SIGINT support) should work the same way. The monotonic clock should thus be based on QueryUnbiasedInterruptTime(). We can conditionally use QueryUnbiasedInterruptTimePrecise() in Windows 10, which I presume includes most users of Python 3.9+ on Windows since Windows 8.1 only has a 3% share of desktop/laptop systems.

If we can agree on the above, then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime(). It's limited to the resolution of the system interrupt time, but at least compared to GetTickCount64() it returns the real interrupt time instead of an idealized 64 ticks/second.

expose all of the Windows clocks directly (through clock_gettime enums?)

_Py_clock_gettime() and _Py_clock_getres() could be implemented in Python/pytime.c. For Windows we could implement the following clocks:

CLOCK_REALTIME            GetSystemTimePreciseAsFileTime
CLOCK_REALTIME_COARSE     GetSystemTimeAsFileTime
CLOCK_MONOTONIC_COARSE    QueryUnbiasedInterruptTime
CLOCK_PROCESS_CPUTIME_ID  GetProcessTimes
CLOCK_THREAD_CPUTIME_ID   GetThreadTimes
    CLOCK_PERF_COUNTER        QueryPerformanceCounter

Windows 10+
CLOCK_MONOTONIC           QueryUnbiasedInterruptTimePrecise
CLOCK_BOOTTIME            QueryInterruptTimePrecise
CLOCK_BOOTTIME_COARSE     QueryInterruptTime

it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime

See bpo-19007, which is nearly 8 years old.

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

The monotonic clock should thus be based on QueryUnbiasedInterruptTime

My primary complaint here is that Windows is the only major platform with a low resolution monotonic clock. Using QueryUnbiasedInterruptTime() on older OS versions wouldn't entirely help that, so we have the same consistency issue (just on a smaller scale). I would personally need to still use time.perf_counter() instead of time.monotonic() due to this, but I'm not totally against it.

For consistency, an external deadline (e.g. for SIGINT support) should work the same way.

Have there been any issues filed about the deadline behaviors across system suspend?

which I presume includes most users of Python 3.9+

Seems like Windows 7 may need to be considered as well, as per vstinner's bpo-32592 mention?

starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended

Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time()) already don't track suspend time in time.monotonic(). I think that's enough to suggest that long-term Windows shouldn't either, but I don't know how to reconcile that with my desire for Windows not to be the only platform with low resolution monotonic time by default.

then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime()

I don't agree with this, as it would regress the fix. This is more of a topic for bpo-41299, but I tested QueryUnbiasedInterruptTime() and it exhibits the same 16ms jitter as GetTickCount64() (which I expected), so non-precise interrupt time can't solve this issue. I do think QueryUnbiasedInterruptTimePrecise() would be a good fit. I think making this particular timeout unbiased (which would be a new behavior) should be a lower priority than making it not jitter.

For Windows we could implement the following clocks:

I think that list is great and making those enums work with clock_gettime on Windows sounds like a very clear improvement to the timing options available. Having the ability to query each clock source directly would also reduce the impact if time.monotonic() does not perfectly suit a specific application.

---

I think my current positions after writing all of this are:

I would probably be in support of a 3.11+ change for time.monotonic() to use QueryUnbiasedInterruptTime() pre-Windows 10, and dynamically use QueryUnbiasedInterruptTimePrecise() on Windows 10+. Ideally the Windows clock_gettime() code lands in the same release, so users can directly pick their time source if necessary. This approach also helps my goal of making time.monotonic()'s suspend behavior more uniform across platforms.
Please don't revert bpo-41299 (especially the backports), as it does fix the issue and tracking suspend time is the same (not a regression) as the previous GetTickCount64() code. I think the lock timeouts should stick with QPC pre-Windows-10 to fix the jitter, but could use QueryUnbiasedInterruptTimePrecise() on Windows 10+ (which needs the same runtime check as the time.monotonic() change, thus could probably land in the same patch set).
I'm honestly left with more questions than I started after diving into the GetSystemTimePreciseAsFileTime() rabbit hole. I assume it's not a catastrophic issue? Maybe it's a situation where adding the clock_gettime() enums would sufficiently help anyone who cares about the exact behavior during clock modification. I don't have strong opinions about it, besides it being a shame that Windows currently has lower precision timestamps in general. Could be worth doing a survey of other languages' choices, but any further discussion can probably go to bpo-19007.

eryksun commented 3 years ago

Seems like Windows 7 may need to be considered as well, as per vstinner's bpo-32592 mention?

Python 3.9 doesn't support Windows 7. Moreover, the interpreter DLL in 3.9 implicitly imports PathCchCanonicalizeEx, PathCchCombineEx, and PathCchSkipRoot, which were added in Windows 8. So it won't even load in Windows 7.

Have there been any issues filed about the deadline behaviors across system suspend?

Not that I'm aware of, but waits should be correct and consistent in principle. It shouldn't behave drastically different just because the user closed the laptop lid for an hour.

Looks like Linux (CLOCK_MONOTONIC) and macOS (mach_absolute_time()) already don't track suspend time in time.monotonic(). I think that's enough to suggest that long-term Windows shouldn't either

I'm not overly concerned here with cross-platform consistency. If Windows hadn't changed the behavior of wait timeouts, then I wouldn't worry about it since most clocks in Windows are biased by the time spent suspended. It's a bonus that this change would also improve cross-platform consistency for time.monotonic().

I tested QueryUnbiasedInterruptTime() and it exhibits the same 16ms jitter as GetTickCount64() (which I expected),

For bpo-41299, it occurs to me that we've only ever used _PY_EMULATED_WIN_CV, in which case PyCOND_TIMEDWAIT() returns 1 for a timeout, as implemented in _PyCOND_WAIT_MS(). Try changing EnterNonRecursiveMutex() to break out of the loop in this case. For example:

    } else if (milliseconds != 0) {
        /* wait at least until the target */
        ULONGLONG now, target;
        QueryUnbiasedInterruptTime(&target);
        target += milliseconds;
        while (mutex->locked) {
            int ret = PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs,
                        (long long)milliseconds * 1000);
            if (ret < 0) {
                result = WAIT_FAILED;
                break;
            }
            if (ret == 1) { /* timeout */
                break;
            }
            QueryUnbiasedInterruptTime(&now);
            if (target <= now)
                break;
            milliseconds = (DWORD)(target - now);
        }
    }

a307e466-d652-4777-89ea-b4973d3d4fb3 commented 3 years ago

It shouldn't behave drastically different just because the user closed the laptop lid for an hour

I talked to someone who's been helping with the Go time APIs and it seems like that holds pretty well for interactive timeouts, but makes no sense for network related code. If you lost a network connection (with, say, a 30 second timeout) due to the lid being closed, you don't want to wait 30 seconds after opening the lid for the application to realize it needs to reconnect. (However there's probably no good way to design Python's locking system around both cases, so it's sufficient to say "lock timers won't advance during suspend" and make the application layer work around that on its own in the case of network code)

Try changing EnterNonRecursiveMutex() to break out of the loop in this case

This does work, but unfortunately a little too well - in a single test I saw several instances where that approach returned _earlier_ than the timeout.

I assume the reason for this loop is the call can get interrupted with a "needs retry" state. If so, you'd still see 16ms of jitter anytime that happens as long as it's backed by a quantized time source.

eryksun commented 3 years ago

> Try changing EnterNonRecursiveMutex() to break out of the loop in > this case

This does work, but unfortunately a little too well - in a single test I saw several instances where that approach returned _earlier_ than the timeout.

It's documented that a timeout between N and N+1 ticks can be satisfied anywhere in that range. In practice I see a wider range. In the kernel, variations in wait time could depend on when the due time is calculated in the interrupt cycle, when the next interrupt occurs and the interrupt time is updated, and when the thread is dispatched. A benefit of using a high-resolution external deadline is that waiting will never return early, but it may return later than it otherwise would, e.g. if re-waiting for a remaining 1 ms actually takes 20 ms.

There are many unrelated WaitForSingleObject and WaitForMultipleObjects in the interpreter, extension modules, and code that uses _winapi.WaitForSingleObject and _winapi.WaitForMultipleObjects. For example, time.sleep() allows WAIT_TIMEOUT to override the deadline. I suggest measuring the performance-counter interval for time.sleep(0.001) on both the main thread (Sleep based) and a new thread (WaitForSingleObjectEx based).

vstinner commented 8 months ago

IMO we should check how web browsers manage time on the different platforms. They have high expectations from clocks: security, efficiency, reliability.

Examples:

https://searchfox.org/mozilla-central/source/security/sandbox/chromium/base/time/time_win.cc -- uint64_t QPCNowRaw() calls QueryPerformanceCounter().
https://searchfox.org/mozilla-central/source/ipc/chromium/src/base/time_posix.cc -- TimeTicks TimeTicks::Now() calls clock_gettime(CLOCK_MONOTONIC).
https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/common/system_utils_win.cpp -- double GetCurrentSystemTime() calls QueryPerformanceCounter().

time_win.cc contains two long comments:

// Windows Timer Primer
//
// A good article:  http://www.ddj.com/windows/184416651
// A good mozilla bug:  http://bugzilla.mozilla.org/show_bug.cgi?id=363258
//
// The default windows timer, GetSystemTimeAsFileTime is not very precise.
// It is only good to ~15.5ms.
//
// QueryPerformanceCounter is the logical choice for a high-precision timer.
// However, it is known to be buggy on some hardware.  Specifically, it can
// sometimes "jump".  On laptops, QPC can also be very expensive to call.
// It's 3-4x slower than timeGetTime() on desktops, but can be 10x slower
// on laptops.  A unittest exists which will show the relative cost of various
// timers on any system.
//
// The next logical choice is timeGetTime().  timeGetTime has a precision of
// 1ms, but only if you call APIs (timeBeginPeriod()) which affect all other
// applications on the system.  By default, precision is only 15.5ms.
// Unfortunately, we don't want to call timeBeginPeriod because we don't
// want to affect other applications.  Further, on mobile platforms, use of
// faster multimedia timers can hurt battery life.  See the intel
// article about this here:
// http://softwarecommunity.intel.com/articles/eng/1086.htm
//
// To work around all this, we're going to generally use timeGetTime().  We
// will only increase the system-wide timer if we're not running on battery
// power.

and:


// Discussion of tick counter options on Windows:
//
// (1) CPU cycle counter. (Retrieved via RDTSC)
// The CPU counter provides the highest resolution time stamp and is the least
// expensive to retrieve. However, on older CPUs, two issues can affect its
// reliability: First it is maintained per processor and not synchronized
// between processors. Also, the counters will change frequency due to thermal
// and power changes, and stop in some states.
//
// (2) QueryPerformanceCounter (QPC). The QPC counter provides a high-
// resolution (<1 microsecond) time stamp. On most hardware running today, it
// auto-detects and uses the constant-rate RDTSC counter to provide extremely
// efficient and reliable time stamps.
//
// On older CPUs where RDTSC is unreliable, it falls back to using more
// expensive (20X to 40X more costly) alternate clocks, such as HPET or the ACPI
// PM timer, and can involve system calls; and all this is up to the HAL (with
// some help from ACPI). According to
// http://blogs.msdn.com/oldnewthing/archive/2005/09/02/459952.aspx, in the
// worst case, it gets the counter from the rollover interrupt on the
// programmable interrupt timer. In best cases, the HAL may conclude that the
// RDTSC counter runs at a constant frequency, then it uses that instead. On
// multiprocessor machines, it will try to verify the values returned from
// RDTSC on each processor are consistent with each other, and apply a handful
// of workarounds for known buggy hardware. In other words, QPC is supposed to
// give consistent results on a multiprocessor computer, but for older CPUs it
// can be unreliable due bugs in BIOS or HAL.
//
// (3) System time. The system time provides a low-resolution (from ~1 to ~15.6
// milliseconds) time stamp but is comparatively less expensive to retrieve and
// more reliable. Time::EnableHighResolutionTimer() and
// Time::ActivateHighResolutionTimer() can be called to alter the resolution of
// this timer; and also other Windows applications can alter it, affecting this
// one.

vstinner commented 8 months ago

My notes on Windows clocks: https://vstinner.readthedocs.io/windows.html#time

zooba commented 8 months ago

Beware of relying on old code comments (I'm assuming that since both links in that comment above are from ~20 years ago that the code is also roughly that old).

QPC in particular has improved/changed behaviour a few times since then.

Personally, I'd trust anything Eryk Sun has posted in the last 3 years over code that was tweaked to perfection based on Windows XP.

vstinner commented 8 months ago

Maybe I would be ok to use QPC for time.monotonic() if we only make the change conditionnal: only on the most recent Windows version, Windows 11. It would avoid any bad surprises on very old Windows versions running on old hardware with less reliable CPU TSC clock source.

zooba commented 8 months ago

Our earliest supported version is Windows 10 now, so we don't have to worry about really old ones.

This page covers the history. Looks like things have been "fine" since Windows 8: https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

vstinner commented 7 months ago

I created https://github.com/python/cpython/pull/116781 to use QueryPerformanceCounter() for time.monotonic().

vstinner commented 7 months ago