Closed HeMet closed 1 week ago
cc @compnerd
This is interesting.
>swiftc -emit-executable reduced.swift -o reduced.exe -parse-as-library
>.\reduced
Slept for: 4.2784966 seconds
This is definitely over-sleeping even though we would expect some over-sleeping due to windows scheduling.
Concurrency uses libc++'s std::this_thread::sleep_until()
function to sleep, which in turn on Windows calls into the Win32 API directly without very much adjustment. In the case of sleep_until()
, it actually makes an unnecessary condition variable and mutex, and then uses SleepConditionVariableSRW()
to do the actual sleeping, after converting the time to sleep in nanoseconds into a number of milliseconds.
The Win32 API's sleep behaviour is highly unusual; I'm not sure why it would cause this degree of over-sleeping, but to give a brief explanation of what it appears to do:
Sleep()
to do their timing?)The Swift runtime's own wait code deliberately adds 16ms to every sleep in order to ensure that we wait for at least the specified duration, since that's the behaviour everyone expects (normally on other systems you would never wait for less than the time you asked for, unless a signal is delivered). libc++ doesn't do that, though, and Concurrency is using libc++'s code, not the runtime's code.
Now, even though all of this is very odd, as I say, I don't see why it would result in over sleeping. If the sleep duration was very short, it might, but for longer sleep durations on an un-loaded machine I'd expect under sleeping to occur. It may be that the combination of under-sleeping and the loop in claimNextFromCooperativeGlobalQueue()
combine somehow to produce this effect, not sure.
I was under the assumption that Windows uses the DispatchGlobalExecutor for the Concurrency Runtime rather than the CooperativeGlobalExecutor.
@al45tair I'm not sure I follow the CV/mutex bit ... do we somehow end up using the implementation from libc++ instead of msvcprt?
At least the MSVCPRT implementation is: https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/stl/inc/thread#L208-L218 https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/stl/src/sharedmutex.cpp#L40-L42
for (;;) {
const auto _Now = _Clock::now();
if (_Abs_time <= _Now) {
return;
}
const unsigned long _Rel_ms_count = _Clamped_rel_time_ms_count(_Abs_time - _Now)._Count;
_Thrd_sleep_for(_Rel_ms_count);
}
It loops to deal with the spurious wakeup/undersleep case, and then boils down to a Sleep(_Rel_ms_count)
.
@compnerd Oh, interesting. You're right, we'll be using Microsoft's version. That looks less mad, but it's still built on the Win32 API functions, which are very odd indeed. At least it won't under-sleep though, unlike libc++'s code.
Oh, we can agree that the behaviour of Sleep
is ... annoying. But the oversleep is very odd - its an excessively long oversleep. I would've expected an extra quantum of sleep.
I did wonder if somehow we're hitting the delay loop case multiple times. That could cause an over-sleep. But I don't see how it could be as substantial as the numbers we're seeing.
I was under the assumption that Windows uses the DispatchGlobalExecutor for the Concurrency Runtime rather than the CooperativeGlobalExecutor.
Good point. In which case we aren't using Sleep()
at all, but instead (I think) we're driving everything from thread pool timers (ala CreateThreadpoolTimer()
) and then posting to a completion port when the timer fires. The code for that is in event_windows.c
and looks rather complicated. I don't have time to look at it now, but I can well believe there might be a performance gotcha in there.
Interestingly, with a Swift 6 nightly, I get
C:\Users\Alastair\Source\swift-timer>reduced
Slept for: 0.1604191 seconds
i.e. we asked to sleep for 100ms and actually slept for 160ms.
I'm having flashbacks to Visual BASIC 6 over here… 160ms smells suspiciously close to 10*60Hz to me.
OK there are two problems. The first is that Swift is using the wrong clock(!), which on my VM adds 20ms to every sleep, but on real hardware it could potentially add any old value because QueryPerformanceCounter()
, which is what the Concurrency runtime is using, can be backed by various different hardware times that are not necessarily synchronised at all with QueryInteruptTime()
, which is what Dispatch is using. (rdar://135413803)
The second problem is that the way timers were being run, coupled with Windows' scheduling behaviour, resulted in around 10ms of extra latency. That's a problem in swift-corelibs-dispatch though. (rdar://135413990)
I previously updated the suspending clock on Windows to use QueryUnbiasedInterruptTimePrecise()
, but left the continuous clock alone. Sounds like changing the continuous clock to QueryInterruptTimePrecise()
would solve this issue, then?
Indeed it does. I'll have a PR up shortly for that. I've also made Dispatch timers more accurate, at the cost of a little bit of CPU, while I'm at it.
Please CC me on the PR then! I look forward to getting rid of our QueryPerformanceCounter()
dependency. :)
Relevant PR where I adopted QueryUnbiasedInterruptTimePrecise()
: #63225
Description
No response
Reproduction
Output:
Expected behavior
Task sleep for 0.1 second.
Environment
Swift version 5.10-dev (LLVM 7c3b53ecaad54fb, Swift ebaa8c65f94e6a1) Target: x86_64-unknown-windows-msvc
Additional information
No response