Task.sleep() sleep too long on Windows

HeMet commented 6 months ago

Description

No response

Reproduction

@main
public struct App {
    public static func main() async throws {
        let clock = ContinuousClock.continuous
        let dur = try await clock.measure {
            try await Task.sleep(for: .seconds(0.1))
        }
        print("Slept for: \(dur)")
    }
}

Output:

Slept for: 64.1733749 seconds

Expected behavior

Task sleep for 0.1 second.

Environment

Swift version 5.10-dev (LLVM 7c3b53ecaad54fb, Swift ebaa8c65f94e6a1) Target: x86_64-unknown-windows-msvc

Additional information

No response

AnthonyLatsis commented 6 months ago

cc @compnerd

compnerd commented 6 months ago

This is interesting.

>swiftc -emit-executable reduced.swift -o reduced.exe -parse-as-library
>.\reduced
Slept for: 4.2784966 seconds

This is definitely over-sleeping even though we would expect some over-sleeping due to windows scheduling.

al45tair commented 3 weeks ago

Concurrency uses libc++'s std::this_thread::sleep_until() function to sleep, which in turn on Windows calls into the Win32 API directly without very much adjustment. In the case of sleep_until(), it actually makes an unnecessary condition variable and mutex, and then uses SleepConditionVariableSRW() to do the actual sleeping, after converting the time to sleep in nanoseconds into a number of milliseconds.

The Win32 API's sleep behaviour is highly unusual; I'm not sure why it would cause this degree of over-sleeping, but to give a brief explanation of what it appears to do:

Internally Win32 uses a 64Hz tick (roughly 16ms per tick), though on some Windows versions, on some machines, this can change depending on things like the multimedia timer settings.
If you ask it to sleep for less than one tick, it seems to run a delay loop(!) I assume this behaviour is there for backwards compatibility (my guess is on 16-bit Windows some games were using Sleep() to do their timing?)
If you ask it to sleep for one tick or more, because of the way it tracks the tick count, it's possible that you will wait for up to one tick less than you expect, depending on where you are in the present tick.

The Swift runtime's own wait code deliberately adds 16ms to every sleep in order to ensure that we wait for at least the specified duration, since that's the behaviour everyone expects (normally on other systems you would never wait for less than the time you asked for, unless a signal is delivered). libc++ doesn't do that, though, and Concurrency is using libc++'s code, not the runtime's code.

Now, even though all of this is very odd, as I say, I don't see why it would result in over sleeping. If the sleep duration was very short, it might, but for longer sleep durations on an un-loaded machine I'd expect under sleeping to occur. It may be that the combination of under-sleeping and the loop in claimNextFromCooperativeGlobalQueue() combine somehow to produce this effect, not sure.

MarSe32m commented 3 weeks ago

I was under the assumption that Windows uses the DispatchGlobalExecutor for the Concurrency Runtime rather than the CooperativeGlobalExecutor.

compnerd commented 3 weeks ago

@al45tair I'm not sure I follow the CV/mutex bit ... do we somehow end up using the implementation from libc++ instead of msvcprt?

At least the MSVCPRT implementation is: https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/stl/inc/thread#L208-L218 https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/stl/src/sharedmutex.cpp#L40-L42

        for (;;) {
            const auto _Now = _Clock::now();
            if (_Abs_time <= _Now) {
                return;
            }

            const unsigned long _Rel_ms_count = _Clamped_rel_time_ms_count(_Abs_time - _Now)._Count;
            _Thrd_sleep_for(_Rel_ms_count);
        }

It loops to deal with the spurious wakeup/undersleep case, and then boils down to a Sleep(_Rel_ms_count).

al45tair commented 3 weeks ago

@compnerd Oh, interesting. You're right, we'll be using Microsoft's version. That looks less mad, but it's still built on the Win32 API functions, which are very odd indeed. At least it won't under-sleep though, unlike libc++'s code.

compnerd commented 3 weeks ago

Oh, we can agree that the behaviour of Sleep is ... annoying. But the oversleep is very odd - its an excessively long oversleep. I would've expected an extra quantum of sleep.

al45tair commented 3 weeks ago

I did wonder if somehow we're hitting the delay loop case multiple times. That could cause an over-sleep. But I don't see how it could be as substantial as the numbers we're seeing.

al45tair commented 3 weeks ago

I was under the assumption that Windows uses the DispatchGlobalExecutor for the Concurrency Runtime rather than the CooperativeGlobalExecutor.

Good point. In which case we aren't using Sleep() at all, but instead (I think) we're driving everything from thread pool timers (ala CreateThreadpoolTimer()) and then posting to a completion port when the timer fires. The code for that is in event_windows.c and looks rather complicated. I don't have time to look at it now, but I can well believe there might be a performance gotcha in there.

al45tair commented 1 week ago

Interestingly, with a Swift 6 nightly, I get

C:\Users\Alastair\Source\swift-timer>reduced
Slept for: 0.1604191 seconds

i.e. we asked to sleep for 100ms and actually slept for 160ms.

grynspan commented 1 week ago

I'm having flashbacks to Visual BASIC 6 over here… 160ms smells suspiciously close to 10*60Hz to me.

al45tair commented 1 week ago

OK there are two problems. The first is that Swift is using the wrong clock(!), which on my VM adds 20ms to every sleep, but on real hardware it could potentially add any old value because QueryPerformanceCounter(), which is what the Concurrency runtime is using, can be backed by various different hardware times that are not necessarily synchronised at all with QueryInteruptTime(), which is what Dispatch is using. (rdar://135413803)

The second problem is that the way timers were being run, coupled with Windows' scheduling behaviour, resulted in around 10ms of extra latency. That's a problem in swift-corelibs-dispatch though. (rdar://135413990)

grynspan commented 1 week ago

I previously updated the suspending clock on Windows to use QueryUnbiasedInterruptTimePrecise(), but left the continuous clock alone. Sounds like changing the continuous clock to QueryInterruptTimePrecise() would solve this issue, then?

al45tair commented 1 week ago

Indeed it does. I'll have a PR up shortly for that. I've also made Dispatch timers more accurate, at the cost of a little bit of CPU, while I'm at it.

grynspan commented 1 week ago

Please CC me on the PR then! I look forward to getting rid of our QueryPerformanceCounter() dependency. :)

grynspan commented 1 week ago

Relevant PR where I adopted QueryUnbiasedInterruptTimePrecise(): #63225

swiftlang / swift