maurer opened 3 years ago
We currently use QueryPerformanceCounter to implement Instant on Windows. Microsoft says that this ticks through suspend, so currently the Unix-like and Windows Instant have differing semantics.
> As a result, the total cost should be an extra two loads, two adds, and maybe two stores with bad optimization, which are all negligible compared to the cost of servicing a system call.

Note that on Linux getting the time usually does not involve any system call overhead at all; it is dispatched via the vDSO instead, so it is basically a function call.
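If anyone wants to put a number on that claim on their own machine, a rough micro-benchmark along these lines (plain std only; the iteration count is arbitrary) gives a ballpark per-call cost:

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    const ITERS: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        // black_box keeps the call from being optimized away.
        black_box(Instant::now());
    }
    // Duration implements Div<u32>, so this prints the mean cost per call.
    println!("avg per Instant::now() call: {:?}", start.elapsed() / ITERS);
}
```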
> tokio uses Instant to govern its sleep and timeout functionality. Given that tokio is commonly used to implement network servers and clients, slicing out suspended time was likely not their intention.

That argument doesn't sound strong. If you sleep for a minute, suspend, wake up and have to wait another 30 seconds for the next service update, that doesn't seem so bad? And Instants don't promise wall time anyway.
> Note, however, that instants are not guaranteed to be steady. In other words, each tick of the underlying clock may not be the same length (e.g. some seconds may be longer than others). An instant may jump forwards or experience time dilation (slow down or speed up), but it will never go backwards.

So if you expect any relation to wall time then SystemTime probably is a better choice than Instant.

I don't see a big technical issue with switching to CLOCK_BOOTTIME. But there seems to be something wrong with the expectation of wall time when the API explicitly promises no such thing. Maybe we shouldn't encourage such misconceptions?
SystemTime is not suitable for measuring time, it is allowed to get synchronized and thus can jump around a lot. If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now? Otherwise I don't see that as much of a good reason not to correctly handle the OS being suspended. While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.
> If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now?

At least on current x86_64 Linux with a TSC clock source neither SystemTime::now nor Instant::now will incur syscall overheads; both are served via the vDSO. Historically it was Windows APIs or CPUs with unreliable TSCs that incurred higher overheads.

If you mean adding a new API that includes time spent in suspend then yes, that would be nice, but currently that's not portable to all supported systems, as discussed in #87907. So it would have to be gated to specific targets or return a Result to indicate it's not supported.
> SystemTime is not suitable for measuring time, it is allowed to get synchronized and thus can jump around a lot.

Well, Instants can also jump a lot, just forwards.

> While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.

The more reliable it becomes on some systems the more people start relying on implementation details which then break when the program is shipped to a system where those details don't hold. That's a portability hazard. API contracts exist for a reason.
>> If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now?
>
> At least on current x86_64 Linux with a TSC clock source neither SystemTime::now nor Instant::now will incur syscall overheads; both are served via the vDSO.

On systems with VDSO I still don't think this will be a meaningful overhead. If you think it might be too much we can try benchmarking it.

> Historically it was Windows APIs or CPUs with unreliable TSCs that incurred higher overheads.
>
> If you mean adding a new API that includes time spent in suspend then yes, that would be nice, but currently that's not portable to all supported systems, as discussed in #87907. So it would have to be gated to specific targets or return a Result to indicate it's not supported.
>
>> SystemTime is not suitable for measuring time, it is allowed to get synchronized and thus can jump around a lot.
>
> Well, Instants can also jump a lot, just forwards.
>
>> While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.
>
> The more reliable it becomes on some systems the more people start relying on implementation details which then break when the program is shipped to a system where those details don't hold. That's a portability hazard. API contracts exist for a reason.
What we have right now is a portability hazard. The Windows Instant ticks in suspend... if you have a new version of Windows; otherwise it doesn't. The Linux Instant doesn't, but they've tried to update it to do so in the past, and might try again if they fix systemd (since CLOCK_MONOTONIC doesn't specify that it doesn't tick in suspend). FreeBSD can't tick during suspend at the moment, but they added a CLOCK_UPTIME alias for CLOCK_MONOTONIC because they might make CLOCK_MONOTONIC do something else in the future.

I'd argue: if we're worried about portability, we could add an Instant::ticks_in_suspend() -> bool which allows a program to check this property if they're depending on it. Answering that question might require knowing what version of an OS it's running on, but if that detection only occurs once it doesn't seem like an issue.

> On systems with VDSO I still don't think this will be a meaningful overhead.
I wrote that neither [...] nor [...] will incur syscall overhead. That's a negation; in other words, I already called it low-overhead. Where is the disagreement?

> What we have right now is a portability hazard.

I disagree; I see it as the difference between an easy-to-notice stumbling block that will be taken into account by major libraries because they encounter it on some major platforms, vs. a subtle issue which will cause breakage on less popular platforms because major libraries didn't consider the case, since they were developed on the major platforms.

Being loudly inconsistent means nobody can rely on that particular aspect. Being mostly consistent means people will rely on it.

> If we're worried about portability, we could add an Instant::ticks_in_suspend() -> bool which allows a program to check this property if they're depending on it.

The question is, would that help programs? Do we have any particular error reports that would be fixed by using boot time? What do the current workarounds look like?
>> On systems with VDSO I still don't think this will be a meaningful overhead.
>
> I wrote that neither [...] nor [...] will incur syscall overhead. That's a negation; in other words, I already called it low-overhead. Where is the disagreement?

No disagreement, sorry, I misread your statement.

>> What we have right now is a portability hazard.
>
> I disagree; I see it as the difference between an easy-to-notice stumbling block that will be taken into account by major libraries because they encounter it on some major platforms, vs. a subtle issue which will cause breakage on less popular platforms because major libraries didn't consider the case, since they were developed on the major platforms.
>
> Being loudly inconsistent means nobody can rely on that particular aspect. Being mostly consistent means people will rely on it.
>
>> If we're worried about portability, we could add an Instant::ticks_in_suspend() -> bool which allows a program to check this property if they're depending on it.
>
> The question is, would that help programs? Do we have any particular error reports that would be fixed by using boot time? What do the current workarounds look like?

Here's a workaround for using quiche + tokio with a timeout on Android.

If ticks_in_suspend returned false, and the application needed to tolerate suspend, I don't think it could fix the problem. The most it could do would be to log a warning saying that behavior will be unpredictable around suspend.
@rustbot claim
I can see a problem with picking either solution and not indicating clearly which one it is. For example, you wouldn't want a benchmarking tool, after a couple of days of sleep, to say that the maximum/p99/etc. iteration took… days. Similar logic applies to e.g. games, physics engines, etc.

On the other hand, I also see use cases for time that does elapse in suspend (as described in the issue report).

Instant doing different things depending on the platform is clearly a bug.
Posting in the interest of centralizing discussion.
To summarize #87907: it was uncertain whether Linux's CLOCK_MONOTONIC behavior is the "normal" one across platforms. In discussion we found it might actually be the deviation, as it turned out that OpenBSD has CLOCK_MONOTONIC count time through suspends, although that platform has also introduced CLOCK_BOOTTIME in the same way Linux has, both of which embed a formal promise to keep counting time across suspends. We have begun to suspect that the *BSDs may have similar behavior to OpenBSD here.

As a result, there was an inclination to use CLOCK_BOOTTIME where available, and to eventually someday bump the Linux kernel minimum far enough that this bug is fully resolved for Rust on Linux. However, we cannot do so immediately because it will require coordination, so we must suffer the bug for at least a little while either way. @joshtriplett suggested the following strategy:

> Always fall back from CLOCK_BOOTTIME to CLOCK_MONOTONIC on Linux if we get EINVAL. (Optionally omit the fallback code on Linux targets whose minimum kernel is already more than 2.6.39, which as far as I can tell means every non-x86 target.)
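A minimal sketch of that fallback, written against the libc crate; this is an illustration of the idea on Linux, not the code that actually landed in std:

```rust
// Try CLOCK_BOOTTIME first; on pre-2.6.39 kernels the call fails with EINVAL,
// in which case we fall back to CLOCK_MONOTONIC.
fn clock_now() -> libc::timespec {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    // SAFETY: clock_gettime only writes to the timespec we pass in.
    let rc = unsafe { libc::clock_gettime(libc::CLOCK_BOOTTIME, &mut ts) };
    if rc != 0 && std::io::Error::last_os_error().raw_os_error() == Some(libc::EINVAL) {
        // CLOCK_MONOTONIC is available on every supported kernel.
        let rc = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut ts) };
        assert_eq!(rc, 0);
    }
    ts
}
```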
For #88714, I will quote @CryZe:
> Alright, so the situation is the following:
>
> - All the resources I found seem to very roughly indicate that, except for Linux, CLOCK_MONOTONIC does actually include suspend on basically all / most of (?) the operating systems.
> - I have tried to verify this since then, but had a lot of trouble setting up FreeBSD for example (it refused to ever wake up again after suspending).
> - Therefore I am not super confident that the claim is actually true.
> - However, if anything, macOS would also need to switch over to CLOCK_MONOTONIC, as it currently uses its own implementation in std. The function that is called is not recommended by the documentation, has a lower limit (u32 + u32 as opposed to u64 + u32), and doesn't include suspends.
>
> So yeah, that's why I didn't open this PR yet.
And @thomcc says:
> Essentially all the clock APIs on Darwin other than mach_absolute_time were added in Darwin 16 (macOS 10.12, iOS 10.0, ...). This includes clock_gettime, clock_gettime_nsec_np, mach_continuous_time, CLOCK_UPTIME_RAW and all the other CLOCK_* constants...
>
> We support back to macOS 10.6 or 10.7, which is considerably older. Darwin 16 is only about 4 or 5 years old, and I think it would be a problem for libstd to require this.
However, we will want to fix the Unix Epoch problem for Apple platforms eventually, as time is quite literally running out (even if we still have some), and the aarch64-apple-darwin platform is Big Sur or later of course.
> However, we will want to fix the Unix Epoch problem for Apple platforms eventually

For what it's worth, the API Instant uses doesn't suffer from the Unix Epoch problem:

- mach_absolute_time returns a u64 representing "mach ticks", which are an unspecified time unit that you can relate to nanoseconds using mach_timebase_info.
- mach_timebase_info is a struct with {numer: u32, denom: u32} and is a fraction describing the ratio between "mach ticks" and nanoseconds via nanos = (ticks * timebase.numer) / timebase.denom, or similar.
- On Intel machines the timebase is {numer: 1, denom: 1}. Equivalently: on these machines "mach ticks" and nanoseconds are currently the same. This is not true on my M1, where the ratio seems to be 125 / 3, and I don't believe it's generally true on non-macOS Darwin machines.

(I'm not saying we can or should rely on this; it just helps give context for some of this, IMO)
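For concreteness, a sketch of that ticks-to-nanoseconds conversion might look like this (Darwin-only; the extern declarations are written out by hand here rather than taken from any particular crate):

```rust
// Mirrors the struct and prototypes in <mach/mach_time.h>.
#[repr(C)]
struct MachTimebaseInfo {
    numer: u32,
    denom: u32,
}

extern "C" {
    fn mach_absolute_time() -> u64;
    fn mach_timebase_info(info: *mut MachTimebaseInfo) -> i32; // kern_return_t
}

fn now_nanos() -> u64 {
    let mut info = MachTimebaseInfo { numer: 0, denom: 0 };
    // SAFETY: mach_timebase_info only writes to the struct we pass in.
    unsafe { mach_timebase_info(&mut info) };
    let ticks = unsafe { mach_absolute_time() };
    // Widen before multiplying so the intermediate product cannot overflow.
    // On Intel Macs numer == denom == 1 and this is the identity.
    (ticks as u128 * info.numer as u128 / info.denom as u128) as u64
}
```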
64-bit nanoseconds won't run out for an astronomically long time, so this is fine. It also isn't a benefit over clock_gettime_nsec_np, which also returns a u64 (but with a more flexible and slightly nicer API).

This is speculation, but I'm pretty sure[1] the only reason mach_absolute_time is being discouraged is that they intend to change the timebase (or are anticipating Intel machines becoming obsolete), and want all the broken macOS code that assumes it can be used as nanoseconds (very common, since historically that has worked) to move to an API which is actually designed to return that directly (hence the existence of clock_gettime_nsec_np).
Regardless, I don't think we'd have to worry about the Unix epoch problem for Instant, since detecting the wrap is completely doable (we'd just have to assume deltas > half the type range are from wraparound when converting instants to Durations, not that we need to do this).
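A sketch of that wrap handling, assuming the raw reading is a u64 of "mach ticks" that wraps at most once between the two samples:

```rust
// When converting raw u64 tick counts into a Duration, treat a "backwards"
// difference as the counter having wrapped past zero between the readings.
fn ticks_elapsed(earlier: u64, later: u64) -> u64 {
    // wrapping_sub yields the correct forward distance across a single wrap;
    // a result larger than half the u64 range would instead suggest the two
    // readings were taken out of order.
    later.wrapping_sub(earlier)
}
```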
[1]: Concretely, Apple recently used it in some extremely shiny new timing-related APIs they're pushing. The os_workgroup API was just added (Darwin 20 / macOS 11.0, i.e. very new) and uses them exclusively. A large part of the point of this API is to allow realtime threads (just audio at the moment) to tell the OS scheduler about their deadlines, in units of "mach ticks", based on mach_absolute_time + mach_timebase_info.

Anyway, the developer.apple.com website said something like this before about os_unfair_lock (recommending GCD/libdispatch instead), despite also pushing it in other documentation as a replacement for OSSpinLock. (At the time, I was told that that comment was aimed at Swift developers.) So I think it's just discouraged for users who don't have a reason to use that API, but I don't think it has been deprecated.
Ah, I see then. For some reason I had concluded the margin was much narrower there when I made my first attempt to understand the time code. Well, that is perfectly understandable and not at all confusing. Ahem.

It looks like "mach ticks" may also be related to the kernel's notion of abstime, which depends on hardware variables. It would make sense if it was one way on Intel and another on the M1. It also looks like this may become even more Fun in Rosetta-translated environments, judging by Apple's ominous hinting: https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code

It seems best for Instant to use clock_gettime_nsec_np when we are compiling for a target that defines it, since we won't have to do the conversion math.
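A sketch of what that could look like; the extern declaration mirrors the Darwin prototype, and CLOCK_UPTIME_RAW is assumed to be exposed by the libc crate on Apple targets:

```rust
// clock_gettime_nsec_np returns nanoseconds directly, so no timebase math.
extern "C" {
    fn clock_gettime_nsec_np(clock_id: libc::clockid_t) -> u64;
}

fn instant_nanos() -> u64 {
    // CLOCK_UPTIME_RAW does not advance during sleep and is not slewed,
    // which matches what Instant currently does on macOS.
    unsafe { clock_gettime_nsec_np(libc::CLOCK_UPTIME_RAW) }
}
```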
Can we have two clock types? One measures duration that includes the suspension time, while the other one does not. These two clock types should behave consistently across different platforms. Also, if the host OS does not support a certain clock type, we can return an error at runtime.
Yeah that seems to be the ideal solution.
That is, indeed, likely going to be the case. Most of the question that this issue, and the PRs branching off of it, pertains to is which of the two clocks Instant is going to be normalized to. It's a type already used in a lot of code, so ideally we would not simply deprecate it. And we can certainly attempt to statically guarantee things like "what does the host support?", as we know what the host is at compile time.
FWIW right now the current situation actually seems to be consistent between Linux and macOS -- both use clocks that do not advance when the system is asleep. (Based on this for macOS, where we are using CLOCK_UPTIME_RAW or mach_absolute_time depending on the architecture, but both seem to behave the same.)
So if this is still accurate, only Windows uses a clock that keeps going during system sleep. https://github.com/rust-lang/rust/pull/88714 would switch Linux over, leaving macOS (and of course all the other targets...)
(I am bringing this up because the description in https://github.com/rust-lang/rust/pull/88714 claims that Linux is the odd one out, but I don't see that supported by the docs.)
So I just read through most of #87907 and #88714, and it seems like there's some consensus that a "monotonic clock that includes time suspended" and a "monotonic clock that does not include time suspended" have individual use cases. But I couldn't find a conclusive statement on what steps could be taken to support both in std.
I did a quick sweep of the platforms that have been previously mentioned. Windows is the only one missing an easy choice for "monotonic time not including time suspended", making it the odd one out among Tier 1 platforms. FreeBSD is interesting because it seemingly has three names for the same clock.
- Windows: QueryPerformanceCounter includes time suspended. QueryUnbiasedInterruptTimePrecise does not include time suspended. (First appeared in Windows 10.)
- Linux: CLOCK_MONOTONIC does not include time suspended. CLOCK_BOOTTIME includes time suspended. (First appeared in kernel 2.6.39.)
- macOS/iOS: CLOCK_MONOTONIC includes time suspended. CLOCK_UPTIME_RAW does not include time suspended. (First appeared in Darwin 16, so macOS 10.12 and iOS 10.) The CLOCK_UPTIME_RAW clock ID is what's new; its value is equivalent to mach_absolute_time in earlier versions.
- FreeBSD: CLOCK_MONOTONIC does not include time suspended (I'm not sure, but that has been said before). CLOCK_UPTIME is equivalent to CLOCK_MONOTONIC. CLOCK_BOOTTIME is an alias of CLOCK_UPTIME.
- OpenBSD: CLOCK_MONOTONIC includes time suspended. CLOCK_UPTIME does not include time suspended. CLOCK_BOOTTIME is equivalent to CLOCK_MONOTONIC.

Note: The _RAW/_FAST/_COARSE variants for clock_gettime are unaffected by NTP-related frequency adjustments (slewing).
If Rust can eventually bump the minimum macOS version from 10.7 to 10.12, then Windows would be the only Tier 1 holdout. I figure bumping the minimum from Windows 7 to Windows 10 is pretty much a no-go.
However, I did come across an alternative in https://github.com/python-trio/trio/issues/1586 that sounds promising and wouldn't force a minimum version bump. Apparently, Windows has a KUSER_SHARED_DATA struct that stores (encoded) information about the time spent suspended, which you can subtract from the QPC value.

What solution would the Rust team like to see? Can we just make Instant fill one of the two roles and add another type to std::time to fill the other?
One option is to make it generic with a default, Instant<InstantSource = OsDefault> where InstantSource: ClockSource + Monotonic, and then add ways to get additional clock sources, which might fail if they're not available on specific OS versions. Similar to what various collections do for allocators.

Then the default behavior could be left unspecified (whatever is available), and users who have more specific needs could try to obtain more specific clocks.
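To make the shape of that idea concrete, here is a purely hypothetical sketch; none of these names exist in std, and the two clock sources are wired up for Linux only (via the libc crate) as an illustration:

```rust
use std::marker::PhantomData;
use std::time::Duration;

// Hypothetical trait a clock-source marker type would implement.
trait ClockSource {
    const CLOCK_ID: libc::clockid_t;
}

/// The OS's default monotonic clock (does not count suspend on Linux).
struct OsDefault;
impl ClockSource for OsDefault {
    const CLOCK_ID: libc::clockid_t = libc::CLOCK_MONOTONIC;
}

/// A clock that keeps counting across suspend, where the OS offers one.
struct SuspendAware;
impl ClockSource for SuspendAware {
    const CLOCK_ID: libc::clockid_t = libc::CLOCK_BOOTTIME;
}

struct GenericInstant<C: ClockSource = OsDefault> {
    nanos: u64,
    _clock: PhantomData<C>,
}

impl<C: ClockSource> GenericInstant<C> {
    /// Returns None if the requested clock is not available on this system.
    fn try_now() -> Option<Self> {
        let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
        // SAFETY: clock_gettime only writes to the provided timespec.
        if unsafe { libc::clock_gettime(C::CLOCK_ID, &mut ts) } != 0 {
            return None;
        }
        Some(Self {
            nanos: ts.tv_sec as u64 * 1_000_000_000 + ts.tv_nsec as u64,
            _clock: PhantomData,
        })
    }

    fn duration_since(&self, earlier: &Self) -> Duration {
        Duration::from_nanos(self.nanos.saturating_sub(earlier.nanos))
    }
}
```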
Hmm, I hadn't thought of that one. I don't know if there are other clock sources that need to be supported, but I suppose there may be more platforms like FreeBSD that already can't do both.
Overall, I don't really have an opinion on the API. I mostly suggested a new type because SystemTime and Instant are separate types and Instant already has a disclaimer that the underlying system call could change. It seemed like requiring it to be one of the two monotonic clocks wouldn't count as a breaking change (especially given it's already inconsistent between Windows and other platforms).
Tangentially related, I think Instant could stand to be a bit more strict. The docs also say "The size of an Instant struct may vary depending on the target operating system", but I think that's a bit too loose. In https://github.com/rust-lang/rust/pull/103594, one platform's Instant struct had a lower resolution than Duration (which has nanosecond resolution on all platforms), which meant addition and subtraction were non-associative with small enough Duration values.

I'll try to confirm whether the Windows workaround actually works in the meantime. (edit: I tried the workaround in Rust and it does seem to remove the time suspended, so that's a nice fallback solution for Windows 7 and 8.)
I've published the boot-time crate, which provides an Instant struct based on CLOCK_BOOTTIME on platforms that support it.
I just noticed that the minimum supported macOS version was raised to 10.12 and the minimum supported Windows version will be raised to Windows 10 in the upcoming 1.78 release, so all of the Tier 1 platforms will have "suspend-aware" and "not suspend-aware" monotonic clocks.
Is there any other major hurdle (aside from the RFC process) before we could add support for both types in std?
FreeBSD still does not account for suspend: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236702

The other question is, even if all systems have some clock that can account for suspend, whether those clocks can be made consistent with sleep/wait APIs that take deadlines. #113752
It seems the intent of the current Instant is to measure time that does not advance during suspend (at least that's what the majority of our tier 1 platforms do), so step one would be to switch Windows over to that (if it hasn't already happened), and then maybe even state in the docs that on tier 1 platforms we guarantee that system suspends do not count as elapsed time. (And it seems the FreeBSD clock also behaves that way, so maybe we can even guarantee this more widely?)

Adding support for measuring time where system suspends do count as elapsed time would be a new API; at that point one would probably want to talk about APIs that support choosing a clock. That can be explored as a regular crate before starting discussions with t-libs-api about including it in std. The barrier for std inclusion is pretty high these days (higher than it was in Rust 1.0).
There is a grand total of 3 tier 1 OSes, so I think looking at "the majority" is a weak argument. I also don't think we should be promising anything about the clock Instant uses other than that it's consistent with whatever would be considered the default choice for the platform.

If we want to provide guaranteed clocks then I agree that should be a new API, e.g. allowing the clock to be specified with Instant<Clock> or some other API.
It occurs to me that while both the Linux kernel and Go tried and failed to move their default clocks to boot time, their clock objects are both more powerful than what std::time::Instant can currently do (on stable). So we might have an opportunity to do this when they could not.

The only thing Instant supports is comparison and arithmetic (with Duration); we have still not stabilized sleep_until or recv_deadline. So the use cases that prevented Linux and Go from changing their clocks, like watchdog timers, don't apply to std::time::Instant today.

It's arguable that because subtracting two Instants gets you a Duration, any operation which takes a Duration should behave in the same way as Instant with respect to ticking during suspend, and changing that behavior would likely run into the aforementioned compatibility issues. I don't think I buy that, however: we can simply think of Duration as a difference between two instants in some monotonic clock. It strikes me as very rare, and probably a bug, that a duration used for sleeps would be the same duration used for arithmetic on a std::time::Instant – if you have a counterexample, please provide it!
> If we want to provide guaranteed clocks then I agree that should be a new API, e.g. allowing the clock to be specified with Instant<Clock> or some other API.

It seems difficult to provide guarantees like this across all platforms, but I think we should try to align the defaults as well as we can with user expectations and correct program behavior. https://github.com/tokio-rs/tokio/issues/3185 and #71860 demonstrate some cases where we have not done that.
One instance that comes to mind of the pattern you're asking for is fixed rate iteration. You take an instant, do some compute, take another instant, compute the elapsed time, then sleep for the desired period minus the elapsed time. Then duration calculation from instants factors directly into sleep time.
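A minimal sketch of that pattern (plain std; the helper name is made up) showing where the Instant arithmetic feeds directly into the sleep:

```rust
use std::time::{Duration, Instant};

// Fixed-rate iteration: measure the work with Instant differences, then sleep
// for whatever is left of the period. If the clock behind Instant and the
// clock behind sleep disagree about suspend, the pacing drifts.
fn run_at_fixed_rate(period: Duration, mut work: impl FnMut()) {
    loop {
        let start = Instant::now();
        work();
        let elapsed = start.elapsed();
        if let Some(remaining) = period.checked_sub(elapsed) {
            std::thread::sleep(remaining);
        }
        // If the work overran the period, start the next iteration immediately.
    }
}
```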
If a platform supports absolute timeouts (like Linux with io_uring) then it's possible to avoid calculating elapsed time: the program computes the next absolute timestamp from the current time, rounded up to the desired granularity period. Absolute timestamps are also more convenient when there are several timeout sources and the program waits for the nearest one.

Instead of sleeps with a relative duration, it's possible to provide a more advanced API with relative or absolute timeouts and a choice of clocks (on a best-effort basis).

Previously I've experimented with a completion-based IO driver with relative timeouts. On Linux the driver uses io_uring's Timeout operation and allows waiting until the next operation completion as an alternative to a system call timeout. On Windows and BSD-like systems it maintains a timer wheel, calculates the timeout duration, and explicitly provides it in system calls.
In case anybody is looking for a suspend-aware (on supported platforms) Instant wrapper, I've published the boottime crate.
> It strikes me as very rare, and probably a bug, that a duration used for sleeps would be the same duration used for arithmetic on a std::time::Instant – if you have a counterexample, please provide it!

Work pacing that mixes active time (measured via instant differences) and forced sleep times, and doesn't do a separate measurement across the sleep to ensure both come from the same clock source.
Instant::now currently uses CLOCK_MONOTONIC, which stops ticking during suspend. This means that creating a Duration via Instant::duration_since will be notably shorter than expected if the system suspended between the &self instance and the earlier argument.

If CLOCK_BOOTTIME is used instead, the clock is guaranteed to continue ticking during suspend. This more closely matches the documentation and real-world usage of Instant. The documentation for Instant::duration_since reads "Returns the amount of time elapsed from another instant to this one." - no mention of an exception for suspend or other low-power states.

tokio uses Instant to govern its sleep and timeout functionality. Given that tokio is commonly used to implement network servers and clients, slicing out suspended time was likely not their intention.

Using CLOCK_BOOTTIME should have minimal impact on existing usage. Performance-wise, CLOCK_BOOTTIME is implemented in Linux by getting the CLOCK_MONOTONIC time and adding an offset to it. That offset is updated as part of exiting suspend. As a result, the total cost should be an extra two loads, two adds, and maybe two stores with bad optimization, which are all negligible compared to the cost of servicing a system call. In terms of existing applications, the only ones which will see new behavior are ones which are measuring time across suspends. The only use case I can think of for wanting the old CLOCK_MONOTONIC behavior is time-multiplexing system resources, but I am not aware of an actual user.

Other projects on the same issue:

- […] CLOCK_BOOTTIME due to […]
- […] CLOCK_MONOTONIC to act like CLOCK_BOOTTIME since it was almost always the desired behavior, and technically spec compatible, but reverted it shortly afterwards due to compatibility issues.