rust-lang / rust


`Instant::duration_since` returns a shortened duration if a suspend occurred #87906

maurer commented 3 years ago

Instant::now currently uses CLOCK_MONOTONIC, which stops ticking during suspend. This means that a Duration created via Instant::duration_since will be notably shorter than expected if the system was suspended between the earlier instant (the argument) and the later one (&self).

If CLOCK_BOOTTIME is used instead, the clock is guaranteed to keep ticking during suspend. This more closely matches the documentation and real-world usage of Instant. The documentation for Instant::duration_since reads "Returns the amount of time elapsed from another instant to this one." - no mention of an exception for suspend or other low-power states. tokio uses Instant to govern its sleep and timeout functionality. Given that tokio is commonly used to implement network servers and clients, slicing out suspended time was likely not their intention.

Using CLOCK_BOOTTIME should have minimal impact on existing usage. Performance-wise, CLOCK_BOOTTIME is implemented in Linux by reading the CLOCK_MONOTONIC time and adding an offset to it; that offset is updated as part of exiting suspend. As a result, the total cost should be an extra two loads, two adds, and maybe two stores under poor optimization, which are all negligible compared to the cost of servicing a system call. In terms of existing applications, the only ones that will see new behavior are those measuring time across suspends. The only use case I can think of for wanting the old CLOCK_MONOTONIC behavior is time-multiplexing system resources, but I am not aware of an actual user.
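For a concrete picture, here is a minimal sketch (Linux-only, using the `libc` crate) that reads both clocks; on a machine that has suspended since boot, the CLOCK_BOOTTIME reading runs ahead of CLOCK_MONOTONIC by the total time spent asleep:

```rust
fn read_clock(clock: libc::clockid_t) -> libc::timespec {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    // clock_gettime returns 0 on success and fills in `ts`.
    let rc = unsafe { libc::clock_gettime(clock, &mut ts) };
    assert_eq!(rc, 0, "clock_gettime failed");
    ts
}

fn main() {
    let mono = read_clock(libc::CLOCK_MONOTONIC);
    let boot = read_clock(libc::CLOCK_BOOTTIME);
    // The difference is (approximately) the total time spent suspended.
    println!("monotonic: {}s, boottime: {}s", mono.tv_sec, boot.tv_sec);
}
```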


maurer commented 3 years ago

We currently use QueryPerformanceCounter to implement Instant on Windows. Microsoft says that this ticks through suspend, so the Unix-like and Windows Instants currently have differing semantics.

the8472 commented 3 years ago

> which are all negligible compared to the cost of servicing a system call.

Note that on Linux getting the time usually does not involve a system call at all; it is dispatched via the vDSO instead, making it basically a function call.

> tokio uses Instant to govern its sleep and timeout functionality. Given that tokio is commonly used to implement network servers and clients, slicing out suspended time was likely not their intention.

That argument doesn't sound strong. If you sleep for a minute, suspend, wake up, and have to wait another 30 seconds for the next service update, that doesn't seem so bad? And Instants don't promise wall-time anyway.

> Note, however, that instants are not guaranteed to be steady. In other words, each tick of the underlying clock may not be the same length (e.g. some seconds may be longer than others). An instant may jump forwards or experience time dilation (slow down or speed up), but it will never go backwards.

So if you expect any relation to wall-time then SystemTime is probably a better choice than Instant.

I don't see a big technical issue with switching to CLOCK_BOOTTIME. But there seems to be something wrong with the expectation of wall-time when the API explicitly promises no such thing. Maybe we shouldn't encourage such misconceptions?

CryZe commented 3 years ago

SystemTime is not suitable for measuring time; it is allowed to get synchronized and thus can jump around a lot. If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now? Otherwise I don't see that as a good reason not to correctly handle the OS being suspended. While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.

the8472 commented 3 years ago

> If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now?

At least on current x86_64 Linux with a TSC clock source, neither SystemTime::now nor Instant::now will incur syscall overhead; both are served via the vDSO.

Historically, Windows APIs or CPUs with unreliable TSCs were the ones that incurred higher overheads.

If you mean adding a new API that includes time spent suspended then yes, that would be nice, but currently that's not portable to all supported systems, as discussed in #87907. So it would have to be gated on specific targets or return a Result to indicate it's not supported.

> SystemTime is not suitable for measuring time; it is allowed to get synchronized and thus can jump around a lot.

Well, Instants can also jump a lot, just forwards.

> While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.

The more reliable it becomes on some systems, the more people start relying on implementation details, which then break when the program is shipped to a system where those details don't hold. That's a portability hazard.

API contracts exist for a reason.

maurer commented 3 years ago

> > If the syscall cost is too much then maybe there can be an additional less expensive version of Instant::now?

> At least on current x86_64 Linux with a TSC clock source, neither SystemTime::now nor Instant::now will incur syscall overhead; both are served via the vDSO.

On systems with VDSO I still don't think this will be a meaningful overhead. If you think it might be too much we can try benchmarking it.

> Historically, Windows APIs or CPUs with unreliable TSCs were the ones that incurred higher overheads.

> If you mean adding a new API that includes time spent suspended then yes, that would be nice, but currently that's not portable to all supported systems, as discussed in #87907. So it would have to be gated on specific targets or return a Result to indicate it's not supported.

> > SystemTime is not suitable for measuring time; it is allowed to get synchronized and thus can jump around a lot.

> Well, Instants can also jump a lot, just forwards.

> > While there's not a lot of guarantees about the behavior of Instant, it is the de facto way to measure time in Rust. And while Rust may not be able to guarantee much, it should still strive to keep it as accurate as possible if the OS indeed has more accurate APIs it can use.

> The more reliable it becomes on some systems, the more people start relying on implementation details, which then break when the program is shipped to a system where those details don't hold. That's a portability hazard.

> API contracts exist for a reason.

What we have right now is a portability hazard. The Windows Instant ticks through suspend... if you have a new enough version of Windows; otherwise it doesn't. The Linux Instant doesn't, but kernel developers have tried to change that in the past, and might try again if they fix systemd (since CLOCK_MONOTONIC doesn't specify that it doesn't tick during suspend). FreeBSD can't tick during suspend at the moment, but they added a CLOCK_UPTIME alias for CLOCK_MONOTONIC because they might make CLOCK_MONOTONIC do something else in the future.

I'd argue:

the8472 commented 3 years ago

> On systems with VDSO I still don't think this will be a meaningful overhead.

I wrote neither [...] nor [...] will incur syscall overhead. That's a negation; in other words, I already called it low-overhead. Where is the disagreement?

> What we have right now is a portability hazard.

I disagree. I see it as the difference between an easy-to-notice stumbling block that will be taken into account by major libraries because they encounter it on some major platforms, versus a subtle issue that will cause breakage on less popular platforms because major libraries didn't consider the case, since they were developed on the major platforms.

Being loudly inconsistent means nobody can rely on that particular aspect. Being mostly consistent means people will rely on it.

If we're worried about portability, we could add an Instant::ticks_in_suspend() -> bool which would allow a program to check this property if it depends on it.

The question is: would that help programs? Do we have any particular error reports that would be fixed by using boot time? What do the current workarounds look like?

maurer commented 3 years ago

> > On systems with VDSO I still don't think this will be a meaningful overhead.

> I wrote neither [...] nor [...] will incur syscall overhead. That's a negation; in other words, I already called it low-overhead. Where is the disagreement?

No disagreement, sorry, I misread your statement.

> > What we have right now is a portability hazard.

> I disagree. I see it as the difference between an easy-to-notice stumbling block that will be taken into account by major libraries because they encounter it on some major platforms, versus a subtle issue that will cause breakage on less popular platforms because major libraries didn't consider the case, since they were developed on the major platforms.

> Being loudly inconsistent means nobody can rely on that particular aspect. Being mostly consistent means people will rely on it.

> If we're worried about portability, we could add an Instant::ticks_in_suspend() -> bool which would allow a program to check this property if it depends on it.

> The question is: would that help programs? Do we have any particular error reports that would be fixed by using boot time? What do the current workarounds look like?

Here's a workaround for using quiche + tokio with a timeout on Android.

If ticks_in_suspend returned false, and the application needed to tolerate suspend, I don't think it could fix the problem. The most it could do would be to log a warning saying that behavior will be unpredictable around suspend.

hellow554 commented 3 years ago

@rustbot claim

nagisa commented 3 years ago

I can see a problem with picking either solution and not indicating clearly which one it is. For example, you wouldn't want a benchmarking tool, after a couple of days of sleep, to say that the maximum/p99/etc iteration took… days. Similar logic applies to e.g. games, physics engines, etc.

On the other hand I also see use-cases for the time that does elapse in suspend (as described in the issue report).

Instant doing different things depending on the platform is clearly a bug.

workingjubilee commented 3 years ago

Posting in the interest of centralizing discussion.

To summarize #87907: it was uncertain whether Linux's CLOCK_MONOTONIC behavior is the "normal" one across platforms. In discussion we found it might be the deviation: OpenBSD's CLOCK_MONOTONIC counts time through suspend, although that platform has also introduced CLOCK_BOOTTIME in the same way Linux has, and both of those clocks carry a formal promise to keep counting across suspend. We have begun to suspect that the other BSDs may behave similarly to OpenBSD here.

As a result, there was an inclination to use CLOCK_BOOTTIME where available, and to eventually bump the Linux kernel minimum far enough that this bug is fully resolved for Rust on Linux. However, we cannot do so immediately because it will require coordination, so we must suffer the bug for at least a little while either way. @joshtriplett suggested the following strategy:

> Always fall back from CLOCK_BOOTTIME to CLOCK_MONOTONIC on Linux if we get EINVAL. (Optionally, omit the fallback code on Linux targets whose minimum kernel is already newer than 2.6.39, which as far as I can tell means every non-x86 target.)
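A minimal sketch of that fallback (illustrative only, not std's actual implementation; uses the `libc` crate):

```rust
// Try the suspend-aware clock first; fall back to CLOCK_MONOTONIC if the
// kernel is too old to know about CLOCK_BOOTTIME (EINVAL, pre-2.6.39).
fn monotonic_now() -> libc::timespec {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    if unsafe { libc::clock_gettime(libc::CLOCK_BOOTTIME, &mut ts) } == 0 {
        return ts;
    }
    // EINVAL: CLOCK_BOOTTIME unsupported; CLOCK_MONOTONIC always exists.
    let rc = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut ts) };
    assert_eq!(rc, 0);
    ts
}
```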

For #88714, I will quote @CryZe:

> Alright, so the situation is the following:
>
> • All the resources I found seem to very roughly indicate that, except for Linux, CLOCK_MONOTONIC does actually include suspend on basically all (or at least most?) operating systems.
> • I have tried to verify this since then, but had a lot of trouble setting up FreeBSD, for example (it refused to ever wake up again after suspending).
> • Therefore I am not super confident that the claim is actually true.
> • However, if anything, macOS would also need to switch over to CLOCK_MONOTONIC, as it currently uses its own implementation in std. The function that is called is not recommended by the documentation, has a lower limit (u32 + u32, as opposed to u64 + u32), and doesn't include suspends.
>
> So yeah, that's why I didn't open this PR yet.

And @thomcc says:

> Essentially all the clock APIs on Darwin other than mach_absolute_time were added in Darwin 16 (macOS 10.12, iOS 10.0, ...). This includes clock_gettime, clock_gettime_nsec_np, mach_continuous_time, CLOCK_UPTIME_RAW, and all the other CLOCK_* constants...
>
> We support back to macOS 10.6 or 10.7, which is considerably older. Darwin 16 is only about 4 or 5 years old, and I think it would be a problem for libstd to require it.
>
> However, we will want to fix the Unix Epoch problem for Apple platforms eventually, as time is quite literally running out (even if we still have some), and the aarch64-apple-darwin platform is Big Sur or later, of course.

thomcc commented 3 years ago

> However, we will want to fix the Unix Epoch problem for Apple platforms eventually

For what it's worth, the API Instant uses doesn't suffer from the Unix Epoch problem:

64-bit nanoseconds won't run out for an astronomically long time, so this is fine. It also isn't a benefit over clock_gettime_nsec_np, which also returns a u64 (but with a more flexible and slightly nicer API).


This is speculation, but I'm pretty sure[1] the only reason mach_absolute_time is being discouraged is that they intend to change the timebase (or are anticipating Intel machines becoming obsolete), and want all the broken macOS code that assumes it can use it as nanoseconds (very common, since historically that has worked) to move to an API that is actually designed to return nanoseconds directly (hence the existence of clock_gettime_nsec_np).

Regardless, I don't think we'd have to worry about the Unix epoch problem for Instant, since detecting the wrap is completely doable (we'd just have to assume deltas greater than half the type's range are from wraparound when converting instants to Durations; not that we need to do this).


[1]: Concretely, Apple recently used it in some extremely shiny new timing-related APIs they're pushing.

The os_workgroup API was just added (Darwin 20 / macOS 11.0, i.e. very new) and uses them exclusively. A large part of the point of this API is to allow realtime threads (just audio at the moment) to tell the OS scheduler about their deadlines, in units of "mach ticks", based on mach_absolute_time + mach_timebase_info.

Anyway, the developer.apple.com website said something like this before about os_unfair_lock (recommending GCD/libdispatch instead), despite also pushing it in other documentation as a replacement for OSSpinLock. (At the time, I was told that the comment was aimed at Swift developers.)

So, I think it's just discouraged for users who don't have a reason to use that API, but I don't think it has been deprecated.

workingjubilee commented 3 years ago

Ah, I see then. For some reason I had concluded the margin was much narrower there when I made my first attempt to understand the time code. Well, that is perfectly understandable and not at all confusing. Ahem. It looks like "mach ticks" may also be related to the kernel's notion of abstime, which depends on hardware variables. It would make sense if it were one way on Intel and another on the M1. It also looks like this may become even more Fun in Rosetta-translated environments, judging by Apple's ominous hinting: https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code

It seems best for Instant to use clock_gettime_nsec_np when we are compiling for a target that defines it, since we won't have to do conversion math.
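A sketch of what that could look like (the extern declaration is written out by hand here, and the CLOCK_UPTIME_RAW value is copied from Darwin's <time.h>; this is not std's actual code):

```rust
use std::os::raw::c_int;

extern "C" {
    // Darwin 16+ (macOS 10.12): uint64_t clock_gettime_nsec_np(clockid_t);
    fn clock_gettime_nsec_np(clock_id: c_int) -> u64;
}

// From Darwin's <time.h>; this is the clock mach_absolute_time tracks.
const CLOCK_UPTIME_RAW: c_int = 8;

fn now_nanos() -> u64 {
    // Returns nanoseconds directly; no mach_timebase_info conversion needed.
    let ns = unsafe { clock_gettime_nsec_np(CLOCK_UPTIME_RAW) };
    assert_ne!(ns, 0, "clock_gettime_nsec_np failed");
    ns
}
```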

EFanZh commented 3 years ago

Can we have two clock types, one that measures duration including suspension time and one that does not? These two clock types should behave consistently across different platforms. Also, if the host OS does not support a certain clock type, we can return an error at runtime.

CryZe commented 3 years ago

Yeah that seems to be the ideal solution.

workingjubilee commented 3 years ago

That is, indeed, likely going to be the case. Most of the question this issue, and the PRs branching off of it, pertains to is which behavior Instant is going to be normalized to. It's a type already used in a lot of code, so ideally we would not simply deprecate it.

And we can certainly attempt to statically guarantee things like "what does the host support?", as we know what the host is at compile time.

RalfJung commented 1 year ago

FWIW, the current situation actually seems to be consistent between Linux and macOS -- both use clocks that do not advance while the system is asleep. (Based on this for macOS, where we are using CLOCK_UPTIME_RAW or mach_absolute_time depending on the architecture, but both seem to behave the same.)

So if this is still accurate, only Windows uses a clock that keeps going during system sleep. https://github.com/rust-lang/rust/pull/88714 would switch Linux over, leaving macOS (and of course all the other targets...)

(I am bringing this up because the description in https://github.com/rust-lang/rust/pull/88714 claims that Linux is the odd one out, but I don't see that supported by the docs.)

maniwani commented 1 year ago

So I just read through most of #87907 and #88714, and it seems like there's some consensus that a "monotonic clock that includes time suspended" and a "monotonic clock that does not include time suspended" have individual use cases. But I couldn't find a conclusive statement on what steps could be taken to support both in std.


I did a quick sweep of the platforms that have been previously mentioned. Windows is the only one missing an easy choice for "monotonic time not including time suspended", making it the odd one out among Tier 1 platforms. FreeBSD is interesting because it seemingly has three names for the same clock.

Note: The _RAW/_FAST/_COARSE variants for clock_gettime are unaffected by NTP-related frequency adjustments (slewing).


If Rust can eventually bump the minimum macOS version from 10.7 to 10.12, then Windows would be the only Tier 1 holdout. I figure bumping the minimum from Windows 7 to Windows 10 is pretty much a no-go.

However, I did come across an alternative in https://github.com/python-trio/trio/issues/1586 that sounds promising and wouldn't force a minimum version bump. Apparently, Windows has a KUSER_SHARED_DATA struct that stores (encoded) information about the time spent suspended, which you can subtract from the QPC value.
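The KUSER_SHARED_DATA offsets are undocumented, so rather than sketching that struct read directly, here is the documented, lower-resolution relative of the same idea: QueryUnbiasedInterruptTime (available since Windows 7) reports interrupt time in 100 ns units, excluding time spent suspended. A minimal sketch with a hand-written binding:

```rust
// Windows 7+; exported from kernel32.dll. Units are 100 ns intervals and
// do not include time the system spends in sleep or hibernation.
#[link(name = "kernel32")]
extern "system" {
    fn QueryUnbiasedInterruptTime(unbiased_time: *mut u64) -> i32;
}

fn now_excluding_suspend_100ns() -> u64 {
    let mut t: u64 = 0;
    // A nonzero return value indicates success.
    let ok = unsafe { QueryUnbiasedInterruptTime(&mut t) };
    assert_ne!(ok, 0, "QueryUnbiasedInterruptTime failed");
    t
}
```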

What solution would the Rust team like to see? Can we just make Instant fill one of the two roles and add another type to std::time to fill the other?

the8472 commented 1 year ago

One option is to make it generic with a default, Instant<InstantSource = OsDefault> where InstantSource: ClockSource + Monotonic, and then add ways to get additional clock sources, which might fail if they're not available on specific OS versions. This is similar to what various collections do for allocators.

Then the default behavior could be left unspecified, whatever is available, and users who have more specific needs could try to obtain more specific clocks.
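To make the shape concrete, a rough sketch of such an API (every name here is hypothetical; nothing like this exists in std today):

```rust
use std::marker::PhantomData;

#[derive(Debug)]
pub struct UnsupportedClock;

/// A source of raw nanosecond readings from some fixed, unspecified epoch.
pub trait ClockSource {
    fn now_nanos() -> Result<u64, UnsupportedClock>;
}

/// Marker trait: readings from this source never go backwards.
pub trait Monotonic: ClockSource {}

/// Whatever clock the platform uses today.
pub struct OsDefault;
/// E.g. CLOCK_BOOTTIME; may be unavailable on some OS versions.
pub struct SuspendAware;

impl ClockSource for OsDefault {
    fn now_nanos() -> Result<u64, UnsupportedClock> {
        Ok(0) // placeholder: read the platform's default monotonic clock
    }
}
impl Monotonic for OsDefault {}

impl ClockSource for SuspendAware {
    fn now_nanos() -> Result<u64, UnsupportedClock> {
        Err(UnsupportedClock) // placeholder: CLOCK_BOOTTIME where available
    }
}
impl Monotonic for SuspendAware {}

pub struct Instant<S: ClockSource + Monotonic = OsDefault> {
    nanos: u64,
    _source: PhantomData<S>,
}

impl<S: ClockSource + Monotonic> Instant<S> {
    /// Fails at runtime if the OS doesn't provide this clock.
    pub fn try_now() -> Result<Self, UnsupportedClock> {
        Ok(Instant { nanos: S::now_nanos()?, _source: PhantomData })
    }
}
```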

maniwani commented 1 year ago

Hmm, I hadn't thought of that one. I don't know if there are other clock sources that need to be supported, but I suppose there may be more platforms like FreeBSD that already can't do both.

Overall, I don't really have an opinion on the API. I mostly suggested a new type because SystemTime and Instant are separate types and Instant already has a disclaimer that the underlying system call could change. It seemed like requiring it to be one of the two monotonic clocks wouldn't count as a breaking change (especially given it's already inconsistent between Windows and other platforms).

Tangentially related: I think Instant could stand to be a bit more strict. The docs also say "The size of an Instant struct may vary depending on the target operating system," but I think that's a bit too loose. In https://github.com/rust-lang/rust/pull/103594, one platform's Instant struct had a lower resolution than Duration (which has nanosecond resolution on all platforms), which meant addition and subtraction were non-associative for small enough Duration values.

I'll try to confirm if the Windows workaround actually works in the meantime. (edit: I tried the workaround in Rust and it does seem to remove the time suspended, so that's a nice fallback solution for Windows 7 and 8.)

DXist commented 1 year ago

I've published the boot-time crate, which provides an Instant struct based on CLOCK_BOOTTIME on platforms that support it.

maniwani commented 8 months ago

I just noticed that the minimum supported macOS version was raised to 10.12, and the minimum supported Windows version will be raised to Windows 10 in the upcoming 1.78 release, so all of the Tier 1 platforms will have both "suspend-aware" and "not suspend-aware" monotonic clocks.

Is there any other major hurdle (aside from RFC process) before we could add support for both types in std?

the8472 commented 8 months ago

FreeBSD still does not account for suspend: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236702

The other question is, even if all systems have some clock that can account for suspend, whether those clocks can be made consistent with sleep/wait APIs that take deadlines. #113752

RalfJung commented 8 months ago

It seems the intent of the current Instant is to measure time that does not advance during suspend (at least that's what the majority of our Tier 1 platforms do), so step one would be to switch Windows over to that (if it hasn't already happened), and then maybe even state in the docs that on Tier 1 platforms we guarantee that system suspends do not count as elapsed time. (And it seems the FreeBSD clock also behaves that way, so maybe we can even guarantee this more widely?)

Adding support for measuring time where system suspends do count as elapsed time would be a new API, at that point one would probably want to talk about APIs that support choosing a clock. That can be explored as a regular crate before starting discussions with t-libs-api about including it in std. The barrier for std inclusion is pretty high these days (higher than it was in Rust 1.0).

ChrisDenton commented 7 months ago

There is a grand total of 3 Tier 1 OSes, so I think looking at "the majority" is a weak argument. I also don't think we should promise anything about the clock Instant uses, other than that it's consistent with whatever would be considered the default choice for the platform.

If we want to provide guaranteed clocks then I agree that should be a new API, e.g. allowing the clock to be specified with Instant<Clock> or some other mechanism.

tmandry commented 3 months ago

It occurs to me that while both the Linux kernel and Go tried and failed to move their default clocks to boot time, their clock objects are both more powerful than what std::time::Instant can currently do (on stable). So we might have an opportunity to do this where they could not.

The only things Instant supports are comparison and arithmetic (with Duration); we have still not stabilized sleep_until or recv_deadline. So the use cases that prevented Linux and Go from changing their clocks, like watchdog timers, don't apply to std::time::Instant today.

It's arguable that because subtracting two Instants gets you a Duration, any operation that takes a Duration should behave the same way as Instant with respect to ticking during suspend, and changing that behavior would likely run into the aforementioned compatibility issues. I don't think I buy that, however: we can simply think of a Duration as the difference between two instants on some monotonic clock. It strikes me as very rare, and probably a bug, for a duration used for sleeps to be the same duration used for arithmetic on a std::time::Instant; if you have a counterexample, please provide it!

> If we want to provide guaranteed clocks then I agree that should be a new API, e.g. allowing the clock to be specified with Instant<Clock> or some other mechanism.

It seems difficult to provide guarantees like this across all platforms, but I think we should try to align the defaults as well as we can with user expectations and correct program behavior. https://github.com/tokio-rs/tokio/issues/3185 and #71860 demonstrate some cases where we have not done that.

jonhoo commented 3 months ago

One instance that comes to mind of the pattern you're asking for is fixed-rate iteration: you take an instant, do some compute, take another instant, compute the elapsed time, then sleep for the desired period minus the elapsed time. The duration calculated from instants then factors directly into the sleep time.
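A minimal sketch of that pattern; if the Instants exclude suspended time but the sleep ticks through it (or vice versa), the effective rate drifts around a suspend:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

fn run_at_fixed_rate(period: Duration, mut work: impl FnMut()) {
    loop {
        let start = Instant::now();
        work();
        // The Duration computed from two Instants feeds directly into the
        // sleep: exactly the mixing of clock sources discussed here.
        let elapsed = start.elapsed();
        sleep(period.saturating_sub(elapsed));
    }
}
```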

DXist commented 3 months ago

If a platform supports absolute timeouts (like Linux + io_uring), then it's possible to avoid calculating elapsed time: the program computes the next absolute timestamp from the current time, taking its ceiling at the desired granularity period.
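A sketch of that deadline computation (names are illustrative): round the time elapsed since some fixed origin up to the next whole multiple of the period, yielding an absolute deadline:

```rust
use std::time::{Duration, Instant};

fn next_deadline(origin: Instant, period: Duration) -> Instant {
    assert!(period > Duration::ZERO);
    let elapsed = origin.elapsed().as_nanos();
    // Ceiling to the next tick boundary (the u32 cast is fine for a sketch).
    let ticks = elapsed / period.as_nanos() + 1;
    origin + period * ticks as u32
}
```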

Absolute timestamps are more convenient in case there are several timeout sources and program waits for the nearest one.

Instead of sleeps with a relative duration, it's possible to provide a more advanced API with relative or absolute timeouts and a choice of clocks (on a best-effort basis).

Previously I experimented with a completion-based IO driver with relative timeouts.

On Linux the driver uses io_uring's Timeout operation, which allows waiting until the next operation completion as an alternative to a system-call timeout.

On Windows and BSD-like systems it maintains a timer wheel, calculates the timeout duration, and provides it explicitly in system calls.

DXist commented 3 months ago

In case anybody is looking for a suspend-aware (on supported platforms) Instant wrapper, I've published the boottime crate.

the8472 commented 1 week ago

> It strikes me as very rare, and probably a bug, for a duration used for sleeps to be the same duration used for arithmetic on a std::time::Instant; if you have a counterexample, please provide it!

Work pacing that mixes active time (measured via instant differences) and forced sleep time, and doesn't do a separate measurement across the sleep to ensure both come from the same clock source.