time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.09k stars 273 forks source link

Use thread safe libtz instead of libc to get local timezone offset on unix. #543

Closed caldwell closed 1 year ago

caldwell commented 1 year ago

Fixes #293 without needing to disable the feature on multithreaded programs.

I wrote the Rust libtz/libtz-sys crates to address this problem. They wrap IANA's libtz (repo, homepage), which is maintained in the same repo with their timezone and leapsecond datasets.

The libtz-sys crate has 2 interfaces, one that wraps localtime_r and another that wraps localtime_rz. Both should be thread safe:

The libtz crate provides a slightly more idiomatic interface to the libtz-sys crate. Currently it only uses libtz-sys's localtime_rz interface because that seems a little cleaner (though it does use std::env::var_os() to read the TZ env var, since that's the standard way to override the timezone for a process).

I set up a repo with the multithreaded POC in the original bug report.

 git clone https://github.com/caldwell/bad-localtime
 cd bad-localtime
 cargo run --bin time

Its Cargo.toml is currently pointing to this pull request branch to show that it doesn't crash.

Things I'm worried about:

I would love to get localtime support into multithreaded apps (pretty much anything async can't use it at the moment!).

Thoughts?

codecov[bot] commented 1 year ago

Codecov Report

Merging #543 (3b93cc3) into main (c45264c) will increase coverage by 0.3%. The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main    #543     +/-   ##
=======================================
+ Coverage   95.3%   95.5%   +0.3%     
=======================================
  Files         78      78             
  Lines       8563    8533     -30     
=======================================
- Hits        8157    8151      -6     
+ Misses       406     382     -24     
Impacted Files Coverage Δ
time/src/sys/local_offset_at/unix.rs 100.0% <100.0%> (+66.7%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

jclulow commented 1 year ago

I wrote the Rust libtz/libtz-sys crates to address this problem. They wrap IANA's libtz (repo, homepage), which is maintained in the same repo with their timezone and leapsecond datasets.

The libtz-sys crate has 2 interfaces, one that wraps localtime_r and another that wraps localtime_rz. Both should be thread safe:

The libtz crate provides a slightly more idiomatic interface to the libtz-sys crate. Currently it only uses libtz-sys's localtime_rz interface because that seems a little cleaner (though it does use std::env::var_os() to read the TZ env var, since that's the standard way to override the timezone for a process).

Presumably on systems where this is a problem, that won't help with getenv() or setenv() or tzset() use by code that is not Rust code; e.g., threads managed by libraries, or if you are building a shared library with Rust?

In crates that adopt libtz like you're suggesting here, on operating systems that do not have a defective setenv()/tzset() (i.e., where all of those functions are actually thread-safe), is it easy or possible to just use the system routines instead of the extra copy of the IANA library and the extra timezone determination logic?

caldwell commented 1 year ago

Presumably on systems where this is a problem, that won't help with getenv() or setenv() or tzset() use by code that is not Rust code; e.g., threads managed by libraries, or if you are building a shared library with Rust?

I don't think other threads doing C tzset() or getenv() will affect anything, since time nor libtz ever setenv() or std::env::var_set().

Definitely another thread using an unsafe C setenv() would break things. However one might argue that another thread doing setenv() is a bug in that code, since it is not specced to be a thread safe call. Staying in the Rust realm, std::env::var_set() is thread safe and will not interact (since this library uses std::env::var_os()).

It would also possible to remove the lookup of the TZ env var completely. I don't really like that as it's the only way for a user on a POSIX system to change the timezone away from the system default for a specific process and it would violate user expectations if it weren't there. But I don't know how often it is used it practice (when it doesn't exist, /etc/localtime is used). I know I never use it. Perhaps it could be disabled via a feature flag?

In crates that adopt libtz like you're suggesting here, on operating systems that do not have a defective setenv()/tzset() (i.e., where all of those functions are actually thread-safe), is it easy or possible to just use the system routines instead of the extra copy of the IANA library and the extra timezone determination logic?

In theory that should be possible. I'm not sure how to detect it though. Are you thinking some sort of explicit list of good OSes? I guess it's really the C library version, not the OS per se. Do you know of any implementations that are thread safe?

jclulow commented 1 year ago

It would also possible to remove the lookup of the TZ env var completely. I don't really like that as it's the only way for a user on a POSIX system to change the time zone away from the system default for a specific process and it would violate user expectations if it weren't there. But I don't know how often it is used it practice (when it doesn't exist, /etc/localtime is used). I know I never use it. Perhaps it could be disabled via a feature flag?

In crates that adopt libtz like you're suggesting here, on operating systems that do not have a defective setenv()/tzset() (i.e., where all of those functions are actually thread-safe), is it easy or possible to just use the system routines instead of the extra copy of the IANA library and the extra timezone determination logic?

In theory that should be possible. I'm not sure how to detect it though. Are you thinking some sort of explicit list of good OSes? I guess it's really the C library version, not the OS per se. Do you know of any implementations that are thread safe?

FWIW, I suspect the Linux ecosystem is somewhat unusual in housing a menagerie of libc implementations. I expect most other UNIX environments (e.g., illumos, the BSDs, commercial/proprietary systems) provide libc along with the kernel as part of one large consolidation developed together.

The illumos implementations of tzset(3C), localtime_r(3C), setenv(3C), and getenv(3C), all have an MT-Level of Safe or MT-Safe; MT-Level and other attributes are described in attributes(7). We inherited them a decade ago when we forked from Solaris, so I would expect that Solaris is also thread safe for all of these routines.

In addition to thread safety, our time zone configuration is a bit different from a Linux system, though most of this is hidden from a consuming process behind the standard entrypoints like localtime_r(), etc.

A process determines the local time zone in roughly this way:

Iff a process does not have TZ set in its environment, we cache the value read from /etc/TIMEZONE the first time we try to establish a local time, and the zone information from the zoneinfo file. This cache can be purged in all processes on the system by a privileged user with the tzreload(8) command. One can, thus, reconfigure the local system time zone while software is running. If a process does have TZ set in its environment then tzreload won't change the time zone for that process, but it will still invalidate the cache of the nominated zone file, which might have been patched in a package update to reflect an updated set of time zone information without needing to restart all the software on the system.

Historically systems like Java and Go and so on have created challenges for operators by ignoring the local time zone database, or at least some of the system facilities for configuring and refreshing it. It would be really great if we could avoid having that happen with Rust, where there are truly superlative facilities for using system library routines through FFI, etc!

caldwell commented 1 year ago

In addition to thread safety, our time zone configuration is a bit different from a Linux system…

Interesting. Though it seems once it has the timezone name it still parses it the same way, either as a POSIX inline definition or as a tzdata file compiled by zic.

Historically systems like Java and Go and so on have created challenges for operators by ignoring the local time zone database, or at least some of the system facilities for configuring and refreshing it. It would be really great if we could avoid having that happen with Rust, where there are truly superlative facilities for using system library routines through FFI, etc!

I agree. FWIW libtz currently doesn't override the database, it uses system database files. But it does replace the system library, so I decided to poke around existing implementations that I can find:

It does appear from that list that really the only OSes that need to be worked-around are linux/glibc, freebsd, and openbsd. Of those only freebsd would have an inferior tz implementation (in the sense that the IANA libtz code doesn't reload itself ever).

But I think I agree that this should probably be more of a surgical workaround than what it currently is. I'll see if I can come up with something tomorrow.

jhpratt commented 1 year ago

While this approach in general is tenable, there are a number of problems here. I won't list the ones that I noticed within ~5 minutes of looking at this, as to be honest I think it would be somewhat rude (feel free to reach out privately). By far the most important, however, is that the repository you linked does not compile on my machine. I get a linker error, which is a dealbreaker right off the bat. My setup isn't anything crazy, either — Fedora on a ThinkPad. For this reason, I am closing this pull request.

With regard to it being effectively unusable in async code, that is simply due to the safety requirements. The next release of time will permit the end user to explicitly opt out of soundness checks at runtime, though doing is unsafe, of course. My ideal solution is to get #[deprecated_safe] going again on the language-level, as work on it stopped last year when the author unexpectedly deleted their GitHub account.

jhpratt commented 1 year ago

And as a side note, if there are operating systems that have thread-safe environment mutation, please let me know with references. I should be able to opt specific OS's out of the multi-threaded check quite easily.

caldwell commented 1 year ago

And as a side note, if there are operating systems that have thread-safe environment mutation, please let me know with references. I should be able to opt specific OS's out of the multi-threaded check quite easily.

It's mostly here in this message, albeit more verbosely, so I'll summarize here (and add illumos links):

jhpratt commented 1 year ago

Thanks! I'll look that over and add opt-outs as appropriate.

jhpratt commented 1 year ago

illumos, NetBSD, and MacOS now bypass the check on main.