rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.29k stars 12.45k forks source link

Tracking issue for cleaning up std's thread_local implementation details #110897

Open m-ou-se opened 1 year ago

m-ou-se commented 1 year ago

std::thread_local, std::thread::local, std::thread::local_impl, std::sys_common::thread_local_key, std::sys_common::thread_local_dtor, std::sys::thread_local_key, etc. etc. are all messy and form quite a confusing maze. Just look at this map I tried to draw of it all:

tlmap

AAAaaaaAaa

Time to clean it up.

And maybe also fix some bugs while we're at it. ^^


Related issues to solve, maybe:

joboet commented 1 year ago

LazyKeyInner unfortunately can't just be OnceCell because of reentrant initialization: Because OnceCell::get_or_init returns a shared reference to the data that lives for the duration of self, it cannot change the value after the first inner initialization. LazyKeyInner can do that (and does) because references cannot outlive LocalKey::with.

m-ou-se commented 1 year ago

@joboet In what situation would that mem::replace call replace a Some instead of a None? Is that the right behaviour? Do we have a test for that situation?

joboet commented 1 year ago

It's very cursed, but this works. I don't think it's currently tested, however.

m-ou-se commented 1 year ago

That just seems like a bug to me. A (thread local) static &'static str (not mut, not a cell) shouldn't be able to have different values at different times (on the same thread).

joboet commented 1 year ago

Maybe :smile:? It's sound however, as there can be no references to the data from the first initialization when the new value is written.

m-ou-se commented 1 year ago

It's sound

Sure, but so would OnceCell be. ^^ And I think users should be able to safely assume that an immutable static never changes. (And as a bonus we get to delete code.)

m-ou-se commented 1 year ago

I did some software archeology and found the original issue and PR that introduced that mem::replace call: https://github.com/rust-lang/rust/issues/30228 and https://github.com/rust-lang/rust/pull/30267.

The issue was originally about thread locals, but then most of the discussion was about drop on assignment in general. The issue description suggests:

It should either drop one of the two values (perhaps after putting the TLS slot in "destructor being called" mode) or panic, if a multiple initialization condition is detected.

Without a preference for any of these solutions. The PR also doesn't include a reason for picking one solution over the other.

So I don't think the mem::replace behaviour was purposely picked over another solution (like dropping the other value or panicking).

workingjubilee commented 1 year ago

"We've all come to rely on undocumented endpoints in your API." "And this gives you power over me?"

m-ou-se commented 1 year ago

In this case I don't think anyone actually relies on LazyKeyInner being a OnceOrMaybeTwiceIfYouTryReallyHardCell instead of just a OnceCell. ^^

joboet commented 1 year ago

I'd advocate for a crater run, just in case, but yes.

workingjubilee commented 1 year ago

It is extremely unlikely someone is going to have tests that stress this exact point sufficiently for it to be revealed.

joboet commented 1 year ago

Ok, yes, it's actually part of the documentation. Nevermind, then.

thomcc commented 1 year ago

Thanks for this, I've been very unhappy with the thread_local! impl for a long time (@ChrisDenton has had to listen to me gripe about it many times). It's something I was hoping to get to back last fall, but then got a dayjob that takes up much of my time. I think you hit the big issues. Here are some other ones

  1. Fixing our (mis)use of __dso_handle/__cxa_thread_atexit_impl (https://github.com/rust-lang/rust/issues/88737). That issue has a lot of writing and some of it is made with incomplete understanding of the issue (I think), so I've summarized in a gist: https://gist.github.com/thomcc/8c4a8003cf1a282949a539e899cb45aa (initially it was going to be part of this comment).

  2. Lots of code does unnecessary reads from #[thread_local]s, which, while generally fast, can be very slow in cases like dynamic libraries (where it has to call a function that sometimes takes a lock, walks linked lists, etc).

    Some of this is just a quirk of the compiler (https://doc.rust-lang.org/nightly/src/std/sys/common/thread_local/fast_local.rs.html#182-198), but our "fast" impl actually uses two #[thread_local] variables now, which guarantees such behavior (that said, I only realized this yesterday and intend to fix it today).

  3. Related to that code, the non-const version ends up pessimizing things (especially the !needs_drop case) to avoid extra TLS reads on macOS: https://github.com/rust-lang/rust/blob/7452822843cf461b56742f0fc648af35889a3070/library/std/src/sys/common/thread_local/fast_local.rs#L182.

    This might be fixed in LLVM now, and if not could be worked around by using either compiler_fence or (if all else fails) black_box.

    (Alternatively, it's possible for us to do semi-dubious things with the fact that std is allowed to unstabley use needs_drop<T>() as a const fn...)

There are a few others I had written down (multiple implementations of the manual tls-dtor list) but you touch on them and it might be outdated given https://github.com/rust-lang/rust/pull/109858.

robertbastian commented 2 months ago

This might have caused #126992