Open pvillela opened 11 months ago
You know, technically, it only says it has the "happens before" guarantee in the API docs of `ScopedJoinHandle::join`. I'm guessing there being a difference is not intended behavior, however. cc @m-ou-se
This error is spotty, sometimes it happens, sometimes it doesn't. It happens since at least 1.63, when this function was stabilized.
nope, seems this has been happening since day -128, checked with 7737e0b5c 2022-04-04 using
Explicit joins go through the platform's `join` operation, which waits until the spawned thread has actually, fully terminated. Therefore, operations performed in TLS destructors are seen by the joining thread.
Implicit joins on the other hand are implemented via an atomic counter that tracks the number of extant threads. The spawning thread parks at the end of the scope until the counter has been set to zero. Scoped threads decrement the counter when exiting their main function and unpark the spawning thread if necessary. The bug described here occurs because the TLS destructors are run after the main function completes, therefore there is no relationship formed between their operations and the counter load at the end of the scope.
As I see it, there are two ways to resolve this:

- Document the current behavior, since it can only be observed by `unsafe` code, as safe code can neither store referenced data in TLS nor write to shared data without proper synchronization.
- Use the platform's `join` operation to join the remaining threads, providing the synchronization required for the code in this issue to be sound.

Thanks for the bug report!
It's a good question whether this behaviour is acceptable or not. In many ways, TLS destructors behave a bit weirdly.
As mentioned above, the implicit joining behaviour does properly wait for the thread's main function to finish, so this is never a problem in safe code.
So, I don't think I'd categorize this as a high priority bug, or perhaps not even as a bug at all. But I do agree it would be good to fix this, if we can find a good way to do that.
Keeping track of all the join/thread handles in some sort of (linked) list would be problematic, as that'd result in an ever-growing list (~a memory leak) when using a long-lived scope. (E.g. a webserver that runs entirely in a single thread::scope, which spawns a thread for every request. It'd never exit the scope, so the list would just grow forever.)
I'll investigate to see if we can find a better solution. Perhaps the counter needs to be decremented by a TLS destructor itself, or we can trigger TLS destruction ourselves, or perhaps we can think of another solution.
> I'll investigate to see if we can find a better solution. Perhaps the counter needs to be decremented by a TLS destructor itself, or we can trigger TLS destruction ourselves, or perhaps we can think of another solution.
This might work. TLS destructors are executed in reverse order of registration, so if we register one right at the start of the scoped thread, it should execute last after any other TLS destructors in the thread.
That's not guaranteed and very unreliable. If a destructor registers another destructor, that one will run after all others. On platforms that use keyed TLS, there is no predefined order at all.
It could be acceptable to define the implicit join as inherently slightly racy if we're confident that it has no unsoundness per se in doing so (i.e. it is very obvious this is bad if combined with the aforementioned `unsafe` code), as we can shift the blame to the unsafe code.
However, that leaves the question:
```rust
            // SAFETY: this happens after the implicit thread join
            // which should provide a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed IMPLICIT join loop.");
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        // SAFETY: this happens before the thread join
        // which should provide a `happens before` guarantee
        let _: Option<String> = unsafe { replace(&mut CONTROL, Some("abcd".to_owned())) };
    }
}
```
Would either of these unsafe blocks, themselves, be okay if the other one didn't exist? It seems the answer is that the unsafe read of the `static mut` is never okay, at least.
Also, in the previous example, it should be noted that executing with optimizations appears to make the first `mem::drop` and the implicit scoped thread's TLS `Drop::drop` significantly more likely to race. The thread local's `Drop` sometimes doesn't even get to its print before the `mem::drop` is over with.
We discussed this in the library team meeting and the decision was to keep the existing behavior. However, we are open to any suggestions on how the documentation can be improved to be clearer. Feel free to open a doc PR.
That may be more easily done if we know what the motivation was for deciding to preserve the existing behavior, contra the alternatives? And which `unsafe` block is the one that "actually" contains the UB?
That meeting happened on Zulip, here.
What we discussed is that the way to solve this involves keeping track of all the join handles in the main thread, which can pile up (behave like a memory leak) for a long-living scope, so we see no other resolution than just accepting the current behaviour of `scope()`.
Note that the `thread_local` documentation says:
> Note that a “best effort” is made to ensure that destructors for types stored in thread local storage are run
Although it's easy to argue that this "best effort" is only about whether they are run or not, not whether they are run at the expected time.
I believe we don't guarantee anywhere that TLS destructors will run before .join(). (I wonder how much would break if that behaviour changed.)
But if we do want to guarantee that, we could document that `scope()` does not implicitly call `.join()`, but instead that it simply waits for the main function of all spawned threads to have returned.
It would be VERY helpful to have a guarantee that TLS destructors run before a call to join() returns.
On Thu, Mar 14, 2024 at 11:03, Mara Bos wrote:

> I believe we don't guarantee anywhere that TLS destructors will run before .join(). (I wonder how much would break if that behaviour changed.)
>
> But if we do want to guarantee that, we could document that scope() does not implicitly call .join(), but instead that it simply waits for the main function of all spawned threads to have returned.
> Keeping track of all the join/thread handles in some sort of (linked) list would be problematic, as that'd result in an ever growing list (~a memory leak) when using a long-lived scope
Note that this is already a memory leak for exactly the same sort of reason inside of the OS's thread library. Any joinable thread that has not been joined will require some memory to handle that state and pass through any return value supported by the API.
It would increase the size of the memory leak of course, likely by around 2x at most if the OS has optimized the joinable-but-exited thread state down to minimal amounts.
Ouch, yeah, that's quite the surprising footgun. Personally I'd rather have a bit of memory accumulation in `thread::scope` than risk UB due to missing synchronization. As @talchas2 said, this doesn't turn any O(1) memory profile into O(n), at least not when taking into account pthread/kernel resource usage -- it's "just" a different constant in an O(n).
To clarify, is there a consensus on whether this is a soundness bug, or just a documentation bug?
> Note that this is already a memory leak for exactly the same sort of reason inside of the OS's thread library. Any joinable thread that has not been joined will require some memory to handle that state and pass through any return value supported by the API.
We explicitly detach those threads when the join handle is dropped, so those threads are no longer joinable, so are not leaking memory.
Yeah, that's true, I missed that.
> It would be VERY helpful to have a guarantee that TLS destructors run before a call to join() returns.
That is already the behavior today. Explicitly calling `.join()` calls pthread_join or similar, which will wait until after TLS destructors have run. (Although we don't explicitly guarantee that. Perhaps we should!)
This issue is about what happens to threads that you have not explicitly `.join()`ed when exiting the `thread::scope` in which they were spawned. That currently only waits for the main function of those threads to return. They are not 'joined', because they have already been 'detached' (when the join handle was dropped).
Proposing to close this issue with resolution "not a bug, current behavior at scope exit needs to be properly documented".
@rfcbot close
Team member @m-ou-se has proposed to close this. The next step is review by the rest of the tagged team members:
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
Quoting @m-ou-se's reply on https://github.com/rust-lang/rust/issues/116237#issuecomment-2217941740:
> > It would be VERY helpful to have a guarantee that TLS destructors run before a call to join() returns.
>
> That is already the behavior today. Explicitly calling `.join()` calls pthread_join or similar, which will wait until after TLS destructors have run. (Although we don't explicitly guarantee that. Perhaps we should!)
I have created a separate issue #127571 about this.
:bell: This is now entering its final comment period, as per the review above. :bell:
The use case of a webserver running in `scope` and creating one thread per connection does pose an interesting challenge, though it may not be insurmountable.
The idea would be to have the threads that are stopping participate in the joining effort. In short:

- An `Arc<Mutex<Option<JoinHandle>>>` is created per scope (or an array of such mutexes).
- An extra step is added to each thread's `main`, injected after the user-run closure: the exiting thread swaps its handle into the mutex and joins whatever handle it swapped out.
- When the `scope` ends, whatever handle is left there is joined.

And that's it. All threads are joined.
I am not sure if the idea is completely feasible, though.
Specifically, there's the issue of auto-join vs user-join. Today one cannot have two `JoinHandle`s to the same thread, and while `std` could probably work around that (internally use an `Arc` to an atomic thread-handle, swap it for 0 on first join?) or just use lower-level APIs, it's not clear if it's possible, or what the cost would be.
Advantages:

- Constant memory usage, even when the `scope` never yields and spawns millions of short-lived threads.

Disadvantages:

- Each thread's `JoinHandle` must be kept alive, beyond what `scope` does today.
- Extra joining work at the end of every thread spawned in the `scope`.

For bonus points, ditch `JoinHandle` for a single u64/pointer and use an atomic rather than a mutex of an option.
The final comment period, with a disposition to close, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
See also https://github.com/rust-lang/rust/issues/116179:
Scoped thread implicit join fails 'happens before' guarantee
As documented in the Rust Atomics and Locks book by Mara Bos, "joining a thread creates a happens-before relationship between the joined thread and what happens after the join() call". This is not true for implicit joins on scoped threads.
I tried this code:
I expected to see this happen: Both the explicit join and implicit join portions of the code should terminate normally.
Instead, this happened: the explicit join portion terminates normally as expected, but the implicit join portion panics (the number of iterations of the loop before it panics varies).
This issue is related to issue #116179.
Meta

The same behaviour is observed on the nightly version: nightly-x86_64-unknown-linux-gnu (unchanged), rustc 1.74.0-nightly (0288f2e19 2023-09-25).

`rustc --version --verbose`:

Backtrace
```
thread 'main' panicked at 'byte index 5 is out of bounds of `hP�^`', library/core/src/fmt/mod.rs:2324:30
stack backtrace:
   0: rust_begin_unwind
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/panicking.rs:67:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/mod.rs:87:9
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::Range<usize>>::index
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/traits.rs:235:21
   5: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/traits.rs:61:15
   6: <str as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2324:30
   7: <alloc::string::String as core::fmt::Debug>::fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/string.rs:2271:9
8: <&T as core::fmt::Debug>::fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2268:62
9: core::fmt::builders::DebugTuple::field::{{closure}}
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/builders.rs:322:17
10: core::result::Result::and_then
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/result.rs:1319:22
11: core::fmt::builders::DebugTuple::field
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/builders.rs:309:35
12: core::fmt::Formatter::debug_tuple_field1_finish
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2035:9
  13: <core::option::Option<T> as core::fmt::Debug>::fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/option.rs:559:37
14: <&T as core::fmt::Debug>::fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2268:62
15: core::fmt::rt::Argument::fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/rt.rs:138:9
16: core::fmt::write
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:1094:21
17: core::fmt::Write::write_fmt
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:192:9
18: alloc::fmt::format::format_inner
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:610:16
19: alloc::fmt::format::{{closure}}
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:614:34
20: core::option::Option::map_or_else
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/option.rs:1180:21
21: alloc::fmt::format
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:614:5
22: thread_local_destruction_in_scoped_thread2::main
at ./general/src/bin/thread_local_destruction_in_scoped_thread2.rs:41:23
23: core::ops::function::FnOnce::call_once
at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```