`std::process::exit` is not thread-safe in combination with C code calling `exit`

teskje commented 5 months ago

The current implementation of std::process::exit is unsound on Linux and possibly related platforms where it defers to libc's exit() function. exit() is documented to be not thread-safe, and hence std::process::exit is not thread-safe either, despite being a non-unsafe function.

To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):

use std::thread;

fn main() {
    for _ in 0..32 {
        unsafe { libc::atexit(exit_handler) };
    }
    for _ in 0..2 {
        thread::spawn(|| std::process::exit(0));
    }
}

extern "C" fn exit_handler() {
    thread::sleep_ms(1000);
}

The example contains unsafe code, but only to install exit handlers. AFAICT nothing about the libc::atexit call is unsafe. The UB is introduced by calling std::process::exit concurrently afterwards.

If you are curious, https://github.com/MaterializeInc/database-issues/issues/6528 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure exit() is thread safe when it's clearly documented not to be. Instead, Rust should either mark std::process::exit as unsafe (which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to call exit() at a time.

Current status

Mitigation on our end (only covers pure Rust programs): #126606

The current status is that the specification language stems from a time when C did not specify multithreading and the intent was to forbid reentrancy. External discussion indicates that there's intent to fix it both in libc implementations and the specs:

glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=31997
- patch: https://sourceware.org/pipermail/libc-alpha/2024-July/158579.html
POSIX: https://austingroupbugs.net/view.php?id=1845
- also a comment mentioning taking it to the C committee.
libc-coord (including musl author): https://www.openwall.com/lists/libc-coord/2024/07/24/4
musl patch: https://git.musl-libc.org/cgit/musl/commit/?id=8cca79a72cccbdb54726125d690d7d0095fc2409
freebsd patch: https://reviews.freebsd.org/D46108
bionic: https://android.googlesource.com/platform/bionic.git/+/089f4d17265480a5ad9311bcbd8890bc9f361801%5E%21/

teskje commented 5 months ago

This has been discussed previously in https://github.com/rust-lang/rust/issues/83994. There the decision was to not do anything about the concerns with std::process::exit, mostly based on the reasoning that exit() is thread safe. The above example shows it's clearly not, neither according to the documentation nor in the implementation.

Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent std::process::exit invocations I feel quite strongly that this issue deserves reconsideration.

ChrisDenton commented 5 months ago

On that thread joshtriplett notes:

The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.

We do now mark std::env::set_var as unsafe so maybe thinking on this has changed. Though I'm not sure how you would mark returning from main as being unsafe?

conradludgate commented 5 months ago

Linux libc documentation does claim that exit() is thread-unsafe: "MT-Unsafe race:exit", although the original thread suggests that all libc implementations introduce appropriate locks. Perhaps introducing a Once before calling libc::exit is sufficient and not a performance blocker?

the8472 commented 5 months ago

Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing.

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

ChrisDenton commented 5 months ago

I'd guess the safe thing to do is kill all other threads then call exit.

the8472 commented 5 months ago

Not sure if serious...

Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking.

tbu- commented 5 months ago

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

tbu- commented 5 months ago

I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit on different threads.

This also means that Rust cdylibs mustn't call std::process::exit since they might not be the only language running, and applications must ensure that no foreign functions call exit. The latter should be a bug anyway due to the C standard saying that multiple calls to exit are UB.

teskje commented 5 months ago

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

Based on how the glibc code is set up it very much looks like exit isn't intended to be thread safe. It uses a lock, but releases it every time it calls one of the exit handlers. That'd make no sense if it wanted to prevent other threads from modifying the list of exit handlers concurrently. The lock is useful for preventing corruption in the face of other threads calling atexit in parallel. There is a comment that says that this is its intended purpose (though admittedly it could be clearer).

I think the previous assumption that glibc's exit is thread-safe was a misconception based on the existence of this lock alone.

teskje commented 5 months ago

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list.

jieyouxu commented 5 months ago

Triage: marking this as I-unsound, but please readjust the label accordingly.

ChrisDenton commented 5 months ago

This comment from the main author of musl was pointed out in the libs-api meeting:

POSIX requires exit be thread-safe; it is not one of the functions listed in the exceptions, and all functions are required to be thread-safe by default. My understanding is that C11 also requires this.

I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01

All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions1 need not be thread-safe.

exit is not on the list.

tbu- commented 5 months ago

According to the C11 standard, calling exit more than once is undefined behavior.

C11 7.22.4.4p2

If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the behavior is undefined.

Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11.

tbu- commented 5 months ago

The POSIX description also contains that sentence.

https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html

If exit() is called more than once, the behavior is undefined.

So the POSIX spec is kind of in conflict with itself?

ericlagergren commented 5 months ago

@tbu- See the latter half of https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574

tbu- commented 5 months ago

If there's any proper action to be had here, it's on the C side. The text about calling exit more than once being UB was written in the absence of threads and was obviously intended to make recursive exit (via atexit handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.

It makes sense to fix this on the C side, too.

Currently, we have process::exit causing practical UB on a tier 1 platform, it's not just a hypothetical issue, so I think something like #126606 also makes sense. Basically making sure exit() is only called once, and other threads calling exit simply going to sleep forever (until they're cleaned up by the OS).

Amanieu commented 5 months ago

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

zachs18 commented 5 months ago

Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report.

code

```C #include #include void exit_handler(void) { thrd_sleep(&(struct timespec){.tv_sec = 1}, NULL); } int thread_function(void *arg) { (void)arg; exit(0); } int main(void) { for (int i = 0; i < 32; ++i) { atexit(exit_handler); } for (int i = 0; i < 2; ++i) { thrd_t thread; thrd_create(&thread, thread_function, NULL); } } ```

carbotaniuman commented 5 months ago

I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit on different threads.

This is definitely the wrong way to go, we just finished the fight with set_var, let's not repeat it.

tbu- commented 5 months ago

This is definitely the wrong way to go, we just finished the fight with set_var, let's not repeat it.

I think this case is different.

Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to exit (but they are expected to getenv).

The problem with safe setenv was that libraries were expected to getenv without taking a lock.

benesch commented 5 months ago

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for exit(3) says clearly:

The exit() function uses a global variable that is not protected, so it is not thread-safe.

So glibc is at least conforming to its own documentation.

As @richfelker said in https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574, I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback.

benesch commented 5 months ago

This is definitely the wrong way to go, we just finished the fight with set_var, let's not repeat it.

I think this case is different.

Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to exit (but they are expected to getenv).

The problem with safe setenv was that libraries were expected to getenv without taking a lock.

I agree. Adding a lock around std::process::exit seems like a no downside change. As far as I can tell, doing so entirely eliminates the unsoundness. If it's later determined that there's an error in that reasoning and std::process::exit must be marked unsafe, the existence of the lock doesn't make that transition any more difficult. And the happy case is that the POSIX and C folks agree that exit should be safe to call from multiple threads simultaneously, and Rust can remove the lock for platforms with exit implementations that are known to follow the new spec.

Amanieu commented 5 months ago

I agree that adding a lock is fine and would solve this issue. I am slightly concerned about the possibility of deadlocks but I think it shouldn't cause any.

the8472 commented 5 months ago

😕 How does it fix the issue? Doesn't it suffer from the same problem as the environment lock, that C code can still race?

bjorn3 commented 5 months ago

C code that follows the rules wouldn't call exit in cases where Rust could call it anyway. Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues. This is unlike the setenv issue where C code that follows the rules (doesn't call setenv once any thread is spawned) is enough to cause UB when calling setenv from the Rust side. And the setenv issue is fundamentally impossible to make safe due to the guarantees POSIX provides around getenv and the environ global, while exit is trivially to make safe by having libc add a lock.

the8472 commented 5 months ago

I don't expect C code to follow the rules because following the rules isn't reasonable here. Linking python (or java?) and rust code was an example that was brought up in environ discussion and this applies here too.

So imo it's at best a partial bandaid, not a fix.

tbu- commented 5 months ago

It's a fix for the Rust side. libc and other libraries don't call exit in random places (unlike getenv).

It's partial bandaid for the general problem, but theoretically, libraries shouldn't call exit anyway (again, unlike getenv). I feel like the situation looks much better than the env lock.

benesch commented 5 months ago

So imo it's at best a partial bandaid, not a fix.

It is a full fix for this issue. The issue here is that a well behaved Rust program can cause memory unsafety by calling the std::process::exit function from multiple threads. This means that std::process::exit function is currently unsound. It should not be possible to cause memory unsafety in Rust by calling a safe function.

Adding a lock inside of std::process::exit fully fixes this issue, as it will no longer be possible for a well behaved Rust program to cause memory unsafety by calling std::process::exit.

The issue is not that C programs can cause memory unsafety by calling the libc exit function from multiple threads. POSIX is clear: calling libc's exit function twice is undefined behavior. Libraries that call libc's exit function must only do so if they have arranged to ensure that they are the only thread that will call exit. How C libraries (or libraries in other languages that call libc's exit function) arrange to do that is out of scope for Rust.

I don't expect C code to follow the rules because following the rules isn't reasonable here.

I think it's the other way around: the only reasonable approach to writing safe C code is to mandate that C code follows the rules. Safe C code must not trigger undefined behavior. Any C library that wants to call exit safely needs to somehow ensure that no other thread could possibly cause exit to be called after the library calls exit. For example, the library could declare itself to be thread-unsafe, or it could clearly document the caller's responsibilities on a function that calls exit, like so:

/** Terminates the current process with exit code 0.
 *
 * Calling this function concurrently with another thread that calls `exit`
 * will trigger undefined behavior. If you call this function, you MUST ensure
 * that no other threads can possibly call `exit` after this function is called.
 */
void mylib_terminate(void) {
    printf("mylib requested to exit process\n");
    exit(0);
}

It sounds like there is some appetite for adjusting the C standard to allow calling exit more than once, but until that happens, I don't think there is a Rust issue here. If C code needs to callexit, it must somehow arrange to ensure that exit is not called more than once, but how that happens is out of scope for Rust.

the8472 commented 5 months ago

I'll repeat @RalfJung's argument https://github.com/rust-lang/rust/issues/83994#issuecomment-1430358346

This isn't about just C having UB. It's about libc/posix (not C the language) being the foundation on which most other languages build. By necessity, on platforms where raw syscalls are not available. Safe languages still expose means to exit the process that end up calling C's exit(). One of Rust's aims is to be able to interoperate with such languages.

We cannot claim that it's merely some sketchy C programs that are holding it wrong. We are building on a foundation of sand. Libraries that claim to provide safe interop with safe languages are exposing UB.

process::exit does not even have a correctness requirement that it should be called from bin crates and not from libraries. Neither do python's or java's exit(). And of course zero safety requirements, since it's a safe method.

It sounds like there is some appetite for adjusting the C standard to allow calling exit more than once, but until that happens, I don't think there is a Rust issue here.

We can't promise UB-free language interop as long as the underlying issue is not fixed.

tbu- commented 5 months ago

@the8472 You haven't responded to the statement that this fix makes pure-Rust programs linking glibc on Linux not have UB on parallel std::process::exit.

You're correct that we can't fix the ecosystem problem without fixing the C standard and/or POSIX.

RalfJung commented 5 months ago

That fix is a bit like the env lock though -- it's not safe under composition with non-Rust code. For the environment we eventually said the functions should be unsafe. So, should we make exit unsafe? That seems so silly. Even after all the things I have seen C do, I am still surprised about just how programmer-unfriendly that language is.

With the environment, the "implicit contract" that avoids races in concurrent C programs seems to be that the environment is read-only. What is that contract with exit? Is there a general expectation that atexit and exit can only be called on the main thread? Could we have a safe exit that checks whether it runs on the main thread? If not, how are C programmers expected to ever correctly call that function?

It is worth filing a bug against glibc to ask them to provide a sane interface, even if POSIX does not require sanity? I would be curious what their thoughts are about how languages built on a C runtime (which is sometimes the only runtime one can build any program on) are supposed to safely exit the process.

Cc @richfelker for a libc perspective

RalfJung commented 5 months ago

If libcs cannot provide a reasonably safe exit, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the atexit handlers to be run. In a concurrent world, clearly we can't be expected to call exit. glibc, by making it non-thread-safe, is basically putting up a big sign saying "please don't call this function in a concurrent program", and we should do as they say. It is unfortunate if that breaks things due to missing out on atexit handlers, but it's the platform that is broken, not Rust -- glibc can fix this, but we cannot.

benesch commented 5 months ago

Safe languages still expose means to exit the process that end up calling C's exit(). One of Rust's aims is to be able to interoperate with such languages.

That is a different issue, IMO. This issue is not about allowing Rust programs to interleave calls to exit with non-Rust code that might also call exit. It's just about allowing Rust programs to safely call std::process::exit from Rust.

That fix is a bit like the env lock though -- it's not safe under composition with non-Rust code.

I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit to be safely composed with non-Rust code that calls atexit. That is a meaningful improvement in composability.

What is that contract with exit? Is there a general expectation that atexit and exit can only be called on the main thread? Could we have a safe exit that checks whether it runs on the main thread? If not, how are C programmers expected to ever correctly call that function?

atexit is thread safe. You can safely call it from any thread. One reasonable way to write safe multithreaded C code is to adhere to the following rules:

atexit may be called from any thread.
The process can only be exited by returning from main. If another thread wants to exit, that thread must signal the main thread to return, rather than calling libc's exit function.

RalfJung commented 5 months ago

I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit to be safely composed with non-Rust code that calls atexit.

Is that true? There can still be atexit calls running concurrently to exit iterating through the list of handlers to call. So I think there still is a race condition there.

One reasonable way to write safe multithreaded C code is to adhere to the following rules:

For some notion of "reasonable". ;) Basically you are saying that we have to deprecate exit without replacement. I think it is fair to say that "being able to exit the process at any moment by calling a specific function" is a requirement generally expected to be supported by all reasonable platforms. As evidence I bring up every single programming language that provides such a function -- they provide them, presumably, because programmers want or need them.

OTOH, there are other functions we could call instead -- quick_exit and _Exit. Is either of them documented as thread-safe?

the8472 commented 5 months ago

That is a different issue, IMO. This issue is not about allowing Rust programs to interleave calls to exit with non-Rust code that might also call exit. It's just about allowing Rust programs to safely call std::process::exit from Rust.

Well, we could add the lock and declare this issue as "fixed". But then we'd have to open another issue because the UB is still there, only more difficult to exploit.

bjorn3 commented 5 months ago

If libcs cannot provide a reasonably safe exit, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the atexit handlers to be run.

_exit() and _Exit() (both functionally equivalent) skip the atexit handlers. The POSIX specification doesn't state that calling them twice is UB, unlike for exit().

tbu- commented 5 months ago

Is that true? There can still be atexit calls running concurrently to exit iterating through the list of handlers to call. So I think there still is a race condition there.

Yes, it's thread-safe. atexit is documented to be thread-safe. There is no race-condition here. glibc also locks this.

OTOH, there are other functions we could call instead -- quick_exit and _Exit. Is either of them documented as thread-safe?

quick_exit has the same problems as exit. _Exit is just a wrapper around the exit_group syscall on Linux AFAIK.

benesch commented 5 months ago

I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit to be safely composed with non-Rust code that calls atexit.

Is that true? There can still be atexit calls running concurrently to exit iterating through the list of handlers to call. So I think there still is a race condition there.

As far as I can tell. See: https://github.com/rust-lang/rust/issues/126600#issuecomment-2174312879. Concurrent calls to exit and atexit appear to be properly synchronized, and there are comments indicating that concurrent calls to exit and atexit are meant to be safe:

https://github.com/bminor/glibc/blob/f94f6d8a3572840d3ba42ab9ace3ea522c99c0c2/stdlib/exit.c#L119-L120

See also this glibc bug report about a race during concurrent calls to atexit and exit, which was accepted and fixed: https://sourceware.org/bugzilla/show_bug.cgi?id=14333

This seems consistent with the POSIX specification, which requires that it is safe to call atexit and exit simultaneously, but declares calling exit more than once to be undefined behavior.

RalfJung commented 5 months ago

Reading through the old issue, it seems @richfelker is of the opinion that

exit is thread-safe. Installing atexit handlers which free resources other threads might still be using is not thread-safe and is a bug in whatever code is installing those atexit handlers, and should be fixed there.

That directly contradicts the Linux man page when it says "MT-Unsafe race:exit". The POSIX and C standards state

If exit() is called more than once, the behavior is undefined.

I guess it is up to interpretation whether two concurrent calls to exit count as "called more than once" -- glibc seems to say that yes, it does. I would say that makes exit not-thread-safe. @richfelker either has a different interpretation of "called more than once" or a different definition of "thread-safe".

C also has quick_exit, but the documentation for that says

If a program calls the quick_exit function more than once, or calls the exit function in addition to the quick_exit function, the behavior is undefined.

So _Exit seems to be the only well-behaved option.

(Turns out I am just repeating what @bjorn3 and @tbu- also already said above.)

benesch commented 5 months ago

For some notion of "reasonable". ;)

Heh, yeah, C is tricky. That's why y'all invented Rust. ;)

Basically you are saying that we have to deprecate exit without replacement.

I don't think I'm saying that! Adding a lock to std::process::exit seems like it leaves Rust in a sound place. Then we can work to shift the ecosystem so that users who want to write a program that links together different languages that each safely call exit can do so. But I don't see why Rust needs to be under an obligation to provide a std::process::exit call that can be safely interspersed with exit calls from other languages. Of course that's a desirable end state, but it seems impossible to achieve given the current C standard.

I think it is fair to say that "being able to exit the process at any moment by calling a specific function" is a requirement generally expected to be supported by all reasonable platforms. As evidence I bring up every single programming language that provides such a function -- they provide them, presumably, because programmers want or need them.

I do agree with this! But for me the tension is resolved by splitting this issue into two: what we can do today in Rust to provide composability between std::process::exit and atexit, and what we can do in the long term by fixing the C standard to allow composability between std::process::exit and exit.

RalfJung commented 5 months ago

With set_var and remove_var, we decided that we want Rust code to be sound under composition with "well-behaved" C (and other non-Rust) code. For the environment, that meant code which only reads and never mutates the environment. We thus had to make set_var and remove_var unsafe.

For exit, the question then is -- what does "well-behaved" C code look like? Is the answer "code that never calls exit ever"? Or "only call exit on the main thread"? I guess we can rule out C libraries calling exit as obviously being already broken, but what about a C binary using a Rust library calling exit? Following the usual practices of the C world, the C binary can expect the Rust library to not call exit, and therefore it can safely call exit on its main thread (or on a single dedicated thread). Rust providing a safe exit breaks this concept, and therefore we cannot provide a safe wrapper around exit.

what we can do today in Rust to provide composability between std::process::exit and atexit

As we have decided when making environment mutation unsafe, this is insufficient. Rust's soundness story demands more.

benesch commented 5 months ago

That directly contradicts the Linux man page when it says "MT-Unsafe race:exit".

Yeah, this came up in the old thread, too. @richfelker commented on it here:

That's the Linux man page, which seems to be going by a different definition of MT-unsafe or else just wrong.

I guess it is up to interpretation whether two concurrent calls to exit count as "called more than once" -- glibc seems to say that yes, it does. I would say that makes exit not-thread-safe. @richfelker either has a different interpretation of "called more than once" or a different definition of "thread-safe".

@richfelker had an explanation for this too in the old thread:

If there's any proper action to be had here, it's on the C side. The text about calling exit more than once being UB was written in the absence of threads and was obviously intended to make recursive exit (via atexit handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.

The text in the POSIX standard makes a lot more sense to me if you assume threads hadn't been invented when the text was written, as there would be no way to call exit concurrently. The only way to call exit more than once would be to have an atexit handler call exit recursively.

For exit, the question then is -- what does "well-behaved" C code look like?

C doesn't offer the programmer any assistance in using exit safely. In my opinion, a well behaved C binary is one in which the author has audited every line of code that will be linked into that binary (regardless of whether that code is written in C or another language like Rust) and asserts that there is no code path that can allow multiple threads to call libc's exit function more than once.

what about a C binary using a Rust library calling exit? Following the usual practices of the C world, the C binary can expect the Rust library to not call exit, and therefore it can safely call exit on its main thread (or on a single dedicated thread). Rust providing a safe exit breaks this concept, and therefore we cannot provide a safe wrapper around exit.

This is where I have a different opinion! The C binary cannot expect the Rust library not to call std::process::exit. Someone programming in C is responsible for ensuring there is no undefined behavior when linking in libraries written in other languages. It is their responsibility to check whether the Rust library they're linking calls std::process::exit, and, if it does, they must somehow arrange to only trigger that call to std::process::exit when they can ensure that no other threads running non-Rust code will also trigger a call to libc's exit.

In other words: with a C binary, the programmer is responsible for upholding the requirements about not calling exit twice, and cannot rely on the Rust runtime to provide that guarantee.

As we have decided when making environment mutation unsafe, this is insufficient. Rust's soundness story demands more.

It seems subtly different to me! With environment mutation, a well-behaved Rust binary that called set_env could call a well behaved C library that called getenv and cause memory unsafety. Whereas here, a well-behaved Rust library that calls std::process::exit could only cause memory unsafety when linked into a C binary, where it is explicitly the C programmer's responsibility to look for and prevent multiple calls to exit.

That said, my opinion on Rust's soundness story is worth a lot less than yours. :)

If libcs cannot provide a reasonably safe exit, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the atexit handlers to be run.

This take seems reasonable to me, for what it's worth. The one downside I see is that if you add a function like this to the standard library, that change is largely backwards compatible:

/// Like `std::process::exit`, but invokes `libc::exit` in order to call
/// any handlers registered with `libc::atexit`.
///
/// Calling this function from multiple threads concurrently or calling
/// this function more than once is undefined behavior.
pub unsafe fn exit_calling_handlers(code: i32) -> !;

But if in the future the POSIX and C standards are adjusted to make it safe to call exit concurrently from multiple threads, it will be a non-backwards compatible change to make std::process::exit call handlers again, and remove exit_calling_handlers.

the8472 commented 5 months ago

I do agree with this! But for me the tension is resolved by splitting this issue into two: what we can do today in Rust to provide composability between std::process::exit and atexit, and what we can do in the long term by fixing the C standard to allow composability between std::process::exit and exit.

From https://github.com/rust-lang/rust/issues/126600#issuecomment-2200627275:

[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.

So even with a lock pure rust would still be unsound.

This is consequence of the standard library not aiming to be or require a runtime that has exclusive rights to some parts of the system API. E.g. the hotspot JVM claims ownership of certain signal handlers and they require that you go through them if you want to add hooks.

Rust doesn't do that kind of thing. The consequence is we can't pretend to be able to guard access to exit().

DemiMarie commented 5 months ago

If a program calls the quick_exit function more than once, or calls the exit function in addition to the quick_exit function, the behavior is undefined.

So _Exit seems to be the only well-behaved option.

(Turns out I am just repeating what @bjorn3 and @tbu- also already said above.)

Can we preceed that with a call to fflush(NULL)? That is safe, and it should avoid “why didn’t my standard streams get flushed?” complaints.

comex commented 5 months ago

[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.

So even with a lock pure rust would still be unsound.

Sidenote: Just like with set_env, it would sure be nice if we had a way for multiple otherwise-independent images within a process to coordinate with each other. Then we could patch this by 'just' having the standard library coordinate with other copies of itself to share a lock, at least until exit is fixed on libc's side. Unfortunately, as I discovered when investigating this for set_env, there aren't any portable libc APIs suitable for such coordination. The only reasonable one is dlsym, but that doesn't work in some configurations.

benesch commented 5 months ago

[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.

So even with a lock pure rust would still be unsound.

Sidenote: Just like with set_env, it would sure be nice if we had a way for multiple otherwise-independent images within a process to coordinate with each other. Then we could patch this by 'just' having the standard library coordinate with other copies of itself to share a lock, at least until exit is fixed on libc's side. Unfortunately, as I discovered when investigating this for set_env, there aren't any portable libc APIs suitable for such coordination. The only reasonable one is dlsym, but that doesn't work in some configurations.

Oh, interesting! I haven't spent much time working with shared or static libraries in Rust, and naively I'd always assumed that linking together multiple libraries using Rust would result in the runtime sharing some global state between the runtimes.

Is there more written up on what you tried with set_env, @comex? I would have expected this to be largely solved by exporting a global symbol containing the env/process lock, but maybe it is not so simple with static libraries.

bjorn3 commented 5 months ago

For set_env there is no way to force all callers of getenv (including the C ones) to take the environment lock. Getenv is safe to call in multithreaded programs and a fair amount of libc functions in fact call it.

RalfJung commented 5 months ago

The text in the POSIX standard makes a lot more sense to me if you assume threads hadn't been invented when the text was written, as there would be no way to call exit concurrently. The only way to call exit more than once would be to have an atexit handler call exit recursively.

I have seen that argument and find it plausible. But with glibc explicitly documenting exit as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.

For instance, this list does not have exit as a thread-safe function either.

So, @richfelker might be in the minority here with their interpretation. Has anyone tried bringing this up with the glibc folks?

This is where I have a different opinion! The C binary cannot expect the Rust library not to call std::process::exit. Someone programming in C is responsible for ensuring there is no undefined behavior when linking in libraries written in other languages. It is their responsibility to check whether the Rust library they're linking calls std::process::exit, and, if it does, they must somehow arrange to only trigger that call to std::process::exit when they can ensure that no other threads running non-Rust code will also trigger a call to libc's exit.

That does not seem like a workable approach to me. It does not match precedent e.g. for getenv / setenv, where the answer is "only call getenv if there might be threds"; it is not "audit everything to find uses of setenv". A complete audit also does not match actual practice to my knowledge, so I don't think what you are saying reflects the reality out there. It may reflect what you wish the reality should be, but that's not enough. The question was "how do C programs in practice negotiate the use of exit", not "how do you think they should negotiate the use of exit".

Now, it is very possible that the answer is "they don't, and it is pure luck that things don't explode all the time".

It seems subtly different to me! With environment mutation, a well-behaved Rust binary that called set_env could call a well behaved C library that called getenv and cause memory unsafety. Whereas here, a well-behaved Rust library that calls std::process::exit could only cause memory unsafety when linked into a C binary, where it is explicitly the C programmer's responsibility to look for and prevent multiple calls to exit.

Where does it say that it is "explicitly the C programmer's responsibility to look for and prevent multiple calls to exit"? This is just your personal opinion for how C programs should be written, not some official guideline, as far as I know. The official glibc docs just say it's not thread-safe, end of story -- very similar to setenv.

But if in the future the POSIX and C standards are adjusted to make it safe to call exit concurrently from multiple threads, it will be a non-backwards compatible change to make std::process::exit call handlers again, and remove exit_calling_handlers.

We can just say that process::exit calls the handlers if that is possible to do in a thread-safe way. (I assume on Windows similar handlers exist and they do get executed and it is thread-safe to call exit concurrently in multiple threads.)

ChrisDenton commented 5 months ago

I have seen that argument and find it plausible. But with glibc explicitly documenting exit as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.

Is there any libc, other than glibc, that uses glibc's interpretation?

benesch commented 5 months ago

I have seen that argument and find it plausible. But with glibc explicitly documenting exit as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.

Yeah, sorry, you and I are on the same page here about the wording being unclear in a way that allows different libc implementations to come up with different interpretations of the wording. I just thought you might not have seen Rich's argument for why he interprets the wording the way that he does.

So, @richfelker might be in the minority here with their interpretation. Has anyone tried bringing this up with the glibc folks?

There's a libc mailing list thread where Andreas Schwab (a libc maintainer) posts that exit must be thread-safe, which is at odds with the MT-Unsafe designation in the manual. That's all I can find for historical threads on the subject. I agree that it'd be worth starting a new discussion specifically about why the glibc manual does not match the POSIX and C standards.

The question was "how do C programs in practice negotiate the use of exit", not "how do you think they should negotiate the use of exit".

Oh, I see. Your question was:

For exit, the question then is -- what does "well-behaved" C code look like?

and I didn't read the quotes around "well-behaved" as you meant them. For me, the phrase "well-behaved C code" means "C code that does not trigger undefined behavior." I agree there is a difference between "the C code most programmers write" and "C code that strictly adheres to the standard and therefore never triggers UB", and that most C programmers probably call exit in multi-threaded contexts without careful locking.

A complete audit also does not match actual practice to my knowledge, so I don't think what you are saying reflects the reality out there. It may reflect what you wish the reality should be, but that's not enough.

Right, I was talking about responsibility, not actual practice.

It does not match precedent e.g. for getenv / setenv, where the answer is "only call getenv if there might be threds"; it is not "audit everything to find uses of setenv".

The equivalent precedent is "don't call exit." Standard practice in C libraries is not to call exit, because libraries generally shouldn't manipulate the state of the process hosting the library. But if a C library really wants to call exit, it could do so safely in a function that is clearly documented as MT-Unsafe.

This seems equivalent to getenv / setenv. Standard practice in C libraries is to call only getenv and never call setenv, and then hope that everyone else follows the same rule. But a library that really wants to call setenv could do so in a function that is clearly documented to be MT-unsafe.

So it seems to me that the answer to the question of "how do C programs in practice negotiate the use of exit" is that C programs in practice don't link libraries that call exit.

Where does it say that it is "explicitly the C programmer's responsibility to look for and prevent multiple calls to exit"? This is just your personal opinion for how C programs should be written, not some official guideline, as far as I know. The official glibc docs just say it's not thread-safe, end of story -- very similar to setenv.

Declaring that calling exit more than once is UB is in the ISO C standard:

If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the behavior is undefined.

That's not a matter of opinion—that's what the standard says. Where did I miscommunicate? With C code, ultimately someone or something is responsible for ensuring that the code is semantically valid, and part of ensuring that the code is semantically valid is ensuring that exit is not called twice.

rust-lang / rust

`std::process::exit` is not thread-safe in combination with C code calling `exit` #126600

Current status