Open teskje opened 5 months ago
This has been discussed previously in https://github.com/rust-lang/rust/issues/83994. There the decision was to not do anything about the concerns with std::process::exit
, mostly based on the reasoning that exit()
is thread safe. The above example shows it's clearly not, neither according to the documentation nor in the implementation.
Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent std::process::exit
invocations I feel quite strongly that this issue deserves reconsideration.
On that thread joshtriplett notes:
The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.
We do now mark std::env::set_var
as unsafe so maybe thinking on this has changed. Though I'm not sure how you would mark returning from main as being unsafe
?
Linux libc documentation does claim that exit()
is thread-unsafe: "MT-Unsafe race:exit", although the original thread suggests that all libc implementations introduce appropriate locks. Perhaps introducing a Once
before calling libc::exit is sufficient and not a performance blocker?
Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing.
https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.
I'd guess the safe thing to do is kill all other threads then call exit
.
Not sure if serious...
Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking.
Adding 32 atexit
handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit
handlers, it doesn't segfault.
I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit
on different threads.
This also means that Rust cdylibs mustn't call std::process::exit
since they might not be the only language running, and applications must ensure that no foreign functions call exit
. The latter should be a bug anyway due to the C standard saying that multiple calls to exit
are UB.
https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.
Based on how the glibc code is set up it very much looks like exit
isn't intended to be thread safe. It uses a lock, but releases it every time it calls one of the exit handlers. That'd make no sense if it wanted to prevent other threads from modifying the list of exit handlers concurrently. The lock is useful for preventing corruption in the face of other threads calling atexit
in parallel. There is a comment that says that this is its intended purpose (though admittedly it could be clearer).
I think the previous assumption that glibc's exit
is thread-safe was a misconception based on the existence of this lock alone.
Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.
That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list.
Triage: marking this as I-unsound
, but please readjust the label accordingly.
This comment from the main author of musl was pointed out in the libs-api meeting:
POSIX requires exit be thread-safe; it is not one of the functions listed in the exceptions, and all functions are required to be thread-safe by default. My understanding is that C11 also requires this.
I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01
All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions1 need not be thread-safe.
exit
is not on the list.
According to the C11 standard, calling exit
more than once is undefined behavior.
C11 7.22.4.4p2
If a program calls the
exit
function more than once, or calls thequick_exit
function in addition to theexit
function, the behavior is undefined.
Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11.
The POSIX description also contains that sentence.
https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html
If
exit()
is called more than once, the behavior is undefined.
So the POSIX spec is kind of in conflict with itself?
@tbu- See the latter half of https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574
If there's any proper action to be had here, it's on the C side. The text about calling
exit
more than once being UB was written in the absence of threads and was obviously intended to make recursiveexit
(viaatexit
handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.
It makes sense to fix this on the C side, too.
Currently, we have process::exit
causing practical UB on a tier 1 platform, it's not just a hypothetical issue, so I think something like #126606 also makes sense. Basically making sure exit()
is only called once, and other threads calling exit
simply going to sleep forever (until they're cleaned up by the OS).
This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.
Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report.
I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's
exit
on different threads.
This is definitely the wrong way to go, we just finished the fight with set_var
, let's not repeat it.
This is definitely the wrong way to go, we just finished the fight with
set_var
, let's not repeat it.
I think this case is different.
Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to exit
(but they are expected to getenv
).
The problem with safe setenv
was that libraries were expected to getenv
without taking a lock.
This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.
I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for exit(3)
says clearly:
The exit() function uses a global variable that is not protected, so it is not thread-safe.
So glibc is at least conforming to its own documentation.
As @richfelker said in https://github.com/rust-lang/rust/issues/83994#issuecomment-1430013574, I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback.
This is definitely the wrong way to go, we just finished the fight with
set_var
, let's not repeat it.I think this case is different.
Process exiting is something that should fundamentally be thread-safe. I can even see this being adopted into the C standard. Libraries are not supposed to
exit
(but they are expected togetenv
).The problem with safe
setenv
was that libraries were expected togetenv
without taking a lock.
I agree. Adding a lock around std::process::exit
seems like a no downside change. As far as I can tell, doing so entirely eliminates the unsoundness. If it's later determined that there's an error in that reasoning and std::process::exit
must be marked unsafe, the existence of the lock doesn't make that transition any more difficult. And the happy case is that the POSIX and C folks agree that exit
should be safe to call from multiple threads simultaneously, and Rust can remove the lock for platforms with exit
implementations that are known to follow the new spec.
I agree that adding a lock is fine and would solve this issue. I am slightly concerned about the possibility of deadlocks but I think it shouldn't cause any.
😕 How does it fix the issue? Doesn't it suffer from the same problem as the environment lock, that C code can still race?
C code that follows the rules wouldn't call exit in cases where Rust could call it anyway. Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues. This is unlike the setenv issue where C code that follows the rules (doesn't call setenv once any thread is spawned) is enough to cause UB when calling setenv from the Rust side. And the setenv issue is fundamentally impossible to make safe due to the guarantees POSIX provides around getenv and the environ global, while exit is trivially to make safe by having libc add a lock.
I don't expect C code to follow the rules because following the rules isn't reasonable here. Linking python (or java?) and rust code was an example that was brought up in environ discussion and this applies here too.
So imo it's at best a partial bandaid, not a fix.
It's a fix for the Rust side. libc and other libraries don't call exit
in random places (unlike getenv
).
It's partial bandaid for the general problem, but theoretically, libraries shouldn't call exit
anyway (again, unlike getenv
). I feel like the situation looks much better than the env lock.
So imo it's at best a partial bandaid, not a fix.
It is a full fix for this issue. The issue here is that a well behaved Rust program can cause memory unsafety by calling the std::process::exit
function from multiple threads. This means that std::process::exit
function is currently unsound. It should not be possible to cause memory unsafety in Rust by calling a safe function.
Adding a lock inside of std::process::exit
fully fixes this issue, as it will no longer be possible for a well behaved Rust program to cause memory unsafety by calling std::process::exit
.
The issue is not that C programs can cause memory unsafety by calling the libc exit
function from multiple threads. POSIX is clear: calling libc's exit
function twice is undefined behavior. Libraries that call libc's exit
function must only do so if they have arranged to ensure that they are the only thread that will call exit
. How C libraries (or libraries in other languages that call libc's exit
function) arrange to do that is out of scope for Rust.
I don't expect C code to follow the rules because following the rules isn't reasonable here.
I think it's the other way around: the only reasonable approach to writing safe C code is to mandate that C code follows the rules. Safe C code must not trigger undefined behavior. Any C library that wants to call exit
safely needs to somehow ensure that no other thread could possibly cause exit
to be called after the library calls exit
. For example, the library could declare itself to be thread-unsafe, or it could clearly document the caller's responsibilities on a function that calls exit
, like so:
/** Terminates the current process with exit code 0.
*
* Calling this function concurrently with another thread that calls `exit`
* will trigger undefined behavior. If you call this function, you MUST ensure
* that no other threads can possibly call `exit` after this function is called.
*/
void mylib_terminate(void) {
printf("mylib requested to exit process\n");
exit(0);
}
It sounds like there is some appetite for adjusting the C standard to allow calling exit
more than once, but until that happens, I don't think there is a Rust issue here. If C code needs to callexit
, it must somehow arrange to ensure that exit
is not called more than once, but how that happens is out of scope for Rust.
I'll repeat @RalfJung's argument https://github.com/rust-lang/rust/issues/83994#issuecomment-1430358346
This isn't about just C having UB. It's about libc/posix (not C the language) being the foundation on which most other languages build. By necessity, on platforms where raw syscalls are not available. Safe languages still expose means to exit the process that end up calling C's exit(). One of Rust's aims is to be able to interoperate with such languages.
We cannot claim that it's merely some sketchy C programs that are holding it wrong. We are building on a foundation of sand. Libraries that claim to provide safe interop with safe languages are exposing UB.
process::exit
does not even have a correctness requirement that it should be called from bin crates and not from libraries. Neither do python's or java's exit()
. And of course zero safety requirements, since it's a safe method.
It sounds like there is some appetite for adjusting the C standard to allow calling exit more than once, but until that happens, I don't think there is a Rust issue here.
We can't promise UB-free language interop as long as the underlying issue is not fixed.
@the8472 You haven't responded to the statement that this fix makes pure-Rust programs linking glibc on Linux not have UB on parallel std::process::exit
.
You're correct that we can't fix the ecosystem problem without fixing the C standard and/or POSIX.
That fix is a bit like the env lock though -- it's not safe under composition with non-Rust code. For the environment we eventually said the functions should be unsafe
. So, should we make exit
unsafe
? That seems so silly. Even after all the things I have seen C do, I am still surprised about just how programmer-unfriendly that language is.
With the environment, the "implicit contract" that avoids races in concurrent C programs seems to be that the environment is read-only. What is that contract with exit
? Is there a general expectation that atexit
and exit
can only be called on the main thread? Could we have a safe exit
that checks whether it runs on the main thread? If not, how are C programmers expected to ever correctly call that function?
It is worth filing a bug against glibc to ask them to provide a sane interface, even if POSIX does not require sanity? I would be curious what their thoughts are about how languages built on a C runtime (which is sometimes the only runtime one can build any program on) are supposed to safely exit the process.
Cc @richfelker for a libc perspective
If libcs cannot provide a reasonably safe exit
, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the atexit handlers to be run. In a concurrent world, clearly we can't be expected to call exit
. glibc, by making it non-thread-safe, is basically putting up a big sign saying "please don't call this function in a concurrent program", and we should do as they say. It is unfortunate if that breaks things due to missing out on atexit handlers, but it's the platform that is broken, not Rust -- glibc can fix this, but we cannot.
Safe languages still expose means to exit the process that end up calling C's exit(). One of Rust's aims is to be able to interoperate with such languages.
That is a different issue, IMO. This issue is not about allowing Rust programs to interleave calls to exit
with non-Rust code that might also call exit
. It's just about allowing Rust programs to safely call std::process::exit
from Rust.
That fix is a bit like the env lock though -- it's not safe under composition with non-Rust code.
I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit
to be safely composed with non-Rust code that calls atexit
. That is a meaningful improvement in composability.
What is that contract with
exit
? Is there a general expectation thatatexit
andexit
can only be called on the main thread? Could we have a safeexit
that checks whether it runs on the main thread? If not, how are C programmers expected to ever correctly call that function?
atexit
is thread safe. You can safely call it from any thread. One reasonable way to write safe multithreaded C code is to adhere to the following rules:
atexit
may be called from any thread.main
. If another thread wants to exit, that thread must signal the main thread to return, rather than calling libc's exit
function.I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit to be safely composed with non-Rust code that calls atexit.
Is that true? There can still be atexit
calls running concurrently to exit
iterating through the list of handlers to call. So I think there still is a race condition there.
One reasonable way to write safe multithreaded C code is to adhere to the following rules:
For some notion of "reasonable". ;) Basically you are saying that we have to deprecate exit
without replacement. I think it is fair to say that "being able to exit the process at any moment by calling a specific function" is a requirement generally expected to be supported by all reasonable platforms. As evidence I bring up every single programming language that provides such a function -- they provide them, presumably, because programmers want or need them.
OTOH, there are other functions we could call instead -- quick_exit and _Exit. Is either of them documented as thread-safe?
That is a different issue, IMO. This issue is not about allowing Rust programs to interleave calls to exit with non-Rust code that might also call exit. It's just about allowing Rust programs to safely call std::process::exit from Rust.
Well, we could add the lock and declare this issue as "fixed". But then we'd have to open another issue because the UB is still there, only more difficult to exploit.
If libcs cannot provide a reasonably safe exit, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the atexit handlers to be run.
_exit()
and _Exit()
(both functionally equivalent) skip the atexit handlers. The POSIX specification doesn't state that calling them twice is UB, unlike for exit()
.
Is that true? There can still be
atexit
calls running concurrently toexit
iterating through the list of handlers to call. So I think there still is a race condition there.
Yes, it's thread-safe. atexit
is documented to be thread-safe. There is no race-condition here. glibc also locks this.
OTOH, there are other functions we could call instead -- quick_exit and _Exit. Is either of them documented as thread-safe?
quick_exit
has the same problems as exit
. _Exit
is just a wrapper around the exit_group
syscall on Linux AFAIK.
I think it's meaningfully different, though. The exit lock allows Rust code that calls std::process::exit to be safely composed with non-Rust code that calls atexit.
Is that true? There can still be
atexit
calls running concurrently toexit
iterating through the list of handlers to call. So I think there still is a race condition there.
As far as I can tell. See: https://github.com/rust-lang/rust/issues/126600#issuecomment-2174312879. Concurrent calls to exit
and atexit
appear to be properly synchronized, and there are comments indicating that concurrent calls to exit
and atexit
are meant to be safe:
See also this glibc bug report about a race during concurrent calls to atexit
and exit
, which was accepted and fixed: https://sourceware.org/bugzilla/show_bug.cgi?id=14333
This seems consistent with the POSIX specification, which requires that it is safe to call atexit
and exit
simultaneously, but declares calling exit
more than once to be undefined behavior.
Reading through the old issue, it seems @richfelker is of the opinion that
exit is thread-safe. Installing atexit handlers which free resources other threads might still be using is not thread-safe and is a bug in whatever code is installing those atexit handlers, and should be fixed there.
That directly contradicts the Linux man page when it says "MT-Unsafe race:exit". The POSIX and C standards state
If exit() is called more than once, the behavior is undefined.
I guess it is up to interpretation whether two concurrent calls to exit
count as "called more than once" -- glibc seems to say that yes, it does. I would say that makes exit
not-thread-safe. @richfelker either has a different interpretation of "called more than once" or a different definition of "thread-safe".
C also has quick_exit, but the documentation for that says
If a program calls the quick_exit function more than once, or calls the exit function in addition to the quick_exit function, the behavior is undefined.
So _Exit
seems to be the only well-behaved option.
(Turns out I am just repeating what @bjorn3 and @tbu- also already said above.)
For some notion of "reasonable". ;)
Heh, yeah, C is tricky. That's why y'all invented Rust. ;)
Basically you are saying that we have to deprecate
exit
without replacement.
I don't think I'm saying that! Adding a lock to std::process::exit
seems like it leaves Rust in a sound place. Then we can work to shift the ecosystem so that users who want to write a program that links together different languages that each safely call exit
can do so. But I don't see why Rust needs to be under an obligation to provide a std::process::exit
call that can be safely interspersed with exit calls from other languages. Of course that's a desirable end state, but it seems impossible to achieve given the current C standard.
I think it is fair to say that "being able to exit the process at any moment by calling a specific function" is a requirement generally expected to be supported by all reasonable platforms. As evidence I bring up every single programming language that provides such a function -- they provide them, presumably, because programmers want or need them.
I do agree with this! But for me the tension is resolved by splitting this issue into two: what we can do today in Rust to provide composability between std::process::exit
and atexit
, and what we can do in the long term by fixing the C standard to allow composability between std::process::exit
and exit
.
With set_var and remove_var, we decided that we want Rust code to be sound under composition with "well-behaved" C (and other non-Rust) code. For the environment, that meant code which only reads and never mutates the environment. We thus had to make set_var and remove_var unsafe.
For exit, the question then is -- what does "well-behaved" C code look like? Is the answer "code that never calls exit
ever"? Or "only call exit
on the main thread"? I guess we can rule out C libraries calling exit
as obviously being already broken, but what about a C binary using a Rust library calling exit
? Following the usual practices of the C world, the C binary can expect the Rust library to not call exit
, and therefore it can safely call exit
on its main thread (or on a single dedicated thread). Rust providing a safe exit
breaks this concept, and therefore we cannot provide a safe wrapper around exit
.
what we can do today in Rust to provide composability between std::process::exit and atexit
As we have decided when making environment mutation unsafe, this is insufficient. Rust's soundness story demands more.
That directly contradicts the Linux man page when it says "MT-Unsafe race:exit".
Yeah, this came up in the old thread, too. @richfelker commented on it here:
That's the Linux man page, which seems to be going by a different definition of MT-unsafe or else just wrong.
I guess it is up to interpretation whether two concurrent calls to
exit
count as "called more than once" -- glibc seems to say that yes, it does. I would say that makesexit
not-thread-safe. @richfelker either has a different interpretation of "called more than once" or a different definition of "thread-safe".
@richfelker had an explanation for this too in the old thread:
If there's any proper action to be had here, it's on the C side. The text about calling
exit
more than once being UB was written in the absence of threads and was obviously intended to make recursiveexit
(viaatexit
handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.
The text in the POSIX standard makes a lot more sense to me if you assume threads hadn't been invented when the text was written, as there would be no way to call exit concurrently. The only way to call exit more than once would be to have an atexit
handler call exit
recursively.
For exit, the question then is -- what does "well-behaved" C code look like?
C doesn't offer the programmer any assistance in using exit
safely. In my opinion, a well behaved C binary is one in which the author has audited every line of code that will be linked into that binary (regardless of whether that code is written in C or another language like Rust) and asserts that there is no code path that can allow multiple threads to call libc's exit
function more than once.
what about a C binary using a Rust library calling
exit
? Following the usual practices of the C world, the C binary can expect the Rust library to not callexit
, and therefore it can safely callexit
on its main thread (or on a single dedicated thread). Rust providing a safeexit
breaks this concept, and therefore we cannot provide a safe wrapper aroundexit
.
This is where I have a different opinion! The C binary cannot expect the Rust library not to call std::process::exit
. Someone programming in C is responsible for ensuring there is no undefined behavior when linking in libraries written in other languages. It is their responsibility to check whether the Rust library they're linking calls std::process::exit
, and, if it does, they must somehow arrange to only trigger that call to std::process::exit
when they can ensure that no other threads running non-Rust code will also trigger a call to libc's exit
.
In other words: with a C binary, the programmer is responsible for upholding the requirements about not calling exit
twice, and cannot rely on the Rust runtime to provide that guarantee.
As we have decided when making environment mutation unsafe, this is insufficient. Rust's soundness story demands more.
It seems subtly different to me! With environment mutation, a well-behaved Rust binary that called set_env
could call a well behaved C library that called getenv
and cause memory unsafety. Whereas here, a well-behaved Rust library that calls std::process::exit
could only cause memory unsafety when linked into a C binary, where it is explicitly the C programmer's responsibility to look for and prevent multiple calls to exit
.
That said, my opinion on Rust's soundness story is worth a lot less than yours. :)
If libcs cannot provide a reasonably safe exit, IMO we should call the syscall directly, and provide an alternative unsafe function one can call instead if one needs the
atexit
handlers to be run.
This take seems reasonable to me, for what it's worth. The one downside I see is that if you add a function like this to the standard library, that change is largely backwards compatible:
/// Like `std::process::exit`, but invokes `libc::exit` in order to call
/// any handlers registered with `libc::atexit`.
///
/// Calling this function from multiple threads concurrently or calling
/// this function more than once is undefined behavior.
pub unsafe fn exit_calling_handlers(code: i32) -> !;
But if in the future the POSIX and C standards are adjusted to make it safe to call exit
concurrently from multiple threads, it will be a non-backwards compatible change to make std::process::exit
call handlers again, and remove exit_calling_handlers
.
I do agree with this! But for me the tension is resolved by splitting this issue into two: what we can do today in Rust to provide composability between std::process::exit and atexit, and what we can do in the long term by fixing the C standard to allow composability between std::process::exit and exit.
From https://github.com/rust-lang/rust/issues/126600#issuecomment-2200627275:
[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.
So even with a lock pure rust would still be unsound.
This is consequence of the standard library not aiming to be or require a runtime that has exclusive rights to some parts of the system API. E.g. the hotspot JVM claims ownership of certain signal handlers and they require that you go through them if you want to add hooks.
Rust doesn't do that kind of thing. The consequence is we can't pretend to be able to guard access to exit()
.
If a program calls the quick_exit function more than once, or calls the exit function in addition to the quick_exit function, the behavior is undefined.
So
_Exit
seems to be the only well-behaved option.(Turns out I am just repeating what @bjorn3 and @tbu- also already said above.)
Can we preceed that with a call to fflush(NULL)
? That is safe, and it should avoid “why didn’t my standard streams get flushed?” complaints.
[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.
So even with a lock pure rust would still be unsound.
Sidenote: Just like with set_env
, it would sure be nice if we had a way for multiple otherwise-independent images within a process to coordinate with each other. Then we could patch this by 'just' having the standard library coordinate with other copies of itself to share a lock, at least until exit
is fixed on libc's side. Unfortunately, as I discovered when investigating this for set_env
, there aren't any portable libc APIs suitable for such coordination. The only reasonable one is dlsym
, but that doesn't work in some configurations.
[...] Only if you have multiple Rust staticlibs/cdylibs which don't share the lock can you still get issues.
So even with a lock pure rust would still be unsound.
Sidenote: Just like with
set_env
, it would sure be nice if we had a way for multiple otherwise-independent images within a process to coordinate with each other. Then we could patch this by 'just' having the standard library coordinate with other copies of itself to share a lock, at least untilexit
is fixed on libc's side. Unfortunately, as I discovered when investigating this forset_env
, there aren't any portable libc APIs suitable for such coordination. The only reasonable one isdlsym
, but that doesn't work in some configurations.
Oh, interesting! I haven't spent much time working with shared or static libraries in Rust, and naively I'd always assumed that linking together multiple libraries using Rust would result in the runtime sharing some global state between the runtimes.
Is there more written up on what you tried with set_env
, @comex? I would have expected this to be largely solved by exporting a global symbol containing the env/process lock, but maybe it is not so simple with static libraries.
For set_env there is no way to force all callers of getenv (including the C ones) to take the environment lock. Getenv is safe to call in multithreaded programs and a fair amount of libc functions in fact call it.
The text in the POSIX standard makes a lot more sense to me if you assume threads hadn't been invented when the text was written, as there would be no way to call exit concurrently. The only way to call exit more than once would be to have an atexit handler call exit recursively.
I have seen that argument and find it plausible. But with glibc explicitly documenting exit
as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.
For instance, this list does not have exit
as a thread-safe function either.
So, @richfelker might be in the minority here with their interpretation. Has anyone tried bringing this up with the glibc folks?
This is where I have a different opinion! The C binary cannot expect the Rust library not to call std::process::exit. Someone programming in C is responsible for ensuring there is no undefined behavior when linking in libraries written in other languages. It is their responsibility to check whether the Rust library they're linking calls std::process::exit, and, if it does, they must somehow arrange to only trigger that call to std::process::exit when they can ensure that no other threads running non-Rust code will also trigger a call to libc's exit.
That does not seem like a workable approach to me. It does not match precedent e.g. for getenv / setenv, where the answer is "only call getenv if there might be threds"; it is not "audit everything to find uses of setenv". A complete audit also does not match actual practice to my knowledge, so I don't think what you are saying reflects the reality out there. It may reflect what you wish the reality should be, but that's not enough. The question was "how do C programs in practice negotiate the use of exit
", not "how do you think they should negotiate the use of exit
".
Now, it is very possible that the answer is "they don't, and it is pure luck that things don't explode all the time".
It seems subtly different to me! With environment mutation, a well-behaved Rust binary that called set_env could call a well behaved C library that called getenv and cause memory unsafety. Whereas here, a well-behaved Rust library that calls std::process::exit could only cause memory unsafety when linked into a C binary, where it is explicitly the C programmer's responsibility to look for and prevent multiple calls to exit.
Where does it say that it is "explicitly the C programmer's responsibility to look for and prevent multiple calls to exit"? This is just your personal opinion for how C programs should be written, not some official guideline, as far as I know. The official glibc docs just say it's not thread-safe, end of story -- very similar to setenv.
But if in the future the POSIX and C standards are adjusted to make it safe to call exit concurrently from multiple threads, it will be a non-backwards compatible change to make std::process::exit call handlers again, and remove exit_calling_handlers.
We can just say that process::exit
calls the handlers if that is possible to do in a thread-safe way. (I assume on Windows similar handlers exist and they do get executed and it is thread-safe to call exit
concurrently in multiple threads.)
I have seen that argument and find it plausible. But with glibc explicitly documenting
exit
as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.
Is there any libc, other than glibc, that uses glibc's interpretation?
I have seen that argument and find it plausible. But with glibc explicitly documenting
exit
as not-thread-safe, and their implementation evidently being non-thread-safe, it seems like other people have other interpretations of this wording.
Yeah, sorry, you and I are on the same page here about the wording being unclear in a way that allows different libc implementations to come up with different interpretations of the wording. I just thought you might not have seen Rich's argument for why he interprets the wording the way that he does.
So, @richfelker might be in the minority here with their interpretation. Has anyone tried bringing this up with the glibc folks?
There's a libc mailing list thread where Andreas Schwab (a libc maintainer) posts that exit
must be thread-safe, which is at odds with the MT-Unsafe designation in the manual. That's all I can find for historical threads on the subject. I agree that it'd be worth starting a new discussion specifically about why the glibc manual does not match the POSIX and C standards.
The question was "how do C programs in practice negotiate the use of
exit
", not "how do you think they should negotiate the use ofexit
".
Oh, I see. Your question was:
For exit, the question then is -- what does "well-behaved" C code look like?
and I didn't read the quotes around "well-behaved" as you meant them. For me, the phrase "well-behaved C code" means "C code that does not trigger undefined behavior." I agree there is a difference between "the C code most programmers write" and "C code that strictly adheres to the standard and therefore never triggers UB", and that most C programmers probably call exit
in multi-threaded contexts without careful locking.
A complete audit also does not match actual practice to my knowledge, so I don't think what you are saying reflects the reality out there. It may reflect what you wish the reality should be, but that's not enough.
Right, I was talking about responsibility, not actual practice.
It does not match precedent e.g. for getenv / setenv, where the answer is "only call getenv if there might be threds"; it is not "audit everything to find uses of setenv".
The equivalent precedent is "don't call exit
." Standard practice in C libraries is not to call exit
, because libraries generally shouldn't manipulate the state of the process hosting the library. But if a C library really wants to call exit
, it could do so safely in a function that is clearly documented as MT-Unsafe.
This seems equivalent to getenv / setenv. Standard practice in C libraries is to call only getenv
and never call setenv
, and then hope that everyone else follows the same rule. But a library that really wants to call setenv
could do so in a function that is clearly documented to be MT-unsafe.
So it seems to me that the answer to the question of "how do C programs in practice negotiate the use of exit
" is that C programs in practice don't link libraries that call exit
.
Where does it say that it is "explicitly the C programmer's responsibility to look for and prevent multiple calls to exit"? This is just your personal opinion for how C programs should be written, not some official guideline, as far as I know. The official glibc docs just say it's not thread-safe, end of story -- very similar to setenv.
Declaring that calling exit
more than once is UB is in the ISO C standard:
If a program calls the
exit
function more than once, or calls thequick_exit
function in addition to the exit function, the behavior is undefined.
That's not a matter of opinion—that's what the standard says. Where did I miscommunicate? With C code, ultimately someone or something is responsible for ensuring that the code is semantically valid, and part of ensuring that the code is semantically valid is ensuring that exit
is not called twice.
The current implementation of
std::process::exit
is unsound on Linux and possibly related platforms where it defers to libc'sexit()
function.exit()
is documented to be not thread-safe, and hencestd::process::exit
is not thread-safe either, despite being a non-unsafe
function.To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):
The example contains
unsafe
code, but only to install exit handlers. AFAICT nothing about thelibc::atexit
call is unsafe. The UB is introduced by callingstd::process::exit
concurrently afterwards.If you are curious, https://github.com/MaterializeInc/database-issues/issues/6528 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure
exit()
is thread safe when it's clearly documented not to be. Instead, Rust should either markstd::process::exit
asunsafe
(which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to callexit()
at a time.Current status
Mitigation on our end (only covers pure Rust programs): #126606
The current status is that the specification language stems from a time when C did not specify multithreading and the intent was to forbid reentrancy. External discussion indicates that there's intent to fix it both in libc implementations and the specs: