rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.61k stars 12.74k forks source link

thread_local macro stability precludes safe async signal handling #30003

Open mahkoh opened 8 years ago

mahkoh commented 8 years ago

The thread_local! macro accepts arbitrary (non-Sync) objects to be put into thread local storage. It is not hard to construct a case where this causes signal handlers to observe inconsistent state:

extern {
    fn signal(num: i32, handler: extern fn(i32)) -> extern fn(i32);
}

use std::cell::{RefCell};

/// First and second value always the same.
thread_local!(static X: RefCell<(usize,usize)> = RefCell::new((0,0)));

extern fn handler(_: i32) {
    X.with(|x| {
        let x = x.borrow();
        println!("{:?}", *x);
    });
}

fn main() {
    unsafe { signal(2, handler); }
    X.with(|x| {
        let mut x = x.borrow_mut();
        x.0 += 1;
        // raise(2)
        x.1 += 1;
    });
}

RefCell is not signal safe. A mutable borrow will not mark the RefCell as being borrowed in this case. This can be simulated as follows:

Expected result: panic/abort/segfault or similar. Actual result: (1, 0) is printed.

Fixing RefCell by adding a memory barrier does not fix the problem since there might be many other non-Sync types that were not written with signal handling in mind and that use unsafe constructs. For correctness, TLS would have to be restricted to types that are async safe via a new marker trait. With such a trait, signal handling would be safe by default in all rust code and all signals handlers could call arbitrary rust functions (as long as said functions don't call non-rust code which might not be async safe.)


This concerns me because adding a signal handler is a safe operation in lrs and all #[thread_local] objects that require mutation are wrapped in a single threaded mutex implementation with interior mutability. And if it is decided that async signal handling is never safe in rust, then #[thread_local] might be stabilized and might also start to accept arbitrary objects which would practically force me to create a full compiler fork for the sake of safety. The current implementation in lrs is already unsafe because the single threaded mutex implementation has to be marked Sync to be placed in a #[thread_local]. For correctness, there would have to be the above mentioned marker trait that restricts what can be put in a #[thread_local].

eefriedman commented 8 years ago

There are other substantial problems with asynchronous signal handlers... for example, malloc() isn't signal-safe. It might be possible to deal with that... but it would be a lot of work, a lot of if in_signal_hander { signal_handler_abort() }, and a bunch of changes to crates outside the standard library which wrap C libraries. Even if we did manage to pull it off, signal handlers would still be a minefield of aborts and/or deadlocks, so making them safe doesn't actually gain much.

Given all of that, it's extremely unlikely the standard library will ever provide safe async signal handlers.


#[thread_local] is not going to be stabilized as-is; it's possible that accessing a raw thread_local global will be considered unsafe in the final design.

mahkoh commented 8 years ago

There are other substantial problems with asynchronous signal handlers... for example, malloc() isn't signal-safe.

It is certainly possible to write a signal-safe malloc and it is also possible to write programs that don't call any C functions. Furthermore, there are probably also malloc implementations out there that are not thread-safe. Yet this does not stop the language from requiring statics to be Sync.

steveklabnik commented 5 years ago

Triage: this is still marked as needs-decision for @rust-lang/lang, but i'm not sure there was ever a discussion on it.

Personal opinion: the root issue here is the unsafe, and so it's not likely to be fixed.

mkroening commented 1 year ago

I just published InterruptRefCell, which uses a compiler fence to synchronize with signal handlers and interrupt handlers to share a RefCell with them. In the original example at the top, the signal would be deferred until after the RefCell is not accessed from the main thread anymore. Of course, one could explicitly enable signals while accessing an InterruptRefCell, though.

RalfJung commented 1 year ago

In the original example at the top, the signal would be deferred until after the RefCell is not accessed from the main thread anymore.

Which mechanism achieves that?

mkroening commented 1 year ago

Which mechanism achieves that?

Ah, sorry. I mask (disable) all signals when borrowing InterruptRefCell and restore the previous mask once the borrow is done. When masked, incoming signals are deferred.

RalfJung commented 1 year ago

Ah okay, so that way the RefCell is "thread-safe" for signals. Makes sense.

Enabling signals while having the InterruptRefCell open should even be fine since other "threads" will see borrow flag and avoid giving a 2nd borrow, right? The concurrency due to signals/interrupts only matters during borrow/borrow_mut to avoid race conditions. (And there might have to be some fences as well to avoid reordering.)

mkroening commented 1 year ago

Yeah, enabling them should be "fine" but unwanted, since trying to borrow mutably in a signal handler while the main context holds another reference would not work.

So losing mutual exclusion with signal handlers on RefCells is safe but bad, similar to deadlocks and mutexes, I would say.

mkroening commented 1 year ago

And there might have to be some fences as well to avoid reordering.

Yeah, compiler fences are the base for this.

RalfJung commented 1 year ago

Note that you absolutely need to have interrupts disabled whenever the RefCell's borrow flags are accessed. This is crucial for soundness. So doing just "best-effort" interrupt disabling is unsound. See https://github.com/mkroening/interrupt-ref-cell/issues/5.

So losing mutual exclusion with signal handlers on RefCells is safe but bad, similar to deadlocks and mutexes, I would say.

No, it's unsound. You get a data race between the thread releasing its hold of the RefCell and the handler trying to acquire it.