Tracking Issue for `panic::update_hook`

Badel2 commented 2 years ago

Feature gate: #![feature(panic_update_hook)]

This is a tracking issue for the panic::update_hook function.

This function allows creating panic hooks that forward the call to the previously set panic hook, without race conditions. It works by taking a closure that receives the old panic hook and the panic info, and it is atomically set as the panic hook by ensuring that during the execution of panic::update_hook no other thread can modify the panic hook.

This common use case:

let prev = panic::take_hook();
panic::set_hook(Box::new(move |info| {
    println!("panic handler A");
    prev(info);
}));

Becomes:

panic::update_hook(|prev, info| {
    println!("panic handler A");
    prev(info);
});

Using the old approach is fine if there is some guarantee that there are no other threads trying to change the panic hook. The main goal of this feature is to enable the use case of safely adding panic hooks after the application has been initialized, because the existing functions panic::set_hook and panic::take_hook are not enough to implement that, consider this case:

Thread A calls panic::take_hook()
Thread B calls panic::take_hook()
Thread A calls panic::set_hook()
Thread B calls panic::set_hook()

And the result is that the original panic hook has been lost, as well as the panic hook set by thread A. The resulting panic hook will be the one set by thread B, which forwards to the default panic hook. Using the new panic::update_hook function, this race condition is impossible, and the result will be either A, B, original or B, A, original.

Public API

// std::panic

pub fn update_hook<F>(hook_fn: F)
where
    F: Fn(&(dyn Fn(&PanicInfo<'_>) + Send + Sync + 'static), &PanicInfo<'_>)
        + Sync
        + Send
        + 'static,

Steps / History

[x] Implementation: #92598
[ ] Final comment period (FCP)
[ ] Stabilization PR

Unresolved Questions

Is the API flexible enough for most use cases? For example it is not possible to drop the previous handler, but if the user wanted to replace the old handler they could just use panic::set_hook. An alternative implementation is to take a closure that creates a panic handler, like this (available in commit 8bdf5c3de6c6e4e01f7f6241cd0f2a606c7486df):

panic::update_hook(|prev| {
    Box::new(move |info| {
        println!("panic handler A");
        prev(info);
    })
});

That approach should be more flexible because it transfers ownership to the closure, but if the outermost closure panics the error message is "panicked while processing panic" which is not nice. This is a consequence of holding the HOOK_LOCK while executing the closure. Could be avoided using catch_unwind?

dtolnay commented 1 year ago

In #92598, I see that proc-macro2 is featured as the real-world motivation for this API. However I don't see how update_hook would be pertinent to the race condition faced by proc-macro2. Calling the original panic hook from its own panic hook is not what proc-macro2 wants to do.

Badel2 commented 1 year ago

Hi @dtolnay, as I understand it, proc-macro2 wants to silence panics inside of a block of code. This is achieved using a custom panic handler that ignores any errors. This panic handler is set using the following code:

    let null_hook: Box<PanicHook> = Box::new(|_panic_info| { /* ignore */ });
    let sanity_check = &*null_hook as *const PanicHook;
    let original_hook = panic::take_hook();
    panic::set_hook(null_hook);

    let works = panic::catch_unwind(proc_macro::Span::call_site).is_ok();
    WORKS.store(works as usize + 1, Ordering::Relaxed);

    let hopefully_null_hook = panic::take_hook();
    panic::set_hook(original_hook);
    if sanity_check != &*hopefully_null_hook {
        panic!("observed race condition in proc_macro2::inside_proc_macro");
    }

With panic::update_hook, that code can be simplified to:

    // Could also use a thread local to silence panics only in this thread
    static SILENCE_PANICS: AtomicBool = AtomicBool::new(true);

    panic::update_hook(|prev, info| {
        if !SILENCE_PANICS.load(Ordering::SeqCst) {
            prev(info);
        }
    });

    let works = panic::catch_unwind(proc_macro::Span::call_site).is_ok();
    WORKS.store(works as usize + 1, Ordering::Relaxed);

    SILENCE_PANICS.store(false, Ordering::SeqCst);

It is not ideal because the update_hook code does not unset the panic handler, making any future panics less performant. However I wasn't able to figure out a way to safely "unset" the panic handler after it is done, as it may have been overwritten by another call to update_hook.

The only possible race condition is if another thread uses panic::set_hook to overwrite the hook. I assumed that won't happen if panic::update_hook exists, but it is true that it may happen. I guess it would be useful to have an API that allows to compare a hook to the current hook and see if that race condition did happen? Also I am not sure if the current API is the best one, as it makes it impossible to drop the "prev" handler. So suggestions to improve that are welcome.

Another alternative that crossed my mind is to make panic::set_hook return the old hook. Not sure if that's possible, and the API would be harder to use, but we also may want to consider that.

And I was just looking at the tracking issue for std::alloc::set_alloc_error_hook, which has a similar API but it is not stable yet, so it would be nice to ensure that the issue of race conditions is avoided there from the beginning.

HadrienG2 commented 11 months ago

One use case for this API would be safely inhibiting the default chatty panic hooks in tests that assert that functions should panic.

By default, the panic hook prints an error message and an optional backtrace every time a panic happens, irrespective of whether that panic is later caught or not. This is a reasonable design compromise given that it simplifies the implementation and that catching panics is rare, but in tests that exercise panicky code, this results in the emission of hundreds of megabytes of backtraces when RUST_BACKTRACE is on, which can become a performance problem.

As an alternative, I added an assert_panics() function to my tests, which asserts that a function panics without emitting all the standard panic hook output to stderr. This works by installing a wrapper to the default panic hook which can be silenced on a per-thread-basis, and activating/deactivating this panic hook silencing in assert_panics() using a simple RAII guard.

Unfortunately, as a code comment in the implementation suggests, this isn't 100% safe in the presence of multi-threaded testing (which is cargo test's default execution mode) because another test could well come and race to replace the global panic hook with something else. And because #[test] doesn't have access to the sequential main() function of cargo test, there is no way I could move the panic hook replacement to the safer sequential part of the program to avoid this.

An atomic panic hook replacement function like update_hook() could be one way to resolve this, and for this particular use case, not being able to uninstall the panic hook replacement wouldn't be a problem.

Alternatively, given that it seems other people are facing the same problem and each coming up with their own solution (see proc-macro2 discussion above), including something like assert_panics() to the standard library might be worth considering. In this case, you'd probably want to integrate a more general catch_unwind_quiet() primitive which has an API similar to catch_unwind but disables all the printing. This way, people can decide how they want to handle panics or lack thereof.

rust-lang / rust