pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!

Check wakeup_flags from `wait_latch` function to exit on `WL_POSTMASTER_DEATH` #1938

Closed var77 closed 5 days ago

var77 commented 3 weeks ago

In some cases, the wait_latch loop in a background worker is not interrupted after the postmaster has exited.

To reproduce, run postgres directly using /opt/homebrew/opt/postgresql@17/bin/postgres -D /opt/homebrew/var/postgresql@17 and, once the background worker has started, send SIGKILL to the postmaster process. (The issue is not reproducible when managing postgres via pg_ctl. I encountered it during local development because the Homebrew service runs postgres directly using the postgres binary.)

The postmaster then exits, but the background worker process remains active.

Simple bgworker extension code to reproduce the issue:

use pgrx::{bgworkers::*, prelude::*};
use std::time::Duration;

::pgrx::pg_module_magic!();

#[allow(non_snake_case)]
#[pg_guard]
pub extern "C" fn _PG_init() {
    BackgroundWorkerBuilder::new("bgworker_latch")
        .set_function("background_worker_main")
        .set_library("bgworker_latch")
        .enable_spi_access()
        .load();
}

#[pg_guard]
#[no_mangle]
pub extern "C" fn background_worker_main() {
    BackgroundWorker::attach_signal_handlers(SignalWakeFlags::SIGHUP | SignalWakeFlags::SIGTERM);

    while BackgroundWorker::wait_latch(Some(Duration::from_secs(5))) {
        log!("bgworker loop");
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
}

The documentation also suggests checking the return code after calling WaitLatch to make sure the postmaster has not exited:

https://www.postgresql.org/docs/current/bgworker.html

Make sure the WL_POSTMASTER_DEATH flag is set when calling that function, and verify the return code for a prompt exit in the emergency case that postgres itself has terminated.
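
The check the documentation describes can be sketched in plain Rust, without pgrx, to show the intended loop condition. This is only an illustration. The constants mirror the bit values I believe PostgreSQL's latch.h uses, and `wait_mask` and `should_continue` are hypothetical helper names, not part of pgrx's API.

```rust
// Bit values assumed to mirror PostgreSQL's latch.h; redefined here so the
// sketch compiles without pgrx or the Postgres headers.
const WL_LATCH_SET: i32 = 1 << 0;
const WL_TIMEOUT: i32 = 1 << 3;
const WL_POSTMASTER_DEATH: i32 = 1 << 4;

/// Hypothetical helper: the events a worker would ask WaitLatch to wake on.
/// WL_POSTMASTER_DEATH must be requested for it to ever be reported.
fn wait_mask() -> i32 {
    WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH
}

/// Hypothetical helper: given the bitmask returned by WaitLatch and whether
/// SIGTERM was received, decide whether the worker loop should keep running.
fn should_continue(wakeup_flags: i32, sigterm_received: bool) -> bool {
    let postmaster_died = (wakeup_flags & WL_POSTMASTER_DEATH) != 0;
    !postmaster_died && !sigterm_received
}
```

With a check like this, a wakeup that reports WL_POSTMASTER_DEATH ends the loop even if no SIGTERM was ever delivered, which is the case in the SIGKILL reproduction above.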