pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!
Other
3.57k stars 242 forks source link

SpiTupleTable is unsound #1209

Open david-monroe opened 1 year ago

david-monroe commented 1 year ago

While investigating memory reclamation of SpiTupleTable, I realized that its lifetime is not bound by the lifetime of the surrounding SPI connection. Therefore, SPI_finish frees the memory referenced by the SpiTupleTable without restricting the lifetime of the SpiTupleTable.

CREATE OR REPLACE FUNCTION three()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    // create the cursor we actually care about
    let mut res = Spi::connect(|c| c.open_cursor("select 'hello' from generate_series(1, 10000)", None).fetch(10000).unwrap());

    // here we just perform some allocations to make sure that the previous cursor gets invalidated
    for i in 0..1000 {
        Spi::connect(|c| c.open_cursor("select 1", None).fetch(1).unwrap());
    }

    // later elements are probably more likely to point to deallocated memory
    for i in 0..1000 {
        res.next();
    }

    // segfault
    Ok(res.next().unwrap().get::<String>(1)?)
$$;

select three();
2023-07-13 17:37:11.654 UTC [51] LOG:  server process (PID 58) was terminated by signal 11: Segmentation fault
2023-07-13 17:37:11.654 UTC [51] DETAIL:  Failed process was running: select three()
2023-07-13 17:37:11.654 UTC [51] LOG:  terminating any other active server processes
2023-07-13 17:37:11.655 UTC [51] LOG:  all server processes terminated; reinitializing

While this example uses cursors, the same is possibly true for plain queries as well.

eeeebbbbrrrr commented 1 year ago

Thanks. We'll look into this immediately.

eeeebbbbrrrr commented 1 year ago

https://github.com/pgcentralfoundation/pgrx/pull/1210 fixes this such that the above code won't compile anymore, which I think is the correct answer here.

My pgrx-based interpretation of what this code should look like is:

#[pg_extern]
fn issue1209() -> Result<Option<String>, Box<dyn std::error::Error>> {
    let res = Spi::connect(|c| {
        let mut cursor = c.open_cursor("SELECT 'hello' FROM generate_series(1, 10000)", None);
        let table = cursor.fetch(10000)?;
        table.into_iter().map(|row| row.get(1)).collect::<Result<Vec<_>, _>>()
    })?;

    Ok(res.first().cloned().flatten())
}

The function body ought to be paste-able into a plrust function.

workingjubilee commented 1 year ago

We added some positive tests in #1210, but to be completely sure, we want a negative test that confirms compile-fail on the now-banned usage. This will require setting up ui_test or something similar.

eeeebbbbrrrr commented 1 year ago

Hi @david-monroe! We've released pgrx v0.9.8 with the fix for this, and I'm wrapping up a v0.10.0-beta.1 as well.

Tomorrow (Friday July 14, 2023) we'll put out a new PL/Rust with this fix. I need some time to incorporate some unit tests over there first.

Thanks for the report and the re-creates. Please let us know about any other soundness and/or usability issues you run across.

david-monroe commented 1 year ago

I'm surprised that you think this issue is closed. Yes, the approach above no longer works, but it seems that you have not thoroughly addressed the lifetime issues.

To issues that came to mind:

  1. By allocating into the parent context, you're now leaking all &str, &[u8] until the end of the transaction. It's no longer possible to use plrust for large datasets containing such types.
CREATE OR REPLACE FUNCTION four()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    let tmp: &'static str = Spi::connect(|c| {
        c.select("select repeat('x', 1024 * 1024)", None, None).unwrap().next().unwrap().get(1).unwrap().unwrap()
    });
    Ok(None)
$$;

begin;

select four(); -- every call to four increases memory usage by 1MB.
  1. It's still possible to corrupt memory by using the incorrect FromDatum implementation:
CREATE OR REPLACE FUNCTION three()
    RETURNS TEXT LANGUAGE plrust
AS
$$
    use std::cell::RefCell;
    thread_local! {
        static RES: RefCell<Option<&'static str>> = RefCell::new(None);
    }

    RES.with(|res| {
        let mut res = res.borrow_mut();
        if res.is_none() {
            let tmp: &'static str = Spi::connect(|c| {
                c.select("select repeat('x', 1024 * 1024)", None, None).unwrap().next().unwrap().get(1).unwrap().unwrap()
            });
            *res = Some(tmp);
        }
        Ok(Some(res.unwrap().to_string()))
    })
$$;

select three();
2023-07-14 09:15:27.875 UTC [51] LOG:  server process (PID 3961) was terminated by signal 11: Segmentation fault
2023-07-14 09:15:27.875 UTC [51] DETAIL:  Failed process was running: select three()

This is a bit hard to reproduce. But it's clear from the code what is going wrong.


I think a thorough solution to this would also ensure that SpiTupleTable properly deallocates memory. If SpiTupleTable properly managed its memory and extracted datums were bound by the lifetime of the SpiTupleTable, then you would not have to allocate into the parent context and neither would you be able to ever construct a &'static str.