pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!
Other
3.6k stars 247 forks source link

Invalid memory reference error when running test in project that sets session_preload_libraries in _bootstart.sql #816

Closed osawyerr closed 1 year ago

osawyerr commented 1 year ago

This happens on MacOS on M1 processor PGX - 0.5.6 _bootstap.sql file is based on a similar file from ZomboDB project.

Restarting my machine fixes the issue and test is successful. There was a similar error that was reported sometime back - https://github.com/tcdi/pgx/issues/357. I'm not sure if its a similar issue here.

To recreate:

Error

error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `....` (signal: 11, SIGSEGV: invalid memory reference)

Not sure how useful this is but I was able to capture a stack trace in my IDE. Message was "Postgres failed to start"

[foo-9a14b65cd5846f08] rust_panic panicking.rs:746
[foo-9a14b65cd5846f08] rust_panic_with_hook panicking.rs:716
[foo-9a14b65cd5846f08] {closure#0} panicking.rs:588
[foo-9a14b65cd5846f08] __rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure_env#0}, !> backtrace.rs:138
[foo-9a14b65cd5846f08] begin_panic_handler panicking.rs:584
[foo-9a14b65cd5846f08] panic_fmt panicking.rs:142
[foo-9a14b65cd5846f08] unwrap_failed result.rs:1814
[foo-9a14b65cd5846f08] expect<(u32, alloc::string::String), std::sync::mpsc::RecvError> result.rs:1064
[foo-9a14b65cd5846f08] monitor_pg framework.rs:558
[foo-9a14b65cd5846f08] start_pg framework.rs:478
[foo-9a14b65cd5846f08] initialize_test_framework framework.rs:230
[foo-9a14b65cd5846f08] run_test framework.rs:124
[foo-9a14b65cd5846f08] pg_test_create_index mod.rs:155
[foo-9a14b65cd5846f08] {closure#0} mod.rs:155
[foo-9a14b65cd5846f08] call_once<foo::schema::tests::pg_test_create_index::{closure_env#0}, ()> function.rs:248
[Inlined] [foo-9a14b65cd5846f08] core::ops::function::FnOnce::call_once function.rs:248
[foo-9a14b65cd5846f08] __rust_begin_short_backtrace<fn()> lib.rs:572
[Inlined] [foo-9a14b65cd5846f08] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once boxed.rs:1935
[Inlined] [foo-9a14b65cd5846f08] _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once unwind_safe.rs:271
[Inlined] [foo-9a14b65cd5846f08] std::panicking::try::do_call panicking.rs:492
[Inlined] [foo-9a14b65cd5846f08] std::panicking::try panicking.rs:456
[Inlined] [foo-9a14b65cd5846f08] std::panic::catch_unwind panic.rs:137
[Inlined] [foo-9a14b65cd5846f08] test::run_test_in_process lib.rs:595
[foo-9a14b65cd5846f08] {closure#0} lib.rs:489
[Inlined] [foo-9a14b65cd5846f08] test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$ lib.rs:516
[foo-9a14b65cd5846f08] __rust_begin_short_backtrace<test::run_test::run_test_inner::{closure_env#1}, ()> backtrace.rs:122
[Inlined] [foo-9a14b65cd5846f08] std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$ mod.rs:505
[Inlined] [foo-9a14b65cd5846f08] _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once unwind_safe.rs:271
[Inlined] [foo-9a14b65cd5846f08] std::panicking::try::do_call panicking.rs:492
[Inlined] [foo-9a14b65cd5846f08] std::panicking::try panicking.rs:456
[Inlined] [foo-9a14b65cd5846f08] std::panic::catch_unwind panic.rs:137
[Inlined] [foo-9a14b65cd5846f08] std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$ mod.rs:504
[foo-9a14b65cd5846f08] call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<test::run_test::run_test_inner::{closure_env#1}, ()>, ()> function.rs:248
[Inlined] [foo-9a14b65cd5846f08] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once boxed.rs:1935
[Inlined] [foo-9a14b65cd5846f08] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once boxed.rs:1935
[foo-9a14b65cd5846f08] thread_start thread.rs:108
[libsystem_pthread.dylib] _pthread_start 0x00000001aad8c26c
eeeebbbbrrrr commented 1 year ago

Can you share the _PG_init() function? Or can you link to your repo so we can take a look?

thomcc commented 1 year ago

Do we need to pass --test-threads=1 to cargo test in cargo pgx test?

I'd expect this to be an issue, at least (but TBH, I am unsure why I didn't see it in #777 if it is an issue).

eeeebbbbrrrr commented 1 year ago

Naw. cargo pgx test is supposed to support multiple threads.

My hunch is that this extension is doing something with Postgres when every session starts that should only ever be happening once. Which might mean that using session_preload_libraries just isn't correct. It might also point out some kind of issue with pgx itself -- pgx ought not be crashing anyways.

As @osawyerr pointed out over in Discord, ZDB does this and it doesn't have these problems. It also doesn't do anything fancy during _PG_init(). So I'd like to confirm that's the case here.

A backtrace from the postgres session that core dumped would be useful too. It appears the backtrace above is from the outer "cargo test" controller process. You can convince MacOS to just write them to /cores/.

I got a couple myself that have built up over time!

 du -ksh /cores/
147G    /cores/

EDIT: Each"cargo test" thread gets its own connection to Postgres, which is where the extension is being loaded/run. So the test framework is concurrent, but it's ultimately through multiple Postgres processes, not threads.

thomcc commented 1 year ago

Each"cargo test" thread gets its own connection to Postgres, which is where the extension is being loaded/run. So the test framework is concurrent, but it's ultimately through multiple Postgres processes, not threads

Thanks, that makes sense. I suspected that might have been the case (or else I'd have hit this issue with my thread safety checks), but figured I'd mention it regardless.

osawyerr commented 1 year ago

@eeeebbbbrrrr _PG_init() is empty.

I think I can consistently reproduce this now. I added the repo here - https://github.com/osawyerr/foo

MacOS on M1 processor PGX - 0.5.6 Postgres 14.5

eeeebbbbrrrr commented 1 year ago

Incredible.

I can recreate it locally, so that's good (I suppose!). It kinda smells like MacOS isn't reloading the shared library or something. I dunno.

It's also annoying that this is gonna be hard to debug.

Thanks for the report, I'll look into it this week.

eeeebbbbrrrr commented 1 year ago

I just tested on my AMD Linux machine and this behaves as we'd expect. So it's gotta be MacOS-related. Joy!

I think @thomcc is gonna dig into this.

thomcc commented 1 year ago

This is likely related to the cases I described in https://github.com/rust-lang/rust/issues/88737#issuecomment-1178525208, so yeah, I'm probably in a good position to look into it. Worst case we force the dylib to have a distinct name.

thomcc commented 1 year ago

Oh, we already handle this in plrust https://github.com/tcdi/plrust/blob/c1be4dfd5517e466ba3e61c01fc56f8072d21ef0/src/generation.rs#L5-L23. Yeah, we should just do the same thing as there. I think there are probably hacks we could do to get around it (involving TLS dtors), but if we use a distinct name there should be no issue.

workingjubilee commented 1 year ago

@thomcc: "I am pretty sure (this issue) is mostly unrelated to the loader. The actual cause is probably a bug in our unwinding code.

Currently, if we panic in a pg_extern function that is called from inside a pg_test function, then we segfault. this seems to be because libtest ends up dropping some Box twice, although it may be unrelated heap corruption that makes it look that way. Either way, if we exit via a segfault, we don't end up cleaning up after ourselves, and we don't detect the partially messed-up state the next time through. If we get lucky and don't crash, then there's no issue where the function fails to be reloaded.

I think libtest dropping a box twice sounds likely to mean there's an edge-case where we end up returning twice from the same rust function (or something like this)."