Open nh2 opened 3 years ago
So just a few quick thoughts. The `htop` output is showing a `postgres` process that is a child of `initdb`. This is a situation that occurs early on during creation of the initial db cluster. This `postgres` instance is only used for this purpose and is not the `postgres` process that tests will communicate with during the testing process.
Here is where `initdb` is called: https://github.com/jfischoff/tmp-postgres/blob/593e3ebcb7643afd6095f6269de3552d01c7ff40/src/Database/Postgres/Temp/Internal/Core.hs#L317
I would run `tmp-postgres` with full verbosity to get all of the output from `initdb`.
You can do that with this config: https://github.com/jfischoff/tmp-postgres/blob/593e3ebcb7643afd6095f6269de3552d01c7ff40/src/Database/Postgres/Temp/Internal.hs#L257
Although, I would assume the output would be visible already in the strace.
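For reference, a minimal sketch of what running with the verbose config could look like. It assumes the config linked above is the `verboseConfig` value exported from `Database.Postgres.Temp`, and that `withConfig` and `toConnectionString` are available in the version being tested; adjust the names if the version in use differs.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BSC
import Database.Postgres.Temp (toConnectionString, verboseConfig, withConfig)

main :: IO ()
main = do
  -- Assumption: verboseConfig is the verbose logging config referred to above.
  -- withConfig starts a temporary database, runs the action, then shuts it down.
  result <- withConfig verboseConfig $ \db ->
    -- Print the connection string so the running instance can be inspected.
    BSC.putStrLn (toConnectionString db)
  case result of
    -- A startup failure (for example from initdb) is returned as a Left value.
    Left err -> putStrLn ("tmp-postgres failed to start: " ++ show err)
    Right () -> putStrLn "tmp-postgres started and shut down cleanly"
```

If the hang happens during `initdb`, the interesting output should show up before either of the final messages is printed.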
Hi, for static-haskell-nix I run a lot of packages in its CI.
For `tmp-postgres`, that CI seems to have discovered a way to get the test suite stuck indefinitely after a postgres crash (the exact nature of the crash is unclear; `coredumpctl` suggests signals `3` and `6` were involved). The failure doesn't seem to be deterministic, because sometimes the tests ran through fine.
With the test hanging for hours, this is the process tree as shown in `htop`:
This is how I observe the postgres crash/shutdown:
Unfortunately I didn't have core dump files enabled so I can't `gdb` into the postgres process to get more info.
To provide additional data, here are `strace` invocations of the 3 involved processes, in case that helps:
The CI run in which this occurred is here.
If I were to make a super rough guess at what's happening, I'd say that there's a code path that allows some postgres process to die (for whatever reason), but that information isn't propagated up (or blocked on some IO pipe) to shut down the other processes and display an error message.
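To illustrate the "blocked on some IO pipe" part of that guess, here is a generic sketch that is not taken from `tmp-postgres` and is not a diagnosis of it. It assumes a POSIX `sh` and `sleep` are available. The direct child exits immediately with a failure code, but a background process it started still holds the write end of the stdout pipe, so a parent that reads until EOF keeps waiting even though the child is long gone.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.IO (hGetContents)
import System.Process
  ( CreateProcess (..), StdStream (CreatePipe), createProcess, shell, waitForProcess )

main :: IO ()
main = do
  start <- getCurrentTime
  -- The shell exits at once with status 1; the backgrounded sleep inherits
  -- the write end of the stdout pipe and keeps it open until it finishes.
  (_, mOut, _, ph) <-
    createProcess (shell "sleep 3 & exit 1") { std_out = CreatePipe }
  out <- maybe (fail "no stdout pipe was created") pure mOut
  code <- waitForProcess ph
  exited <- getCurrentTime
  putStrLn ("child exited with " ++ show code
            ++ " after " ++ show (diffUTCTime exited start))
  -- Reading to EOF blocks until every holder of the write end has closed it.
  rest <- hGetContents out
  _ <- evaluate (length rest)
  eof <- getCurrentTime
  putStrLn ("EOF on the pipe seen after " ++ show (diffUTCTime eof start))
```

With a grandchild that never exits, the final read would never return, which would look like the indefinite hang described above.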
This is really just a drive-by issue report, as I'm not currently a user of `tmp-postgres` and don't have a good understanding of how it works. But maybe it can be useful for you to catch the odd async-exception-not-handled or other race condition, thus improving the package.
Cheers!