fix: shutdown does not kill walredo processes

While investigating Pageserver logs from the cases where systemd hangs during shutdown (https://github.com/neondatabase/cloud/issues/11387), I noticed that even if Pageserver shuts down cleanly[^1], there are lingering walredo processes.

[^1]: Meaning, pageserver finishes its shutdown procedure and calls exit(0) on its own terms, instead of hitting the systemd unit's TimeoutSec= limit and getting SIGKILLed.

While systemd should never lock up like it does, maybe we can avoid hitting that bug by cleaning up properly.

Changes

This PR adds a shutdown method to WalRedoManager and hooks it up to tenant shutdown.

We keep track of intent to shutdown through the new enum ProcessOnceCell stored inside the pre-existing redo_process field. A gate is added to keep track of running processes, using the new type struct Process.

Future Work

Requests that don't need the redo process will not observe the shutdown (see doc comment). Doing so would be nice for completeness sake, but doesn't provide much benefit because Tenant and Timeline already shut down all walredo users.

Testing

I did manual testing to confirm that the problem exists before this PR and that it's gone after. Setup:

neon_local with a single tenant, create some data using pgbench
ensure walredo process is running, not pid
watch strace -e kill,wait4 -f -p "$(pgrep pageserver)"
neon_local pageserver stop

With this PR, we always observe

$ strace -e kill,wait4 -f -p "$(pgrep pageserver)"
...
[pid 591120] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=591215, si_uid=1000} ---
[pid 591134] kill(591174, SIGKILL)      = 0
[pid 591134] wait4(591174,  <unfinished ...>
[pid 591142] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=591174, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid 591134] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 591174
...
+++ exited with 0 +++

Before this PR, we'd usually observe just

...
[pid 596239] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=596455, si_uid=1000} ---
...
+++ exited with 0 +++

Refs

refs https://github.com/neondatabase/cloud/issues/11387

neondatabase / neon