sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

stolon-keeper not reaping wal-e zombie processes #777

Closed itmecho closed 4 years ago

itmecho commented 4 years ago

What happened: Deployed a 3 node cluster to Kubernetes and restored from S3 using wal-e. The restore completes successfully and everything looks fine. After a while, the keepers get terminated because there are so many wal-e zombie processes on the underlying node that kernel.pid_max is reached. It seems like stolon-keeper isn't reaping the wal-e processes properly during standby mode.

What you expected to happen: wal-e processes should be cleaned up after completion

How to reproduce it (as minimally and precisely as possible): Deploy the stolon-keeper example on a kube cluster using the AMI kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-01-17. Restore from S3 using wal-e and run the cluster in standby mode. Each check wal-e does for new WAL files leaves a zombie process on the host.

Anything else we need to know?: I've tried rolling back to stolon v0.13.0 but the problem persists.

Checking the parent PID of the defunct processes shows that the stolon-keeper binary is the parent process, but its state is Ssl. The docker containers for the keepers are no longer there, but the process still shows up.
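For reference, a quick way to confirm the zombie build-up and who parents it looks something like this (the PID 1234 is a placeholder for the keeper process; these aren't the exact commands used in the report):

```
# Zombie (defunct) processes have state Z; list them with their parent PID
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Count zombies parented by a given PID (1234 is a placeholder for the keeper PID)
ps -eo ppid,stat | awk '$2 ~ /^Z/ && $1 == 1234' | wc -l

# Compare the total process count against the kernel limit being hit
cat /proc/sys/kernel/pid_max
ps -e --no-headers | wc -l
```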

Environment:

sgotti commented 4 years ago

@itmecho From what you wrote I can infer that you are referring to the postgres restore_command and that you're running stolon inside a docker container (even though the reproduction steps don't make this clear), right?

If so, the restore_command isn't managed by the stolon keeper but by postgres, and postgres usually reaps its child processes correctly. I just tested this now with a simple restore_command and everything worked correctly.
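For context, a "simple restore_command" here can be as basic as the standard copy-from-archive form (the archive path is illustrative, not the one used in this test):

```
# Copy the requested WAL segment from a local archive directory;
# postgres substitutes %f (file name) and %p (destination path).
cp /mnt/server/archivedir/%f %p
```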

I suspect that for some reason your wal-e processes aren't exiting, so postgres keeps executing new ones and they accumulate.

Then, since you're running stolon inside a docker container, PID 1 of the container is stolon, so it inherits the wal-e processes of the dead postgres (instead of init/systemd, as would happen outside containers).

This is an old known problem when using containers: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

To fix this, the stolon-keeper could also act as a child reaper for such cases (but it's not worth doing), or, better, you could run it with a basic init system inside the container (for example using the docker --init option, or creating an image using https://github.com/krallin/tini when using k8s). But this won't fix the real issue you're reporting.
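For standalone docker, the --init option mentioned above wires a small init (tini) in as PID 1; a hedged sketch, where the image variable and the keeper flags are placeholders rather than the project's official invocation:

```
# --init makes docker run a small init (tini) as PID 1 in the container,
# so it reaps orphaned children such as finished wal-e processes.
# STOLON_IMAGE and the keeper flags below are placeholders.
docker run --init --name keeper0 "$STOLON_IMAGE" \
  stolon-keeper --cluster-name stolon-cluster --store-backend etcdv3
```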

itmecho commented 4 years ago

Thanks for the reply!

Yeah, sorry, it's .standbyConfig.archiveRecoverySettings.restoreCommand in the stolon spec. Hmm, OK, it's just strange that I have a cluster running an older AMI and kube version and I don't see this problem there. Just wondering why I would need an init system in the container when it worked fine before?
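For readers following along, that part of the cluster spec can be set with a stolonctl patch along these lines (store flags, cluster name, and the wal-e bucket/credential setup are placeholders; only the .standbyConfig.archiveRecoverySettings.restoreCommand field path comes from the discussion):

```
# Set the standby restore command in the cluster spec so WAL segments are
# fetched with wal-e during recovery (store flags are placeholders).
stolonctl --cluster-name stolon-cluster --store-backend etcdv3 update --patch '{
  "standbyConfig": {
    "archiveRecoverySettings": {
      "restoreCommand": "wal-e wal-fetch \"%f\" \"%p\""
    }
  }
}'
```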

They wouldn't be showing up as zombie processes if they weren't exiting, though, would they? I found exceptions in the logs where it was failing to get credentials from the AWS metadata API, so I increased the timeout for that in boto to 3 seconds. The exceptions have gone away now, but the defunct processes are still growing.
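For anyone hitting the same credential exceptions, the boto timeout bump described above is typically done through a boto config file along these lines (the file location and values are just one possibility, not necessarily what was used here):

```
# Raise boto's EC2 metadata service timeout (and retries) so wal-e's
# credential lookups don't fail under load.
cat > /etc/boto.cfg <<'EOF'
[Boto]
metadata_service_timeout = 3.0
metadata_service_num_attempts = 3
EOF
```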

You think using the init system would fix it?

sgotti commented 4 years ago

Yeah, sorry, it's .standbyConfig.archiveRecoverySettings.restoreCommand in the stolon spec. Hmm, OK, it's just strange that I have a cluster running an older AMI and kube version and I don't see this problem there. Just wondering why I would need an init system in the container when it worked fine before?

Nothing has changed in stolon-keeper on this front.

You should check whether it's something related to postgres not correctly reaping them when exiting (but I don't think so). Probably the wal-e processes are running for a long time and in the meantime postgres is restarted by the keeper (or crashes).

You think using the init system would fix it?

Yes, an init system that reaps processes inherited from a dead parent will remove the zombie processes. But you should check why this is happening in the first place (see above).

Are you using k8s or just docker? For standalone docker you could just use the --init option.

For k8s you can:

or
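Two commonly used ways to get a zombie-reaping PID 1 on k8s are building the image with https://github.com/krallin/tini as its entrypoint (as suggested above) or enabling shareProcessNamespace on the pod so the pause container becomes PID 1 and reaps zombies. A hedged sketch of the latter, with a placeholder workload name, not necessarily the exact options intended here:

```
# Enable process namespace sharing so the pod's pause container becomes PID 1
# and reaps zombie processes for every container in the pod.
# "stolon-keeper" is a placeholder for however the keepers are deployed.
kubectl patch statefulset stolon-keeper --type merge \
  -p '{"spec":{"template":{"spec":{"shareProcessNamespace":true}}}}'
```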

itmecho commented 4 years ago

My bad, this was an issue with prefetch being set when it shouldn't be! Setting it to 0 in the restore command has fixed the problem!

Thanks for your help and suggestions
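For anyone landing here later, disabling prefetch in a wal-e based restore_command typically looks something like this (illustrative only; the exact command used in this cluster isn't shown in the thread):

```
# wal-fetch prefetches upcoming WAL segments in background worker processes by
# default; --prefetch=0 disables that, so the restore_command spawns no extras.
wal-e wal-fetch --prefetch=0 "%f" "%p"
```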

crdemon09 commented 1 year ago

My bad, this was an issue with prefetch being set when it shouldn't be! Setting it to 0 in the restore command has fixed the problem!

Thanks for your help and suggestions

Could you please explain how you set this prefetch option? What does the full command look like in your restore_command?