Describe the bug
The readiness probe prevents recovery of large databases. If I restart a node from a clean data directory (e.g. after a hard drive failure), the server takes a long time to come back up. In particular, the step "Started downloading snapshot for database XXX" can take several minutes or longer. Unfortunately, this step never finishes because the readiness probe always kills the container before the download is done.
For now I have set the timeout to one hour in the Helm chart values YAML, but a more intelligent mechanism would be much better. A shorter timeout is definitely useful under normal circumstances, and I don't want to reinstall the Helm chart and restart the whole deployment before and after every single-node recovery.
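One candidate for such a mechanism is Kubernetes' `startupProbe`: while it is configured, the liveness and readiness probes are not run until it succeeds, so a long startup allowance does not loosen the probes used during normal operation. Whether this Helm chart lets you set a `startupProbe` is an assumption, not something the chart is known to support. The sketch below only shows the arithmetic: the startup window is roughly `failureThreshold * periodSeconds`, so a one-hour allowance with a 10-second period needs a `failureThreshold` of 360. The health endpoint path and port are hypothetical placeholders.

```python
import json
import math


def startup_failure_threshold(max_startup_seconds: int, period_seconds: int = 10) -> int:
    """Smallest failureThreshold whose window (failureThreshold * periodSeconds)
    covers max_startup_seconds."""
    if max_startup_seconds <= 0 or period_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(max_startup_seconds / period_seconds)


def startup_probe(path: str, port: int, max_startup_seconds: int, period_seconds: int = 10) -> dict:
    """Build a Kubernetes-style startupProbe spec as a plain dict.

    `path` and `port` are hypothetical: the real health endpoint depends on the server.
    """
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": startup_failure_threshold(max_startup_seconds, period_seconds),
    }


if __name__ == "__main__":
    # Allow up to one hour for startup (e.g. a full snapshot download),
    # using a hypothetical health endpoint.
    probe = startup_probe("/health", 8080, max_startup_seconds=3600)
    print(json.dumps({"startupProbe": probe}, indent=2))
```

JSON is also valid YAML, so the printed fragment could in principle go into a values file, but only if the chart actually passes a `startupProbe` through to the pod spec.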
To Reproduce
Steps to reproduce the behavior:
Create a large database
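Kill one of the nodes, wipe its data directory (as after a hard drive failure), and restart it
Observe that the container is killed by the readiness probe while "Started downloading snapshot for database XXX" is still in progress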
Expected behavior
Recovery should succeed.