Open aelbarkani opened 5 years ago
I've run into a similar problem: `Back-off restarting` keeps happening because the liveness probe returns connection refused. This causes the container to restart and begin the WAL recovery again from scratch.
The Kubernetes documentation says: "Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution."
If there isn't too much data (recovery takes under 5 minutes), the WAL recovery does complete, but the container is still restarted after the wait period.
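To make the quoted back-off schedule concrete, here is a minimal sketch of the delays it implies. The function name `backoff_delays` and its parameters are my own illustration, not anything from the kubelet; it only models the doubling and the five-minute cap, not the reset after ten minutes of successful execution.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Delay in seconds before each of the first `restarts` restarts.

    Assumes the schedule quoted above: start at 10s, double each time,
    cap at five minutes (300s). The ten-minute reset is not modeled.
    """
    return [min(base * 2 ** i, cap) for i in range(restarts)]


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

So after about five failed attempts, every further retry waits the full five minutes before the WAL recovery even starts again.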
At the very least, increasing the default `initialDelaySeconds` seems to be necessary even for basic use cases.
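As a rough way to pick a value, the sketch below estimates how long a container gets before the liveness probe kills it. It assumes the Kubernetes probe defaults of `periodSeconds: 10` and `failureThreshold: 3`, ignores probe timeouts and kubelet jitter, and uses hypothetical helper names (`liveness_deadline`, `min_initial_delay`) and hypothetical input values, so treat the numbers as an approximation rather than the chart's actual behavior.

```python
def liveness_deadline(initial_delay, period=10, failure_threshold=3):
    """Approximate seconds after start at which the final failing probe fires.

    Assumption: the first probe runs at `initial_delay`, then one every
    `period` seconds, and the container is restarted after
    `failure_threshold` consecutive failures.
    """
    return initial_delay + (failure_threshold - 1) * period


def min_initial_delay(recovery_seconds, period=10, failure_threshold=3):
    """Smallest initialDelaySeconds that lets a recovery of the given length
    finish by that final probe (add headroom on top of this in practice)."""
    return max(0, recovery_seconds - (failure_threshold - 1) * period)


# Hypothetical values: a 30s initial delay and a 10-minute WAL recovery.
print(liveness_deadline(30))    # 50
print(min_initial_delay(600))   # 580
```

Under these assumptions, a recovery that takes ten minutes needs an initial delay of roughly 580 seconds or more, which is why a short default cannot work for a database with a large WAL.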
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Version of Helm and Kubernetes: 1.10
Which chart: stable/influxdb
What happened: When a WAL recovery takes too long, the liveness probe fails, causing a CrashLoopBackOff error.
What you expected to happen: The liveness probe shouldn't fail while the database is recovering (only the readiness probe should). Otherwise the database will never be able to recover.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know: duplicate of https://github.com/helm/charts/issues/10405