c0c0n3 closed this issue 5 years ago
I think this happens when QL becomes unresponsive and is therefore killed by k8s:
```
Warning  Unhealthy  54m (x1061 over 4d21h)   kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Liveness probe failed: Get http://172.20.44.1:8668/v2/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal   Pulling    54m (x101 over 4d21h)    kubelet, ip-172-20-60-68.eu-central-1.compute.internal  pulling image "smartsdk/quantumleap:rc"
Normal   Killing    54m (x100 over 4d21h)    kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Killing container with id docker://quantumleap:Container failed liveness probe.. Container will be killed and recreated.
Normal   Pulled     53m (x101 over 4d21h)    kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Successfully pulled image "smartsdk/quantumleap:rc"
Normal   Created    53m (x101 over 4d21h)    kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Created container
Normal   Started    53m (x101 over 4d21h)    kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Started container
Warning  Unhealthy  53m (x3 over 4d21h)      kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Liveness probe failed: Get http://172.20.44.1:8668/v2/health: dial tcp 172.20.44.1:8668: connect: connection refused
Warning  Unhealthy  8m50s (x1127 over 4d21h) kubelet, ip-172-20-60-68.eu-central-1.compute.internal  Readiness probe failed: Get http://172.20.44.1:8668/v2/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
```
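For reference, a liveness probe that would produce events like these might look as follows in the pod spec. The timing values here are illustrative, not the chart's actual defaults; a longer timeout and higher failure threshold give QL more headroom to answer when the backend is slow, at the cost of slower detection of a genuinely dead container:

```yaml
livenessProbe:
  httpGet:
    path: /v2/health
    port: 8668
  # Illustrative values (assumptions, not the deployed configuration):
  timeoutSeconds: 5      # how long the kubelet waits for a response
  periodSeconds: 30      # how often the probe runs
  failureThreshold: 3    # consecutive failures before the container is killed
```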
I believe this was solved by allowing for the yellow state of the Crate cluster.
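The idea behind that fix can be sketched as follows. This is a hypothetical illustration, not QuantumLeap's actual implementation: a health endpoint that maps the CrateDB cluster colour to an HTTP status, treating yellow (under-replicated but serviceable) as healthy so the liveness probe no longer kills the pod in that state:

```python
# Hypothetical sketch: map a CrateDB cluster health colour to the HTTP
# status a /v2/health endpoint would return. Only "red" (or an unknown
# state) fails the probe; "yellow" is tolerated.
def health_status(crate_colour: str) -> int:
    if crate_colour == "green":
        return 200  # fully healthy
    if crate_colour == "yellow":
        return 200  # degraded but serviceable: do not fail the probe
    return 503      # red or unknown: report unhealthy
```

With this mapping, a kubelet probing `/v2/health` only sees a failure (503) when the cluster is actually red, instead of restarting QL every time replication lags.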
We've been experiencing an unusually high number of restarts in our K8s cluster. For example, in the last 3 days K8s restarted QL 103 and 99 times in the two pods, respectively.