marech opened this issue 5 years ago
I will take an uneducated guess that runc is failing to fully start the container and hangs. I have seen this with podman running Ceph containers: runc fails to open /dev/stderr on Ubuntu 18.04.2 and hangs. systemd auto-restarts podman on failure multiple times, leaving runc processes hanging and driving system load up to the point that `systemctl daemon-reload` takes 10-15 seconds to complete, if not longer.
The weird part is that nothing crashes; all containers are up and running. Good point about the OS, I will try Ubuntu 16.04 in the meantime. Any other suggestions on which way I should dig? :)
It seems to me that your logs were filtered to show only lines containing runc. If that is the case, you'd better check the surrounding log lines to find out which command first attaches to runc.
Tried on Ubuntu 16.04; sadly, same problem.
@zq-david-wang yep, the logs were filtered too much; here I have gathered a bit more: forkstat.log
That `sh /probes/readiness.sh 6379` looks suspicious.
How exactly does spawning processes in an existing container work? Something like: kubelet (or I guess any other tool that interacts with runc) wants to run a script -> delegates it to runc -> runc attaches to the existing container -> spawns and executes the script -> the script finishes -> runc detaches from the container?
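Roughly, yes: for an exec-style probe, kubelet (via the container runtime) ends up doing the equivalent of `runc exec`, which starts a new runc process that joins the existing container's namespaces and cgroups, runs the command, and exits with it. A minimal sketch of that last step, assuming runc is on the PATH (the container id and script path below are illustrative, not taken from this cluster):

```shell
#!/bin/sh
# Hypothetical sketch of what an exec-style probe boils down to.
# The container id and the probe script path are illustrative.
CID=redis-server-0

# Degrade gracefully on machines where runc is not installed.
if ! command -v runc >/dev/null 2>&1; then
    echo "runc not available; skipping"
    exit 0
fi

# "runc exec" joins the existing container's namespaces/cgroups,
# runs the command there, and exits when the command exits.
# It does not restart or re-create the container.
runc exec "$CID" sh /probes/readiness.sh 6379 \
    || echo "exec failed (expected if container $CID does not exist)"
```

Each probe invocation is a fresh short-lived runc process, which is why frequent probes show up as a constant stream of fork/exec events in forkstat.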
The kubelet liveness/readiness probe period is tunable; if you suspect that is the problem, you can disable the liveness/readiness probes to verify.
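For reference, this is roughly where that knob lives in a pod spec. This is a hypothetical fragment, not taken from the redis-ha chart; the container name, script path, and timings are illustrative:

```yaml
# Raising periodSeconds (or removing the probe block entirely)
# reduces how often kubelet has to exec into the container.
containers:
  - name: redis
    readinessProbe:
      exec:
        command: ["sh", "/probes/readiness.sh", "6379"]
      periodSeconds: 30   # default is 10; fewer execs per minute
      timeoutSeconds: 5
```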
yep, the liveness/readiness probes are the root cause; after disabling them the load dropped by 10-15%.
forkstat
is calm now and no longer spams about runc.
Thanks for the help!
Hey!
I'm trying to debug high load on some servers, and somehow it led me to runc.
So the problem is that when I run the
forkstat
command on a server, I see a lot of the following output, and it never stops. Could someone give some insight into what this means and how to read/interpret it? Is it that something in the containers is forking a lot?
The whole story is related to the helm redis-ha chart: it happens when I deploy redis-ha to a Kubernetes cluster.