Closed: macEar closed this issue 3 years ago
I have added liveness probes to all pods in chart v1.4.9. Could you upgrade to it?
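For reference, a Kubernetes liveness probe along these lines restarts the container when the fluentd process stops responding; the probe command and timings below are illustrative, not copied from chart v1.4.9 (the chart's actual probe may differ):

```yaml
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - pgrep -f fluentd   # hypothetical check: fail the probe if no fluentd process is running
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 3
```

A failing probe makes the kubelet restart the container, which also clears any defunct children left behind by the previous process tree.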
Yes, we will upgrade shortly, and afterwards I will write back whether it helps or not.
We upgraded to v1.4.9 and decided to take some time to observe whether the problem shows up again. If it doesn't, I guess we can close this issue. I'll report back in a week.
So far so good. I am closing this issue and will reopen it if the problem shows up again. Thanks.
What happened: After the fluentd worker inside the splunk pods is unexpectedly terminated with SIGKILL, it leaves defunct processes behind. In our case we had set insufficient CPU limits for the splunk logging pods, so the fluentd process was constantly being killed and restarted, leaving zombie processes behind. In the pod logs we can see the following messages:
We noticed that splunk left more than 2000 defunct processes in a day. Here is the shortened output of the `ps -ef --forest` command, where we can see that the parent process is fluentd:

FYI, our limits settings from `values.yml`:

What you expected to happen: No zombie processes.
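For context on why the defunct processes pile up: a zombie is a child process that has exited but whose parent has not yet called wait() on it. Here the parent is fluentd, so when its killed workers exit, their entries linger in the process table until fluentd reaps them (or fluentd itself dies and the container restarts). A quick way to count zombies from inside a pod (a generic one-liner, not taken from the original report) is:

```shell
# Count defunct (zombie) processes: STAT beginning with "Z" marks a child
# that has exited but has not yet been reaped by its parent via wait().
ps -eo stat= | awk 'BEGIN{n=0} $1 ~ /^Z/ {n++} END{print n}'
```

As an aside, Kubernetes can also delegate reaping to the pod's pause container by setting `shareProcessNamespace: true` in the pod spec, though the fix adopted in the chart here was to add liveness probes.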
How to reproduce it (as minimally and precisely as possible): Set CPU limits low enough that the fluentd worker keeps getting killed, then watch defunct processes accumulate with `ps -ef --forest`.

Environment:
Kubernetes version (use `kubectl version`): v1.19.3
OS (e.g. `cat /etc/os-release`): CentOS Linux 7.9.2009