kubernetes / node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.
Apache License 2.0

NPD with custom plugins can spawn zombie processes #941

Open xuegege5290 opened 3 months ago

xuegege5290 commented 3 months ago

Custom plugins may spawn zombie processes, such as:

# in NPD container
Z    Wed Nov 30 23:38:18 2022 netwo <defunct> 1221841
Z    Wed Nov 30 23:38:46 2022 netwo <defunct> 1223132
Z    Wed Nov 30 23:38:47 2022 check <defunct> 1223134
Z    Wed Nov 30 23:38:49 2022 netwo <defunct> 1223183
Z    Wed Nov 30 23:38:53 2022 check <defunct> 1223378
Z    Wed Nov 30 23:39:48 2022 netwo <defunct> 1225778
# on host
root       83196  0.2  0.0      0     0 ?        Z    23:38   0:00 [network_problem] <defunct>
root       83198  0.1  0.0      0     0 ?        Z    23:38   0:00 [check_thread_co] <defunct>
root       83201  0.0  0.0      0     0 ?        Z    23:38   0:00 [network_problem] <defunct>
root       83207  0.0  0.0      0     0 ?        Z    23:38   0:00 [check_file_nr.s] <defunct>
root       83293  0.8  0.0      0     0 ?        Z    23:39   0:00 [network_problem] <defunct>

network_problem.sh is an official plugin.

I tried to debug it with strace and execsnoop.

I found that NPD spawns a bash process, and that bash spawns a child process because of conntrack_count=$(< $CT_COUNT_PATH). If the system load is very high, or something else slows the script down, the first bash process times out and is killed by NPD, and the child bash is then killed by SIGPIPE. NPD calls wait4 with the PID of the first bash process, but it never waits for the child bash process, so the child stays behind as a zombie.

So should NPD periodically reap zombie processes, so that they do not pile up and eventually crash the system?

xuegege5290 commented 3 months ago

/remove-lifecycle stale

xuegege5290 commented 3 months ago

This looks like the same problem as https://github.com/kubernetes/node-problem-detector/issues/726

xuegege5290 commented 3 months ago

@Random-Liu @wangzhen127

wenjianhn commented 1 week ago

@xuegege5290 please verify if dumb-init is able to reap the zombie processes.