metrics and objects deployments generating tons of zombie processes and using up cluster node process limits

What happened: Deploying metrics, metrics aggregator and kube-objects (all images tagged with 1.2.1) seems to cause lots of zombie processes to be created on the cluster node where the deployment is and eventually cluster node is overwhelmed and crashes (Amazon EKS 1.22)

What you expected to happen: Metrics and object collections should function normally.

How to reproduce it (as minimally and precisely as possible): Deploy Splunk Connect with below YAML

global: logLevel: info
splunk: hec: host: http-inputs-hoopp.splunkcloud.com insecureSSL: false port: 443 protocol: https token: splunk-kubernetes-logging: enabled: true journalLogPath: /var/log/journal logs: isg-containers: logFormatType: cri from: container: isg- pod: '*' multiline: firstline: /^\d{4}-\d{2}-\d{2} \d{1,2}:\d{1,2}:\d{1,2}.\d{3}/ sourcetype: kube:container timestampExtraction: format: '%Y-%m-%d %H:%M:%S.%NZ' regexp: time="(?\d{4}-\d{2}-\d{2}T[0-2]\d:[0-5]\d:[0-5]\d.\d{9}Z)" image: registry: docker.io name: splunk/fluentd-hec tag: 1.3.1 pullPolicy: Always resources: limits: memory: 1.5Gi splunk: hec: indexName: eks_logs splunk-kubernetes-metrics: image: registry: docker.io name: splunk/k8s-metrics tag: 1.2.1 pullPolicy: Always imageAgg: registry: docker.io name: splunk/k8s-metrics-aggr tag: 1.2.1 pullPolicy: Always rbac: create: true serviceAccount: create: true name: splunk-kubernetes-metrics splunk: hec: indexName: eks_metrics splunk-kubernetes-objects: image: registry: docker.io name: splunk/kube-objects tag: 1.2.1 pullPolicy: Always kubernetes: insecureSSL: true objects: apps: v1:

interval: 30s name: deployments
interval: 30s name: daemon_sets
interval: 30s name: replica_sets
interval: 30s name: stateful_sets core: v1:
interval: 30s name: pods
interval: 30s name: namespaces
interval: 30s name: nodes
interval: 30s name: services
interval: 30s name: config_maps
interval: 30s name: secrets
interval: 30s name: persistent_volumes
interval: 30s name: service_accounts
interval: 30s name: persistent_volume_claims
interval: 30s name: resource_quotas
interval: 30s name: component_statuses
mode: watch name: events rbac: create: true serviceAccount: create: true name: splunk-kubernetes-objects splunk: hec: indexName: eks_meta Anything else we need to know?:

Scaling down the deployment for metrics and objects to 0 makes the zombie processes disappear immediately Environment:

Kubernetes version (use kubectl version): EKS 1.22
Ruby version (use ruby --version):
OS (e.g: cat /etc/os-release):
NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/"
Splunk version: - check YAML above
Splunk Connect for Kubernetes helm chart version: 1.4.3
Others:

splunk / splunk-connect-for-kubernetes

metrics and objects deployments generating tons of zombie processes and using up cluster node process limits #857