Closed: b43646 closed this issue 6 months ago
This looks like an under-provisioning problem rather than a memory leak. The koordlet collects and stores a series of pod-level metrics, so its memory usage correlates linearly with the number of pods on the node. Since the koordlet starts with an empty TSDB, its RSS rises as pod metrics accumulate.
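The linear correlation described above can be sketched with a toy model. All constants below (base overhead, series per pod, per-sample cost, retention window) are illustrative assumptions invented for this sketch, not measured koordlet values:

```go
package main

import "fmt"

// estimateBytes is a rough back-of-the-envelope model of koordlet TSDB
// memory versus pod count. Every constant here is an assumption for
// illustration, not a measured value from koordlet.
func estimateBytes(pods int) int {
	const (
		baseBytes      = 50 << 20 // fixed agent overhead, assumed ~50 MiB
		seriesPerPod   = 10       // assumed number of pod-level metric series
		bytesPerSample = 16       // rough TSDB cost per timestamp+value pair
		samplesKept    = 1800     // e.g. 30 min retention at 1s resolution
	)
	return baseBytes + pods*seriesPerPod*bytesPerSample*samplesKept
}

func main() {
	for _, pods := range []int{50, 100, 200} {
		fmt.Printf("%d pods -> ~%d MiB\n", pods, estimateBytes(pods)>>20)
	}
}
```

Whatever the real constants are, the shape is the same: a fixed base plus a per-pod term, which is why sizing the memory limit by expected pods-per-node matters.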
Hi Jason,
The following configuration is for reference only; it was generated entirely by the Helm installation.
[root@k1 test]# kubectl -n koordinator-system get ds -o yaml
apiVersion: v1
items:
- apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "1"
meta.helm.sh/release-name: koordinator
meta.helm.sh/release-namespace: default
creationTimestamp: "2024-04-02T06:56:51Z"
generation: 1
labels:
app.kubernetes.io/managed-by: Helm
koord-app: koordlet
name: koordlet
namespace: koordinator-system
resourceVersion: "379111"
uid: def34102-53df-4a8e-91df-b62de603dab2
spec:
minReadySeconds: 10
revisionHistoryLimit: 10
selector:
matchLabels:
koord-app: koordlet
template:
metadata:
creationTimestamp: null
labels:
koord-app: koordlet
runtimeproxy.koordinator.sh/skip-hookserver: "true"
spec:
containers:
- args:
- -cgroup-root-dir=/host-cgroup/
- -feature-gates=BECPUEvict=true,BEMemoryEvict=true,CgroupReconcile=true,Accelerators=true
- -runtime-hooks-host-endpoint=/var/run/koordlet/koordlet.sock
- --logtostderr=true
- --v=4
command:
- /koordlet
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: registry.cn-beijing.aliyuncs.com/koordinator-sh/koordlet:v1.4.1
imagePullPolicy: Always
name: koordlet
resources:
limits:
cpu: 500m
memory: 256Mi
requests:
cpu: "0"
memory: "0"
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- SYS_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/localtime
name: host-time
readOnly: true
- mountPath: /host-cgroup/
name: host-cgroup-root
- mountPath: /host-sys-fs/
mountPropagation: Bidirectional
name: host-sys-fs
- mountPath: /host-var-run/
name: host-var-run
readOnly: true
- mountPath: /host-run/
name: host-run
readOnly: true
- mountPath: /host-var-run-koordlet/
mountPropagation: Bidirectional
name: host-var-run-koordlet
- mountPath: /prediction-checkpoints
mountPropagation: Bidirectional
name: host-koordlet-checkpoint-dir
- mountPath: /host-sys/
name: host-sys
readOnly: true
- mountPath: /etc/kubernetes/
name: host-kubernetes
readOnly: true
- mountPath: /host-etc-hookserver/
mountPropagation: Bidirectional
name: host-etc-hookserver
- mountPath: /var/lib/kubelet
name: host-kubelet-rootdir
readOnly: true
- mountPath: /dev
mountPropagation: HostToContainer
name: host-dev
- mountPath: /metric-data/
name: metric-db-path
dnsPolicy: ClusterFirst
hostNetwork: true
hostPID: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: koordlet
serviceAccountName: koordlet
terminationGracePeriodSeconds: 10
tolerations:
- operator: Exists
volumes:
- hostPath:
path: /etc/localtime
type: ""
name: host-time
- hostPath:
path: /sys/fs/cgroup/
type: ""
name: host-cgroup-root
- hostPath:
path: /sys/fs/
type: ""
name: host-sys-fs
- hostPath:
path: /var/run/
type: ""
name: host-var-run
- hostPath:
path: /run/
type: ""
name: host-run
- hostPath:
path: /var/run/koordlet
type: DirectoryOrCreate
name: host-var-run-koordlet
- hostPath:
path: /var/run/koordlet/prediction-checkpoints
type: DirectoryOrCreate
name: host-koordlet-checkpoint-dir
- hostPath:
path: /sys/
type: ""
name: host-sys
- hostPath:
path: /etc/kubernetes/
type: ""
name: host-kubernetes
- hostPath:
path: /etc/runtime/hookserver.d/
type: ""
name: host-etc-hookserver
- hostPath:
path: /var/lib/kubelet/
type: ""
name: host-kubelet-rootdir
- hostPath:
path: /dev
type: ""
name: host-dev
- emptyDir:
medium: Memory
sizeLimit: 150Mi
name: metric-db-path
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 20%
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 1
numberMisscheduled: 0
numberReady: 3
numberUnavailable: 2
observedGeneration: 1
updatedNumberScheduled: 3
kind: List
metadata:
resourceVersion: ""
@saintube Even without adding new pods, koordlet's memory consumption keeps increasing; as a result, koordlet is eventually OOM-killed.
Following today's discussion in the group, this afternoon I started validating koordlet with a 512MiB memory limit and deployed 200 pods. The verification environment is the same as described in this issue.
The Prometheus monitoring data shows that koordlet's memory usage increases continuously; raising the resource limits only delays the OOM event. The relevant monitoring information is as follows:
[root@k1 test]# date
Tue Apr 2 08:43:33 GMT 2024
[root@k1 test]# kc -n koordinator-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
koord-descheduler-dc5dc679c-fs6cm 1/1 Running 0 106m 10.234.201.20 k1 <none> <none>
koord-descheduler-dc5dc679c-whw6q 1/1 Running 0 106m 10.234.29.178 k3 <none> <none>
koord-manager-db6f4bdb9-mlsvs 1/1 Running 0 106m 10.234.24.91 k2 <none> <none>
koord-manager-db6f4bdb9-vtt46 1/1 Running 0 106m 10.234.29.167 k3 <none> <none>
koord-scheduler-7db78c8867-tg8q6 1/1 Running 0 106m 10.234.29.188 k3 <none> <none>
koord-scheduler-7db78c8867-xdmk5 1/1 Running 0 106m 10.234.24.101 k2 <none> <none>
koordlet-hbdmp 1/1 Running 0 99m 10.0.0.62 k1 <none> <none>
koordlet-mjlrx 1/1 Running 0 100m 10.0.0.106 k3 <none> <none>
koordlet-x9ddf 1/1 Running 0 99m 10.0.0.100 k2 <none> <none>
[root@k1 test]# date
Tue Apr 2 13:44:01 GMT 2024
[root@k1 test]# kc -n koordinator-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
koord-descheduler-dc5dc679c-fs6cm 1/1 Running 0 6h47m 10.234.201.20 k1 <none> <none>
koord-descheduler-dc5dc679c-whw6q 1/1 Running 0 6h47m 10.234.29.178 k3 <none> <none>
koord-manager-db6f4bdb9-mlsvs 1/1 Running 0 6h47m 10.234.24.91 k2 <none> <none>
koord-manager-db6f4bdb9-vtt46 1/1 Running 0 6h47m 10.234.29.167 k3 <none> <none>
koord-scheduler-7db78c8867-tg8q6 1/1 Running 0 6h47m 10.234.29.188 k3 <none> <none>
koord-scheduler-7db78c8867-xdmk5 1/1 Running 0 6h47m 10.234.24.101 k2 <none> <none>
koordlet-hbdmp 1/1 Running 1 (6m4s ago) 6h39m 10.0.0.62 k1 <none> <none>
koordlet-mjlrx 1/1 Running 0 6h40m 10.0.0.106 k3 <none> <none>
koordlet-x9ddf 1/1 Running 0 6h40m 10.0.0.100 k2 <none> <none>
[root@k1 test]# kc -n koordinator-system get ds -o yaml
apiVersion: v1
items:
- apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "2"
meta.helm.sh/release-name: koordinator
meta.helm.sh/release-namespace: default
creationTimestamp: "2024-04-02T06:56:51Z"
generation: 2
labels:
app.kubernetes.io/managed-by: Helm
koord-app: koordlet
name: koordlet
namespace: koordinator-system
resourceVersion: "477218"
uid: def34102-53df-4a8e-91df-b62de603dab2
spec:
minReadySeconds: 10
revisionHistoryLimit: 10
selector:
matchLabels:
koord-app: koordlet
template:
metadata:
creationTimestamp: null
labels:
koord-app: koordlet
runtimeproxy.koordinator.sh/skip-hookserver: "true"
spec:
containers:
- args:
- -cgroup-root-dir=/host-cgroup/
- -feature-gates=BECPUEvict=true,BEMemoryEvict=true,CgroupReconcile=true,Accelerators=true
- -runtime-hooks-host-endpoint=/var/run/koordlet/koordlet.sock
- --logtostderr=true
- --v=4
command:
- /koordlet
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: registry.cn-beijing.aliyuncs.com/koordinator-sh/koordlet:v1.4.1
imagePullPolicy: Always
name: koordlet
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: "0"
memory: "0"
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- SYS_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/localtime
name: host-time
readOnly: true
- mountPath: /host-cgroup/
name: host-cgroup-root
- mountPath: /host-sys-fs/
mountPropagation: Bidirectional
name: host-sys-fs
- mountPath: /host-var-run/
name: host-var-run
readOnly: true
- mountPath: /host-run/
name: host-run
readOnly: true
- mountPath: /host-var-run-koordlet/
mountPropagation: Bidirectional
name: host-var-run-koordlet
- mountPath: /prediction-checkpoints
mountPropagation: Bidirectional
name: host-koordlet-checkpoint-dir
- mountPath: /host-sys/
name: host-sys
readOnly: true
- mountPath: /etc/kubernetes/
name: host-kubernetes
readOnly: true
- mountPath: /host-etc-hookserver/
mountPropagation: Bidirectional
name: host-etc-hookserver
- mountPath: /var/lib/kubelet
name: host-kubelet-rootdir
readOnly: true
- mountPath: /dev
mountPropagation: HostToContainer
name: host-dev
- mountPath: /metric-data/
name: metric-db-path
dnsPolicy: ClusterFirst
hostNetwork: true
hostPID: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: koordlet
serviceAccountName: koordlet
terminationGracePeriodSeconds: 10
tolerations:
- operator: Exists
volumes:
- hostPath:
path: /etc/localtime
type: ""
name: host-time
- hostPath:
path: /sys/fs/cgroup/
type: ""
name: host-cgroup-root
- hostPath:
path: /sys/fs/
type: ""
name: host-sys-fs
- hostPath:
path: /var/run/
type: ""
name: host-var-run
- hostPath:
path: /run/
type: ""
name: host-run
- hostPath:
path: /var/run/koordlet
type: DirectoryOrCreate
name: host-var-run-koordlet
- hostPath:
path: /var/run/koordlet/prediction-checkpoints
type: DirectoryOrCreate
name: host-koordlet-checkpoint-dir
- hostPath:
path: /sys/
type: ""
name: host-sys
- hostPath:
path: /etc/kubernetes/
type: ""
name: host-kubernetes
- hostPath:
path: /etc/runtime/hookserver.d/
type: ""
name: host-etc-hookserver
- hostPath:
path: /var/lib/kubelet/
type: ""
name: host-kubelet-rootdir
- hostPath:
path: /dev
type: ""
name: host-dev
- emptyDir:
medium: Memory
sizeLimit: 150Mi
name: metric-db-path
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 20%
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 3
numberMisscheduled: 0
numberReady: 3
observedGeneration: 2
updatedNumberScheduled: 3
kind: List
metadata:
resourceVersion: ""
@b43646 OK, we are investigating the described case. BTW, did you find any suspicious logs of the koordlet pod?
@saintube Based on the description in the first post, I selected the log information from 15:50 to 15:54 for reference.
@b43646 Those look like the systemd logs of the kubelet and the container runtime, not koordlet's. You could use kubectl logs -n koordinator-system $KOORDLET_POD_NAME | less.
@saintube Following your guidance, I found that my pod's logs only contain today's data. I also observed that the /metric-data directory is already full. I will rerun the test to capture the logs of the OOM-killed pod. Please also try to reproduce the issue in your environment.
kubectl -n koordinator-system logs koordlet-hbdmp >> koordlet-hbdmp-512M.log
[root@k1 ~]# kubectl -n koordinator-system exec -it koordlet-hbdmp -- /bin/bash
root@k1:/# df -hT
Filesystem Type Size Used Avail Use% Mounted on
overlay overlay 39G 21G 18G 54% /
tmpfs tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
tmpfs tmpfs 150M 150M 0 100% /metric-data
devtmpfs devtmpfs 7.8G 0 7.8G 0% /dev
shm tmpfs 64M 0 64M 0% /dev/shm
/dev/sda3 xfs 39G 21G 18G 54% /etc/localtime
tmpfs tmpfs 7.8G 267M 7.5G 4% /host-run
@b43646 Thanks for your information. We're trying to reproduce the case. The key point is distinguishing between regular metrics storage overhead and a memory leak. The latter is a bug, while the former is an under-provisioning problem where the OOM can easily be avoided by increasing the pod memory limit or enlarging the collect/store intervals. We are also planning to add a table illustrating the relationship between koordlet's memory cost and the number of pod metrics on each node, to help users configure koordlet's resource requirements.
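As an example of the under-provisioning workaround, the memory limit of a running DaemonSet can be raised in place. This mirrors the 256Mi-to-512Mi change between the two DaemonSet dumps above; the container index 0 matches the single koordlet container shown there, and should be adjusted if your chart differs:

```shell
# Raise the koordlet container's memory limit to 512Mi in place.
# Container index 0 corresponds to the lone "koordlet" container in the
# DaemonSet spec dumped above.
kubectl -n koordinator-system patch daemonset koordlet --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"512Mi"}]'
```

The rolling update then proceeds under the DaemonSet's updateStrategy (maxUnavailable: 20% in the dumps above).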
@saintube
Thanks for the prompt response. One issue is worth noting here: although koordlet's default configuration caps TSDB storage at 100MB (TSDBMaxBytes), in actual operation the /metric-data directory keeps filling with metric data, reaching 150MB and exceeding the default.
func NewDefaultConfig() *Config {
	return &Config{
		MetricGCIntervalSeconds:       300,
		MetricExpireSeconds:           1800,
		TSDBPath:                      "/metric-data/",
		TSDBRetentionDuration:         12 * time.Hour,
		TSDBEnablePromMetrics:         true,
		TSDBStripeSize:                tsdb.DefaultStripeSize,
		TSDBMaxBytes:                  100 * 1024 * 1024, // 100 MB
		TSDBWALSegmentSize:            1 * 1024 * 1024,   // 1 MB
		TSDBMaxBlockChunkSegmentSize:  5 * 1024 * 1024,   // 5 MB
		TSDBMinBlockDuration:          30 * time.Minute,  // 30 minutes
		TSDBMaxBlockDuration:          30 * time.Minute,  // 30 minutes
		TSDBHeadChunksWriteBufferSize: 1024 * 1024,       // 1 MB
	}
}
@saintube
The OOM issue can still be reproduced; it occurred around 09:39. Here are the related details:
[root@k1 ~]# kc -n koordinator-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
koord-descheduler-dc5dc679c-fs6cm 1/1 Running 0 2d3h 10.234.201.20 k1 <none> <none>
koord-descheduler-dc5dc679c-whw6q 1/1 Running 0 2d3h 10.234.29.178 k3 <none> <none>
koord-manager-db6f4bdb9-mlsvs 1/1 Running 0 2d3h 10.234.24.91 k2 <none> <none>
koord-manager-db6f4bdb9-vtt46 1/1 Running 0 2d3h 10.234.29.167 k3 <none> <none>
koord-scheduler-7db78c8867-tg8q6 1/1 Running 0 2d3h 10.234.29.188 k3 <none> <none>
koord-scheduler-7db78c8867-xdmk5 1/1 Running 0 2d3h 10.234.24.101 k2 <none> <none>
koordlet-g5bhr 1/1 Running 1 (30m ago) 6h35m 10.0.0.106 k3 <none> <none>
koordlet-q9rw9 1/1 Running 0 6h35m 10.0.0.100 k2 <none> <none>
koordlet-s8fgg 1/1 Running 0 6h35m 10.0.0.62 k1 <none> <none>
[root@k3 ~]# cat /var/log/messages | grep "out of memory"
Apr 4 09:39:34 k3 kernel: Memory cgroup out of memory: Kill process 16498 (koordlet) score 1723 or sacrifice child
Hello @saintube, have you been able to reproduce this issue in your environment? Is there any additional verification I need to provide?
@b43646 Thanks for your information. We have reproduced the case and are investigating the problem. This issue will be fixed before v1.5.
Hi @b43646, we've found a memory leak problem on the unclosed TSDB querier. Please take a look at #1995 and try the latest koordlet to check if your issue is resolved.
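The unclosed-querier pattern behind the leak can be illustrated with a self-contained sketch. This is not the actual koordlet or Prometheus TSDB code; the querier type and open-count here are stand-ins invented for illustration, but the shape of the bug and of the fix (closing the querier, e.g. via defer) is the same:

```go
package main

import "fmt"

// openQueriers counts queriers that were never closed; a stand-in for
// the resources a real TSDB pins until Close() is called.
var openQueriers int

// querier is a minimal stand-in whose resources are only released on Close.
type querier struct{ closed bool }

func newQuerier() *querier {
	openQueriers++
	return &querier{}
}

func (q *querier) Close() {
	if !q.closed {
		q.closed = true
		openQueriers--
	}
}

// leakyQuery returns without closing the querier, so the memory it pins
// is never released -- the bug pattern behind the fix in #1995.
func leakyQuery() {
	q := newQuerier()
	_ = q // read series, then return without q.Close()
}

// fixedQuery releases the querier when done.
func fixedQuery() {
	q := newQuerier()
	defer q.Close()
	_ = q // read series
}

func main() {
	for i := 0; i < 1000; i++ {
		leakyQuery()
	}
	fmt.Println("open after leaky queries:", openQueriers)
	openQueriers = 0
	for i := 0; i < 1000; i++ {
		fixedQuery()
	}
	fmt.Println("open after fixed queries:", openQueriers)
}
```

With periodic collection, the leaky version's pinned memory grows without bound over time, which matches the slow RSS climb reported above even with a fixed pod count.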
After fixing the above issue, we've tested the recommended memory limits of the koordlet DaemonSet according to different pod numbers per node:
Appendix
@saintube Great Job, thanks for your help. I will verify it as soon as possible
@saintube After 7 days of testing and validation with the default 256MiB memory limit, koordlet did not encounter any OOM exceptions. The validation passed.
[root@demo ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
10.0.132.100 Ready node 7d12h v1.29.1
10.0.132.152 Ready node 7d12h v1.29.1
10.0.132.228 Ready node 7d12h v1.29.1
[root@demo ~]# kubectl -n koordinator-system get pods | grep koordlet
koordlet-289qf 1/1 Running 1 (7d12h ago) 7d12h
koordlet-6mwch 1/1 Running 1 (7d12h ago) 7d12h
koordlet-kfhqj 1/1 Running 1 (7d12h ago) 7d12h
[root@demo ~]# kubectl describe podgroup gang-example
Name: gang-example
Namespace: default
Labels: <none>
Annotations: <none>
API Version: scheduling.sigs.k8s.io/v1alpha1
Kind: PodGroup
Metadata:
Creation Timestamp: 2024-04-28T13:48:03Z
Generation: 208
Resource Version: 1596886
UID: 529228b0-9d94-4fa1-9494-d4c47548d176
Spec:
Min Member: 100
Schedule Timeout Seconds: 100
Status:
Phase: Running
Running: 200
Schedule Start Time: 2024-04-28T13:48:19Z
Scheduled: 102
Events: <none>
/close
@hormes: Closing this issue.
What happened:
After running for a while, the koordlet pod was killed due to OOM. Without any new container scheduling, the memory consumption of the koordlet pod continued to increase, indicating a possible memory leak in koordlet.
What you expected to happen:
koordlet can run stably without continuous increase in memory consumption.
How to reproduce it (as minimally and precisely as possible):
Prometheus monitoring data:
From the graph, it can be seen that from 10:00 to 16:00 the memory consumption of koordlet increased continuously until it was eventually OOM-killed. The relevant log information is as follows.
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version):