/sig node
/transfer kubeadm
I removed the node label and moved this to k/kubeadm, but this might be a ticket for kubernetes/kubernetes, for SIG Instrumentation.
moving it back.
/transfer kubernetes
I think this should be node.
/sig node
Node is responsible for pod lifecycle and they own the log rotation code.
I am not sure if log rotations actually work for static pods. I see some mentions of static pods and log rotation in the docs but not sure.
https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs
Actually, log rotation for static pods works fine in general; see below. Only the etcd and k8s control-plane static pods are affected. So how does the kubelet determine that a container log should be rotated? Is there any API server request involved? If it had to query the API server while the API server container itself is down/restarted, the kubelet might not be able to clean up the previous etcd/apiserver container logs.
root@mst1:/var/log/pods/default_static-web-mst1_ab30fcbb6481a1c91aa9873ea51974ff/web# ll
total 24
drwxr-xr-x 2 root root 4096 Sep 25 21:46 ./
drwxr-xr-x 3 root root 4096 Sep 25 21:28 ../
-rw-r----- 1 root root 5650 Sep 25 21:34 3.log
-rw-r----- 1 root root 5649 Sep 25 21:46 4.log
What is the pod manifest for the API server?
In those docs, it says that mounting to a shared volume for static pods may cause issues with rotation.
The kubelet handles log rotation; see the code.
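From my reading, that rotation is driven entirely by local file-size checks: the kubelet's container log manager periodically stats each running container's CRI log file and rotates/prunes it against containerLogMaxSize and containerLogMaxFiles, with no API server request involved. Notably, it only visits logs of containers the runtime still reports, which might be relevant to the stopped-container case here. Below is a minimal stdlib-only Go sketch of that rotate-then-prune idea; it is a simplified illustration, not the actual kubelet code, and the names (rotateIfNeeded, pruneOldLogs) and the example path are made up.

// Simplified illustration of a kubelet-style rotate-and-prune pass.
// NOT the real kubelet code: the real manager also asks the runtime
// via CRI ReopenContainerLog to start writing to a fresh 0.log.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"time"
)

const (
	maxSize  = 25 * 1024 * 1024 // containerLogMaxSize: 25Mi
	maxFiles = 2                // containerLogMaxFiles: 2
)

// rotateIfNeeded checks a single container's current log (e.g. 0.log)
// purely via local file metadata; no API server request is involved.
func rotateIfNeeded(logPath string) error {
	info, err := os.Stat(logPath)
	if err != nil {
		return err
	}
	if info.Size() < maxSize {
		return nil // under the size limit, nothing to do
	}
	// Rotate: rename 0.log -> 0.log.<timestamp>.
	rotated := fmt.Sprintf("%s.%s", logPath, time.Now().Format("20060102-150405"))
	if err := os.Rename(logPath, rotated); err != nil {
		return err
	}
	return pruneOldLogs(logPath)
}

// pruneOldLogs keeps at most maxFiles-1 rotated files, since the live
// 0.log counts toward containerLogMaxFiles.
func pruneOldLogs(logPath string) error {
	rotated, err := filepath.Glob(logPath + ".*")
	if err != nil {
		return err
	}
	sort.Strings(rotated) // timestamp suffixes sort oldest first
	for len(rotated) > maxFiles-1 {
		if err := os.Remove(rotated[0]); err != nil {
			return err
		}
		rotated = rotated[1:]
	}
	return nil
}

func main() {
	// Placeholder path for illustration only.
	if err := rotateIfNeeded("/var/log/pods/<pod>/<container>/0.log"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}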
$ sudo cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.192.1.21:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.192.1.21:2379
    - --cert-file=/k8s/kubernetes/pki/etcd/server.crt
    - --cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
    - --client-cert-auth=true
    - --data-dir=/k8s/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://10.192.1.21:2380
    - --initial-cluster=tb16-pod1-c1-mm1-master1=https://10.192.1.21:2380
    - --key-file=/k8s/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.192.1.21:2379
    - --listen-metrics-urls=http://0.0.0.0:2381
    - --listen-peer-urls=https://10.192.1.21:2380
    - --name=tb16-pod1-c1-mm1-master1
    - --peer-cert-file=/k8s/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/k8s/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/k8s/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/k8s/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 0.0.0.0
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 0.0.0.0
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /k8s/etcd
      name: etcd-k8s
    - mountPath: /k8s/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /k8s/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /k8s/etcd
      type: DirectoryOrCreate
    name: etcd-k8s
status: {}
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
/triage accepted (org members only)
/close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/remove-sig instrumentation
I couldn't reproduce this with k/k built from a fresh master (Tue Oct 8 14:42:22 2024).
Here is what I did:
# grep -B2 v=10 /etc/kubernetes/manifests/kube-apiserver.yaml
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --v=10
Here's how it evolved:
# ls -lha /var/log/pods/kube-system_kube-apiserver-devel_b7ee32bd1387aa32d0edfc860e37b46e/kube-apiserver/
total 24M
drwxr-xr-x 2 root root 4,0K loka 10 12:58 .
drwxr-xr-x 3 root root 4,0K loka 10 12:57 ..
-rw------- 1 root root 6,9M loka 10 13:01 0.log
-rw-r--r-- 1 root root 6,6M loka 10 12:58 0.log.20241010-125714.gz
-rw------- 1 root root 11M loka 10 12:58 0.log.20241010-125825
# ls -lha /var/log/pods/kube-system_kube-apiserver-devel_b7ee32bd1387aa32d0edfc860e37b46e/kube-apiserver/
total 28M
drwxr-xr-x 2 root root 4,0K loka 10 13:02 .
drwxr-xr-x 3 root root 4,0K loka 10 12:57 ..
-rw------- 1 root root 60K loka 10 13:02 0.log
-rw-r--r-- 1 root root 6,6M loka 10 12:58 0.log.20241010-125714.gz
-rw-r--r-- 1 root root 505K loka 10 13:02 0.log.20241010-125825.gz
-rw------- 1 root root 21M loka 10 13:02 0.log.20241010-130224
# ls -lha /var/log/pods/kube-system_kube-apiserver-devel_b7ee32bd1387aa32d0edfc860e37b46e/kube-apiserver/
total 20M
drwxr-xr-x 2 root root 4,0K loka 10 13:03 .
drwxr-xr-x 3 root root 4,0K loka 10 12:57 ..
-rw------- 1 root root 7,0M loka 10 13:04 0.log
-rw-r--r-- 1 root root 505K loka 10 13:02 0.log.20241010-125825.gz
-rw-r--r-- 1 root root 1,2M loka 10 13:02 0.log.20241010-130224.gz
-rw-r--r-- 1 root root 519K loka 10 13:03 0.log.20241010-130254.gz
-rw------- 1 root root 11M loka 10 13:03 0.log.20241010-130334
# ls -lha /var/log/pods/kube-system_kube-apiserver-devel_b7ee32bd1387aa32d0edfc860e37b46e/kube-apiserver/
total 17M
drwxr-xr-x 2 root root 4,0K loka 10 13:15 .
drwxr-xr-x 3 root root 4,0K loka 10 12:57 ..
-rw------- 1 root root 4,3M loka 10 13:16 0.log
-rw-r--r-- 1 root root 595K loka 10 13:12 0.log.20241010-131024.gz
-rw-r--r-- 1 root root 561K loka 10 13:13 0.log.20241010-131204.gz
-rw-r--r-- 1 root root 613K loka 10 13:15 0.log.20241010-131344.gz
-rw------- 1 root root 11M loka 10 13:15 0.log.20241010-131524
So rotation, compression, and cleanup seem to work as expected.
/close
@bart0sh: Closing this issue.
What happened?
Kubeadm cluster with the kubelet configured with the following settings:
containerLogMaxSize: 25Mi
containerLogMaxFiles: 2
After crictl stop of a static etcd/apiserver/controller-manager pod container, the pod logs are not rotated.
root@mst1:/var/log/pods/kube-system_kube-controller-manager-mst1_d502ec3b6222a8e5430ef26afe24e26f/kube-controller-manager# ls -alh
total 236K
drwxr-x--- 2 root root 4.0K Sep 25 21:47 .
drwxr-x--- 3 root root 4.0K Sep 24 00:39 ..
-rw-r----- 1 root root  34K Sep 24 00:39 0.log
-rw-r----- 1 root root  40K Sep 24 00:41 1.log
-rw-r----- 1 root root 147K Sep 25 21:09 2.log
-rw-r----- 1 root root 1.2K Sep 25 21:47 3.log
Notes:
1. Log rotation works for other pods, both non-static and static (e.g. this static web pod):
root@mst1:/var/log/pods/default_static-web-mst1_ab30fcbb6481a1c91aa9873ea51974ff/web# ll
total 24
drwxr-xr-x 2 root root 4096 Sep 25 21:46 ./
drwxr-xr-x 3 root root 4096 Sep 25 21:28 ../
-rw-r----- 1 root root 5650 Sep 25 21:34 3.log
-rw-r----- 1 root root 5649 Sep 25 21:46 4.log
What did you expect to happen?
At most 2 pod log files should be kept per container.
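To spot violations of that expectation, here is a small hypothetical checker (plain Go, stdlib only; the 2-file limit is hard-coded to match the config above) that walks /var/log/pods and flags container log directories holding more log files than containerLogMaxFiles should allow:

// Hypothetical triage helper: flag container log directories under
// /var/log/pods holding more log files than containerLogMaxFiles allows.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

const maxFiles = 2 // containerLogMaxFiles from the kubelet config

func main() {
	root := "/var/log/pods"
	// Layout: /var/log/pods/<ns>_<pod>_<uid>/<container>/*.log*
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil || !d.IsDir() {
			return err
		}
		entries, err := os.ReadDir(path)
		if err != nil {
			return err
		}
		// Count current, rotated, and compressed log files alike.
		logs := 0
		for _, e := range entries {
			if !e.IsDir() && strings.Contains(e.Name(), ".log") {
				logs++
			}
		}
		if logs > maxFiles {
			fmt.Printf("%s: %d log files (limit %d)\n", path, logs, maxFiles)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}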
How can we reproduce it (as minimally and precisely as possible)?
sudo crictl stop <etcd/apiserver container>
Anything else we need to know?
No response
Kubernetes version
Cloud provider
baremetal
OS version
Ubuntu 20.04.6
Install tools
Container runtime (CRI) and version (if applicable)
containerd 1.7.2
Related plugins (CNI, CSI, ...) and versions (if applicable)