vmware-archive / kube-prod-runtime

A standard infrastructure environment for Kubernetes
Apache License 2.0

Fluentd Pods stuck in "ContainerCreating" status in AKS post upgrade #1005

Closed CincomGithubService closed 3 years ago

CincomGithubService commented 3 years ago

We upgraded our AKS cluster from version 1.18.8 to 1.19.3. During the upgrade, the pods were restarted/recreated on the upgraded nodes. The fluentd-es-* pods have been stuck in "ContainerCreating" status ever since. We tried uninstalling and re-installing BKPR/kube-prod-runtime as well. All of the other pods come up fine and everything else works, except for these fluentd-es pods, which report this error: "Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[varlogbuffers varlogpos fluentd-es-token-6sttb config configd varlibdockercontainers varlog]: timed out waiting for the condition". Before that, the events for these fluentd pods show another error: "MountVolume.SetUp failed for volume "varlibdockercontainers" : hostPath type check failed: /var/lib/docker/containers is not a directory".
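The "hostPath type check failed" event is the kubelet enforcing the volume's declared hostPath type: the varlibdockercontainers volume is defined with type Directory, so the mount is rejected outright if /var/lib/docker/containers does not already exist on the node. A quick way to check what an upgraded node actually has at that path, as a sketch (the node name is taken from the describe output below; the pod name path-check is illustrative):

# List nodes with their container runtime; a node running containerd rather
# than Docker will not have a /var/lib/docker/containers directory.
kubectl get nodes -o wide

# Inspect the path from a throwaway pod pinned to one node.
kubectl run path-check --rm -i --restart=Never --image=busybox \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"aks-nodepool1-50503828-vmss000000","volumes":[{"name":"host","hostPath":{"path":"/"}}],"containers":[{"name":"path-check","image":"busybox","command":["ls","-ld","/host/var/lib/docker/containers"],"volumeMounts":[{"name":"host","mountPath":"/host"}]}]}}'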

We have not been able to figure out a solution. If someone can assist us, we would really appreciate it.

kubectl get all --namespace=kubeprod

NAME                                            READY   STATUS              RESTARTS   AGE
pod/alertmanager-0                              2/2     Running             0          30m
pod/cert-manager-666c8b7f77-8l6pc               1/1     Running             0          30m
pod/elasticsearch-logging-0                     2/2     Running             0          30m
pod/elasticsearch-logging-1                     2/2     Running             0          30m
pod/elasticsearch-logging-2                     2/2     Running             0          30m
pod/external-dns-77d4fc59bf-f4xwl               1/1     Running             0          30m
pod/fluentd-es-2ksjg                            0/1     ContainerCreating   0          30m
pod/fluentd-es-6c892                            0/1     ContainerCreating   0          30m
pod/fluentd-es-9jf4n                            0/1     ContainerCreating   0          30m
pod/fluentd-es-rq4m9                            0/1     ContainerCreating   0          28m
pod/fluentd-es-x9bcr                            0/1     ContainerCreating   0          30m
pod/grafana-0                                   1/1     Running             0          30m
pod/kibana-58446f5b6-xxns4                      1/1     Running             0          30m
pod/kube-state-metrics-848584cb68-vrtkq         2/2     Running             0          30m
pod/nginx-ingress-controller-564c9845cf-7wf97   1/1     Running             0          30m
pod/nginx-ingress-controller-564c9845cf-xj97x   1/1     Running             0          30m
pod/node-exporter-77bdd                         1/1     Running             0          30m
pod/node-exporter-9pxml                         1/1     Running             0          30m
pod/node-exporter-bt6sp                         1/1     Running             0          30m
pod/node-exporter-nf4zw                         1/1     Running             0          28m
pod/node-exporter-sh7z7                         1/1     Running             0          30m
pod/oauth2-proxy-6fd457b756-jpsgr               1/1     Running             0          30m
pod/oauth2-proxy-6fd457b756-lccht               1/1     Running             0          30m
pod/prometheus-0                                2/2     Running             0          30m

NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/alertmanager            ClusterIP      172.29.133.30    <none>          9093/TCP                     30m
service/elasticsearch-logging   ClusterIP      None             <none>          9200/TCP                     30m
service/grafana                 ClusterIP      172.29.134.113   <none>          3000/TCP                     30m
service/kibana-logging          ClusterIP      172.29.133.137   <none>          5601/TCP                     30m
service/nginx-ingress           LoadBalancer   172.29.133.234   52.253.76.150   80:30008/TCP,443:31158/TCP   30m
service/oauth2-proxy            ClusterIP      172.29.132.90    <none>          4180/TCP                     30m
service/prometheus              ClusterIP      172.29.133.41    <none>          9090/TCP                     30m

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/fluentd-es        5         5         0       5            0           <none>          30m
daemonset.apps/node-exporter     5         5         5       5            5           <none>          30m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager               1/1     1            1           30m
deployment.apps/external-dns               1/1     1            1           30m
deployment.apps/kibana                     1/1     1            1           30m
deployment.apps/kube-state-metrics         1/1     1            1           30m
deployment.apps/nginx-ingress-controller   2/2     2            2           30m
deployment.apps/oauth2-proxy               2/2     2            2           30m

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-666c8b7f77                1         1         1       30m
replicaset.apps/external-dns-77d4fc59bf                1         1         1       30m
replicaset.apps/kibana-58446f5b6                       1         1         1       30m
replicaset.apps/kube-state-metrics-848584cb68          1         1         1       30m
replicaset.apps/nginx-ingress-controller-564c9845cf    2         2         2       30m
replicaset.apps/oauth2-proxy-6fd457b756                2         2         2       30m

NAME                                     READY   AGE
statefulset.apps/alertmanager            1/1     30m
statefulset.apps/elasticsearch-logging   3/3     30m
statefulset.apps/grafana                 1/1     30m
statefulset.apps/prometheus              1/1     30m

NAME                                                           REFERENCE                             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/nginx-ingress-controller   Deployment/nginx-ingress-controller   9%/80%    2         10        2          30m
horizontalpodautoscaler.autoscaling/oauth2-proxy               Deployment/oauth2-proxy               10%/80%   2         10        2          30m

NAME                                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/elasticsearch-curator     10 10 * * *   False     0        <none>          30m

kubectl describe pod/fluentd-es-x9bcr --namespace=kubeprod

Name:           fluentd-es-x9bcr
Namespace:      kubeprod
Priority:       0
Node:           aks-nodepool1-50503828-vmss000000/172.29.128.4
Start Time:     Thu, 10 Dec 2020 21:22:42 -0500
Labels:         controller-revision-hash=7b4497658c
                name=fluentd-es
                pod-template-generation=1
Annotations:    prometheus.io/path: /metrics
                prometheus.io/port: 24231
                prometheus.io/scrape: true
                scheduler.alpha.kubernetes.io/critical-pod:
Status:         Pending
IP:
IPs:            <none>
Controlled By:  DaemonSet/fluentd-es
Containers:
  fluentd-es:
    Container ID:
    Image:          bitnami/fluentd:1.11.1-debian-10-r27
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      fluentd
    Args:
      --config=/opt/bitnami/fluentd/conf/fluentd.conf
      --plugin=/opt/bitnami/fluentd/plugins
      --log=/opt/bitnami/fluentd/logs/fluentd.log
      --log-rotate-age=5
      --log-rotate-size=104857600
      --no-supervisor
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  500Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      ES_HOST:  elasticsearch-logging.kubeprod.svc
    Mounts:
      /opt/bitnami/fluentd/conf from config (ro)
      /opt/bitnami/fluentd/conf/config.d from configd (ro)
      /var/lib/docker/containers from varlibdockercontainers (ro)
      /var/log from varlog (ro)
      /var/log/fluentd-buffers from varlogbuffers (rw)
      /var/log/fluentd-pos from varlogpos (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from fluentd-es-token-6sttb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd-es-30f242f
    Optional:  false
  configd:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd-es-configd-cbf6e63
    Optional:  false
  varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:  Directory
  varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  Directory
  varlogbuffers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/fluentd-buffers
    HostPathType:  DirectoryOrCreate
  varlogpos:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/fluentd-pos
    HostPathType:  DirectoryOrCreate
  fluentd-es-token-6sttb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fluentd-es-token-6sttb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason       Age                 From                                        Message
  ----     ------       ----                ----                                        -------
  Normal   Scheduled    31m                 default-scheduler                           Successfully assigned kubeprod/fluentd-es-x9bcr to aks-nodepool1-50503828-vmss000000
  Warning  FailedMount  29m                 kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[config configd varlibdockercontainers varlog varlogbuffers varlogpos fluentd-es-token-6sttb]: timed out waiting for the condition
  Warning  FailedMount  24m                 kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[varlibdockercontainers varlog varlogbuffers varlogpos fluentd-es-token-6sttb config configd]: timed out waiting for the condition
  Warning  FailedMount  18m (x2 over 22m)   kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[fluentd-es-token-6sttb config configd varlibdockercontainers varlog varlogbuffers varlogpos]: timed out waiting for the condition
  Warning  FailedMount  15m (x2 over 27m)   kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[configd varlibdockercontainers varlog varlogbuffers varlogpos fluentd-es-token-6sttb config]: timed out waiting for the condition
  Warning  FailedMount  13m                 kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[varlogpos fluentd-es-token-6sttb config configd varlibdockercontainers varlog varlogbuffers]: timed out waiting for the condition
  Warning  FailedMount  11m (x2 over 20m)   kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[varlogbuffers varlogpos fluentd-es-token-6sttb config configd varlibdockercontainers varlog]: timed out waiting for the condition
  Warning  FailedMount  8m55s               kubelet, aks-nodepool1-50503828-vmss000000  Unable to attach or mount volumes: unmounted volumes=[varlibdockercontainers], unattached volumes=[varlog varlogbuffers varlogpos fluentd-es-token-6sttb config configd varlibdockercontainers]: timed out waiting for the condition
  Warning  FailedMount  51s (x23 over 31m)  kubelet, aks-nodepool1-50503828-vmss000000  MountVolume.SetUp failed for volume "varlibdockercontainers" : hostPath type check failed: /var/lib/docker/containers is not a directory
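The describe output explains why only this one volume fails: varlibdockercontainers is declared with HostPathType Directory, which the kubelet refuses to mount unless the path already exists on the node, while varlogbuffers and varlogpos use DirectoryOrCreate and are created on demand. A minimal sketch of relaxing the type on the live DaemonSet, assuming varlibdockercontainers is the third volume (index 2) as in the order above; note that BKPR manages these manifests, so a manual patch may be reverted the next time kubeprod is applied:

# Confirm the position of varlibdockercontainers in the volume list first.
kubectl get daemonset fluentd-es --namespace=kubeprod \
  -o jsonpath='{.spec.template.spec.volumes[*].name}'

# Switch the hostPath type so the kubelet creates the directory instead of
# failing the type check (index 2 is an assumption; adjust to match).
kubectl patch daemonset fluentd-es --namespace=kubeprod --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/volumes/2/hostPath/type","value":"DirectoryOrCreate"}]'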

CincomGithubService commented 3 years ago

Closing this issue, as we resolved it. We created the missing "/var/lib/docker/containers" directory on each node in the cluster using a node shell, and all the fluentd pods came up fine after that. The error was an oversight on our part.
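For clusters with more than a handful of nodes, the same fix can be applied without a node shell. A minimal sketch, assuming a throwaway DaemonSet is acceptable (the name mkdir-docker-containers is illustrative, not part of BKPR): it mounts /var/lib/docker from the host, creates the missing containers subdirectory on every node, and can be deleted once the fluentd-es pods are Running.

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mkdir-docker-containers
  namespace: kubeprod
spec:
  selector:
    matchLabels:
      name: mkdir-docker-containers
  template:
    metadata:
      labels:
        name: mkdir-docker-containers
    spec:
      containers:
      - name: mkdir
        image: busybox
        # Create the missing directory on the host, then idle so the pod
        # stays Running instead of crash-looping.
        command: ["sh", "-c", "mkdir -p /host/var/lib/docker/containers && sleep 3600"]
        volumeMounts:
        - name: host-var-lib-docker
          mountPath: /host/var/lib/docker
      volumes:
      - name: host-var-lib-docker
        hostPath:
          path: /var/lib/docker
          type: DirectoryOrCreate
EOF

# Clean up after all fluentd-es pods report Running:
kubectl delete daemonset mkdir-docker-containers --namespace=kubeprod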

Thanks.