@AlessandroSechi Are there any messages in the CSI driver pods log? Did the CSI driver work before increasing the number of nodes?
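For reference, a quick way to pull those logs (the controller and node plugin names are taken from the kOps addon manifests shown further down) would be something like:
kubectl -n kube-system logs deployment/hcloud-csi-controller --all-containers --tail=200
kubectl -n kube-system logs daemonset/hcloud-csi-node --all-containers --tail=200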
Are there any messages in the CSI driver pods log?
@hakman Yes, I checked the logs, and I see some errors:
W0619 18:12:45.943030 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.CSINode ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0619 18:12:45.943042 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.PersistentVolume ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0619 18:12:45.943105 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.VolumeAttachment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0619 21:35:28.192333 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.CSINode ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0619 21:35:28.192341 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.PersistentVolume ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0619 21:35:28.192350 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.VolumeAttachment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0619 21:36:12.321871 1 trace.go:205] Trace[163361275]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (19-Jun-2023 21:35:29.125) (total time: 43177ms):
Trace[163361275]: ---"Objects listed" 43177ms (21:36:00.303)
Trace[163361275]: [43.177606769s] [43.177606769s] END
W0624 21:38:51.893897 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.CSINode ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0624 21:38:51.893916 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.PersistentVolume ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0624 21:38:51.893907 1 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.VolumeAttachment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
E0624 21:39:05.882381 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSINode: failed to list *v1.CSINode: Get "https://100.64.0.1:443/apis/storage.k8s.io/v1/csinodes?resourceVersion=16974325": dial tcp 100.64.0.1:443: connect: connection refused
E0624 21:39:06.548055 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.PersistentVolume: failed to list *v1.PersistentVolume: Get "https://100.64.0.1:443/api/v1/persistentvolumes?resourceVersion=16974252": dial tcp 100.64.0.1:443: connect: connection refused
I also noticed some other errors in hcloud-cloud-controller-manager:
E0626 11:13:01.139544 1 controller.go:310] error processing service ingress-controller/ingress-nginx-controller (will retry): failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.Create: neither load-balancer.hetzner.cloud/location nor load-balancer.hetzner.cloud/network-zone set
I0626 11:18:01.147669 1 controller.go:407] Ensuring load balancer for service ingress-controller/ingress-nginx-controller
I0626 11:18:01.158815 1 load_balancers.go:108] "ensure Load Balancer" op="hcloud/loadBalancers.EnsureLoadBalancer" service="ingress-nginx-controller" nodes=[nodes-fsn1-67d56c4deadcd70f nodes-fsn1-729c4c76bc120662 nodes-fsn1-361c1a49dec42261]
I0626 11:18:01.160916 1 event.go:294] "Event occurred" object="ingress-controller/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0626 11:18:01.477477 1 event.go:294] "Event occurred" object="ingress-controller/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.Create: neither load-balancer.hetzner.cloud/location nor load-balancer.hetzner.cloud/network-zone set"
E0626 11:18:01.477840 1 controller.go:310] error processing service ingress-controller/ingress-nginx-controller (will retry): failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.Create: neither load-balancer.hetzner.cloud/location nor load-balancer.hetzner.cloud/network-zone set
In fact, the machine was not added to the Hetzner LB as a target. I tried to repeat the increase (deleted the node and reapplied the cluster update) but got the same result. Apparently there is some issue in scaling up which also affects the LB.
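For reference, the error above says that neither load-balancer.hetzner.cloud/location nor load-balancer.hetzner.cloud/network-zone is set on the Service. One way to inspect the LB from the Hetzner side and to set a location explicitly (the Service name is taken from the log, the location from the cluster zone; adjust as needed) could be:
hcloud load-balancer list
kubectl -n ingress-controller annotate service ingress-nginx-controller load-balancer.hetzner.cloud/location=fsn1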
Did the CSI driver work before increasing the number of nodes?
Yes everything was working fine
I tried to reproduce the problem without much luck. I used https://github.com/kubernetes/kops/releases/tag/v1.27.0-beta.3, which has a newer CCM and CSI drivers. Please try it as well and see if you can reproduce the issue with this new kOps release.
If the issue still appears, please document the cluster creation args, any changes to the cluster from the defaults, the relevant manifest(s), and any other steps needed to reproduce this.
Hello, the issue still reproduces after scaling the cluster with kOps 1.27.0-beta.3.
Command used for cluster creation:
kops create cluster --name=cluster1.fsn1.hetzner.mywebsite.com --ssh-public-key=/home/key.pub --cloud=hetzner --zones=fsn1 --image=debian-11 --networking=calico --network-cidr=10.10.0.0/16 --master-size cpx11 --master-count 3 --node-count 2 --node-size cpx21
No other changes
Also, this error is present in the csi-attacher container:
I0702 09:15:56.370725 1 main.go:94] Version: v4.1.0
W0702 09:16:06.386447 1 connection.go:173] Still connecting to unix:///run/csi/socket
W0702 09:16:16.387718 1 connection.go:173] Still connecting to unix:///run/csi/socket
I0702 09:16:17.855259 1 common.go:111] Probing CSI driver for readiness
I0702 09:16:17.892917 1 controller.go:130] Starting CSI attacher
W0702 09:33:15.013352 1 reflector.go:347] k8s.io/client-go/informers/factory.go:150: watch of *v1.VolumeAttachment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0702 09:33:15.013358 1 reflector.go:347] k8s.io/client-go/informers/factory.go:150: watch of *v1.CSINode ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0702 09:33:15.013428 1 reflector.go:347] k8s.io/client-go/informers/factory.go:150: watch of *v1.PersistentVolume ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0702 09:33:59.466265 1 trace.go:219] Trace[1006933274]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:150 (02-Jul-2023 09:33:15.987) (total time: 43477ms):
Also, as a collateral issue, the node is never added to the Hetzner LB targets.
It is not clear to me what kind of app you are running, how many replicas, and so on, nor the steps to reproduce the issue.
What is the status/events of CSI pods (describe deployment and daemonset)? Are all pods running and ready (get pods -A -o wide)? What is the status/events for the app pod (describe pod)? What is the status/events for the PVC (describe pvc)?
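Concretely, something along these lines should surface that information (namespaces and resource names are assumptions based on the kOps Hetzner addons and the app described below):
kubectl -n kube-system describe deployment hcloud-csi-controller
kubectl -n kube-system describe daemonset hcloud-csi-node
kubectl get pods -A -o wide
kubectl -n common describe pod consul-server-2
kubectl -n common describe pvc data-common-consul-server-2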
In fact, the machine was not added to the Hetzner LB as a target
This comment is also out of context. How does the LB fit here?
It is not clear to me what kind of app you are running, how many replicas, and so on, nor the steps to reproduce the issue.
The app is consul, installed via the official Helm chart. It has 2 replicas; I'm trying to schedule the third by upgrading the chart with server.replicas=3.
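For reference, the upgrade is roughly the following (the release name and namespace match the pod listing further down; the chart repo is HashiCorp's official one):
helm repo add hashicorp https://helm.releases.hashicorp.com
helm upgrade consul hashicorp/consul -n common --set server.replicas=3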
What is the status/events of CSI pods (describe deployment and daemonset)?
kubectl -n kube-system describe deployment hcloud-cloud-controller-manager
Name: hcloud-cloud-controller-manager
Namespace: kube-system
CreationTimestamp: Thu, 04 May 2023 18:30:59 +0000
Labels: addon.kops.k8s.io/name=hcloud-cloud-controller.addons.k8s.io
app.kubernetes.io/managed-by=kops
k8s-addon=hcloud-cloud-controller.addons.k8s.io
Annotations: deployment.kubernetes.io/revision: 2
Selector: app=hcloud-cloud-controller-manager
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=hcloud-cloud-controller-manager
kops.k8s.io/managed-by=kops
Service Account: cloud-controller-manager
Containers:
hcloud-cloud-controller-manager:
Image: hetznercloud/hcloud-cloud-controller-manager:v1.15.0@sha256:709ddfb2c976d16748d835ed5846333142a6a879dd6c9e5734b6bfac1071ea9f
Port: <none>
Host Port: <none>
Command:
/bin/hcloud-cloud-controller-manager
--allocate-node-cidrs=true
--allow-untagged-cloud=true
--cloud-provider=hcloud
--cluster-cidr=100.64.0.0/10
--configure-cloud-routes=false
--leader-elect=false
--v=2
--use-service-account-credentials=true
Requests:
cpu: 100m
memory: 50Mi
Environment:
NODE_NAME: (v1:spec.nodeName)
HCLOUD_TOKEN: <set to the key 'token' in secret 'hcloud'> Optional: false
HCLOUD_NETWORK: <set to the key 'network' in secret 'hcloud'> Optional: false
Mounts: <none>
Volumes: <none>
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: hcloud-cloud-controller-manager-5fdb77d49b (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 84m deployment-controller Scaled up replica set hcloud-cloud-controller-manager-5fdb77d49b to 1
Normal ScalingReplicaSet 84m deployment-controller Scaled down replica set hcloud-cloud-controller-manager-df588cd94 to 0 from 1
kubectl -n kube-system describe deployment hcloud-csi-controller
Name: hcloud-csi-controller
Namespace: kube-system
CreationTimestamp: Thu, 04 May 2023 18:31:03 +0000
Labels: addon.kops.k8s.io/name=hcloud-csi-driver.addons.k8s.io
app.kubernetes.io/managed-by=kops
k8s-addon=hcloud-csi-driver.addons.k8s.io
Annotations: deployment.kubernetes.io/revision: 2
Selector: app=hcloud-csi-controller
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=hcloud-csi-controller
kops.k8s.io/managed-by=kops
Service Account: hcloud-csi-controller
Containers:
csi-attacher:
Image: registry.k8s.io/sig-storage/csi-attacher:v4.1.0@sha256:08721106b949e4f5c7ba34b059e17300d73c8e9495201954edc90eeb3e6d8461
Port: <none>
Host Port: <none>
Args:
--default-fstype=ext4
Environment: <none>
Mounts:
/run/csi from socket-dir (rw)
csi-resizer:
Image: registry.k8s.io/sig-storage/csi-resizer:v1.7.0@sha256:3a7bdf5d105783d05d0962fa06ca53032b01694556e633f27366201c2881e01d
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/run/csi from socket-dir (rw)
csi-provisioner:
Image: registry.k8s.io/sig-storage/csi-provisioner:v3.4.0@sha256:e468dddcd275163a042ab297b2d8c2aca50d5e148d2d22f3b6ba119e2f31fa79
Port: <none>
Host Port: <none>
Args:
--feature-gates=Topology=true
--default-fstype=ext4
Environment: <none>
Mounts:
/run/csi from socket-dir (rw)
hcloud-csi-driver:
Image: hetznercloud/hcloud-csi-driver:v2.3.2@sha256:b7ed90d5fab2c3fc63bf3ecb2193d3d18c1ec368c7ad98b2dbf633f0ada6afba
Ports: 9189/TCP, 9808/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/hcloud-csi-driver-controller
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=2s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///run/csi/socket
METRICS_ENDPOINT: 0.0.0.0:9189
ENABLE_METRICS: true
KUBE_NODE_NAME: (v1:spec.nodeName)
HCLOUD_TOKEN: <set to the key 'token' in secret 'hcloud-csi'> Optional: false
Mounts:
/run/csi from socket-dir (rw)
liveness-probe:
Image: registry.k8s.io/sig-storage/livenessprobe:v2.9.0@sha256:2b10b24dafdc3ba94a03fc94d9df9941ca9d6a9207b927f5dfd21d59fbe05ba0
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/run/csi from socket-dir (rw)
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: hcloud-csi-controller-7b6cf877f9 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 85m deployment-controller Scaled up replica set hcloud-csi-controller-7b6cf877f9 to 1
Normal ScalingReplicaSet 84m deployment-controller Scaled down replica set hcloud-csi-controller-65f85947bb to 0 from 1
kubectl -n kube-system describe daemonset hcloud-csi-node
Name: hcloud-csi-node
Selector: app=hcloud-csi
Node-Selector: <none>
Labels: addon.kops.k8s.io/name=hcloud-csi-driver.addons.k8s.io
app=hcloud-csi
app.kubernetes.io/managed-by=kops
k8s-addon=hcloud-csi-driver.addons.k8s.io
Annotations: deprecated.daemonset.template.generation: 2
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
Pods Status: 6 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=hcloud-csi
kops.k8s.io/managed-by=kops
Containers:
csi-node-driver-registrar:
Image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
Port: <none>
Host Port: <none>
Args:
--kubelet-registration-path=/var/lib/kubelet/plugins/csi.hetzner.cloud/socket
Environment: <none>
Mounts:
/registration from registration-dir (rw)
/run/csi from plugin-dir (rw)
hcloud-csi-driver:
Image: hetznercloud/hcloud-csi-driver:v2.3.2@sha256:b7ed90d5fab2c3fc63bf3ecb2193d3d18c1ec368c7ad98b2dbf633f0ada6afba
Ports: 9189/TCP, 9808/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/hcloud-csi-driver-node
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=2s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///run/csi/socket
METRICS_ENDPOINT: 0.0.0.0:9189
ENABLE_METRICS: true
Mounts:
/dev from device-dir (rw)
/run/csi from plugin-dir (rw)
/var/lib/kubelet from kubelet-dir (rw)
liveness-probe:
Image: registry.k8s.io/sig-storage/livenessprobe:v2.9.0@sha256:2b10b24dafdc3ba94a03fc94d9df9941ca9d6a9207b927f5dfd21d59fbe05ba0
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/run/csi from plugin-dir (rw)
Volumes:
kubelet-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet
HostPathType: Directory
plugin-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins/csi.hetzner.cloud/
HostPathType: DirectoryOrCreate
registration-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins_registry/
HostPathType: Directory
device-dir:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType: Directory
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulDelete 87m daemonset-controller Deleted pod: hcloud-csi-node-kpjmp
Normal SuccessfulCreate 86m daemonset-controller Created pod: hcloud-csi-node-nkglw
Normal SuccessfulDelete 86m daemonset-controller Deleted pod: hcloud-csi-node-z9d2w
Warning FailedDaemonPod 86m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-j97jz on node control-plane-fsn1-2-32c8473e2e051454, will try to kill it
Normal SuccessfulDelete 86m daemonset-controller Deleted pod: hcloud-csi-node-j97jz
Normal SuccessfulCreate 86m daemonset-controller Created pod: hcloud-csi-node-qhtzp
Normal SuccessfulCreate 86m daemonset-controller Created pod: hcloud-csi-node-t9hlj
Normal SuccessfulDelete 85m daemonset-controller Deleted pod: hcloud-csi-node-c42lh
Normal SuccessfulCreate 85m daemonset-controller Created pod: hcloud-csi-node-g4bmv
Normal SuccessfulDelete 85m daemonset-controller Deleted pod: hcloud-csi-node-k7l97
Normal SuccessfulCreate 85m daemonset-controller Created pod: hcloud-csi-node-2xwkm
Normal SuccessfulCreate 85m daemonset-controller Created pod: hcloud-csi-node-4chhv
Warning FailedDaemonPod 79m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-t9hlj on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 79m daemonset-controller Deleted pod: hcloud-csi-node-t9hlj
Normal SuccessfulCreate 79m daemonset-controller Created pod: hcloud-csi-node-k25ct
Warning FailedDaemonPod 78m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-k25ct on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 78m daemonset-controller Deleted pod: hcloud-csi-node-k25ct
Normal SuccessfulCreate 78m daemonset-controller Created pod: hcloud-csi-node-g6rrm
Warning FailedDaemonPod 76m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-g6rrm on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 76m daemonset-controller Deleted pod: hcloud-csi-node-g6rrm
Normal SuccessfulCreate 76m daemonset-controller Created pod: hcloud-csi-node-qsfkp
Warning FailedDaemonPod 67m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-2xwkm on node control-plane-fsn1-3-42f1b452575dc956, will try to kill it
Normal SuccessfulDelete 67m daemonset-controller Deleted pod: hcloud-csi-node-2xwkm
Normal SuccessfulCreate 67m daemonset-controller Created pod: hcloud-csi-node-6tr4h
Warning FailedDaemonPod 66m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-qsfkp on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Warning FailedDaemonPod 66m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-qhtzp on node control-plane-fsn1-2-32c8473e2e051454, will try to kill it
Normal SuccessfulDelete 66m daemonset-controller Deleted pod: hcloud-csi-node-qhtzp
Normal SuccessfulDelete 66m daemonset-controller Deleted pod: hcloud-csi-node-qsfkp
Normal SuccessfulCreate 66m daemonset-controller Created pod: hcloud-csi-node-vrmjt
Normal SuccessfulCreate 66m daemonset-controller Created pod: hcloud-csi-node-k86xn
Warning FailedDaemonPod 61m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-6tr4h on node control-plane-fsn1-3-42f1b452575dc956, will try to kill it
Normal SuccessfulDelete 61m daemonset-controller Deleted pod: hcloud-csi-node-6tr4h
Normal SuccessfulCreate 61m daemonset-controller Created pod: hcloud-csi-node-8w9wg
Warning FailedDaemonPod 52m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-vrmjt on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 52m daemonset-controller Deleted pod: hcloud-csi-node-vrmjt
Normal SuccessfulCreate 52m daemonset-controller Created pod: hcloud-csi-node-p6xkq
Warning FailedDaemonPod 49m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-p6xkq on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 49m daemonset-controller Deleted pod: hcloud-csi-node-p6xkq
Normal SuccessfulCreate 49m daemonset-controller Created pod: hcloud-csi-node-2zj6j
Warning FailedDaemonPod 48m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-2zj6j on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 48m daemonset-controller Deleted pod: hcloud-csi-node-2zj6j
Normal SuccessfulCreate 48m daemonset-controller Created pod: hcloud-csi-node-d6qxv
Warning FailedDaemonPod 48m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-8w9wg on node control-plane-fsn1-3-42f1b452575dc956, will try to kill it
Normal SuccessfulDelete 48m daemonset-controller Deleted pod: hcloud-csi-node-8w9wg
Normal SuccessfulCreate 48m daemonset-controller Created pod: hcloud-csi-node-s9ttb
Warning FailedDaemonPod 45m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-s9ttb on node control-plane-fsn1-3-42f1b452575dc956, will try to kill it
Normal SuccessfulDelete 45m daemonset-controller Deleted pod: hcloud-csi-node-s9ttb
Normal SuccessfulCreate 45m daemonset-controller Created pod: hcloud-csi-node-zkw2p
Warning FailedDaemonPod 41m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-zkw2p on node control-plane-fsn1-3-42f1b452575dc956, will try to kill it
Normal SuccessfulDelete 41m daemonset-controller Deleted pod: hcloud-csi-node-zkw2p
Normal SuccessfulCreate 41m daemonset-controller Created pod: hcloud-csi-node-lrtv7
Normal SuccessfulCreate 32m daemonset-controller Created pod: hcloud-csi-node-ttbwt
Warning FailedDaemonPod 29m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-d6qxv on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 29m daemonset-controller Deleted pod: hcloud-csi-node-d6qxv
Normal SuccessfulCreate 29m daemonset-controller Created pod: hcloud-csi-node-s6lgc
Warning FailedDaemonPod 27m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-s6lgc on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 27m daemonset-controller Deleted pod: hcloud-csi-node-s6lgc
Normal SuccessfulCreate 27m daemonset-controller Created pod: hcloud-csi-node-lmv78
Warning FailedDaemonPod 25m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-k86xn on node control-plane-fsn1-2-32c8473e2e051454, will try to kill it
Normal SuccessfulDelete 25m daemonset-controller Deleted pod: hcloud-csi-node-k86xn
Normal SuccessfulCreate 25m daemonset-controller Created pod: hcloud-csi-node-hz6mn
Warning FailedDaemonPod 24m daemonset-controller Found failed daemon pod kube-system/hcloud-csi-node-lmv78 on node control-plane-fsn1-1-3946975372c0925c, will try to kill it
Normal SuccessfulDelete 24m daemonset-controller Deleted pod: hcloud-csi-node-lmv78
Normal SuccessfulCreate 24m daemonset-controller Created pod: hcloud-csi-node-hnbs9
Normal SuccessfulCreate 8m55s daemonset-controller Created pod: hcloud-csi-node-kc2rs
Are all pods running and ready (get pods -A -o wide)?
Yes, except the new consul replica
kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
common alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 1 (22h ago) 22h 100.105.181.20 nodes-fsn1-729c4c76bc120662 <none> <none>
common consul-connect-injector-69b95b77bd-szfvx 1/1 Running 123 (42m ago) 56d 100.105.181.53 nodes-fsn1-729c4c76bc120662 <none> <none>
common consul-server-0 1/1 Running 0 58d 100.103.23.83 nodes-fsn1-361c1a49dec42261 <none> <none>
common consul-server-1 1/1 Running 0 19h 100.105.181.44 nodes-fsn1-729c4c76bc120662 <none> <none>
common consul-server-2 0/1 ContainerCreating 0 8m40s <none> nodes-fsn1-6e1adf53318e07f1 <none> <none>
common consul-webhook-cert-manager-6c85944667-wstmx 1/1 Running 0 22h 100.105.181.61 nodes-fsn1-729c4c76bc120662 <none> <none>
common mariadb-mariadb-galera-0 2/2 Running 0 19h 100.105.181.55 nodes-fsn1-729c4c76bc120662 <none> <none>
common mariadb-mariadb-galera-1 2/2 Running 0 19h 100.105.181.45 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-grafana-b5dccf59-bhcr2 3/3 Running 2 (9h ago) 22h 100.105.181.10 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-kube-prometheus-operator-64f776b465-q9xrw 1/1 Running 0 22h 100.105.181.9 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-kube-state-metrics-6bdd65d76-r4gqc 1/1 Running 0 58d 100.105.181.17 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 56d 100.105.181.52 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-prometheus-node-exporter-75kfn 1/1 Running 1 (20h ago) 56d 10.10.0.7 nodes-fsn1-729c4c76bc120662 <none> <none>
common prometheus-prometheus-node-exporter-fn4hr 1/1 Running 4 (67m ago) 2d18h 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
common prometheus-prometheus-node-exporter-qc8kk 1/1 Running 0 56d 10.10.0.3 nodes-fsn1-361c1a49dec42261 <none> <none>
common prometheus-prometheus-node-exporter-sc86v 1/1 Running 2 (26m ago) 49m 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
common prometheus-prometheus-node-exporter-tfjqs 1/1 Running 0 10m 10.10.0.9 nodes-fsn1-6e1adf53318e07f1 <none> <none>
common prometheus-prometheus-node-exporter-v4zl2 1/1 Running 5 (43m ago) 15d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
common storage-proxy-77ddc56cf6-v6srl 1/1 Running 1 (20h ago) 45d 100.105.181.62 nodes-fsn1-729c4c76bc120662 <none> <none>
common volpod 1/1 Running 0 19h 100.105.181.24 nodes-fsn1-729c4c76bc120662 <none> <none>
ingress-controller cert-manager-5ff989dc45-6zfbf 1/1 Running 51 (51m ago) 56d 100.105.181.35 nodes-fsn1-729c4c76bc120662 <none> <none>
ingress-controller cert-manager-cainjector-d8c5dc896-pm682 1/1 Running 4 (30m ago) 22h 100.105.181.21 nodes-fsn1-729c4c76bc120662 <none> <none>
ingress-controller cert-manager-webhook-67bd96ff64-dmmk5 1/1 Running 1 (20h ago) 22h 100.105.181.38 nodes-fsn1-729c4c76bc120662 <none> <none>
ingress-controller ingress-nginx-controller-7fb5787978-7rzbc 1/1 Running 0 18h 100.105.181.33 nodes-fsn1-729c4c76bc120662 <none> <none>
ingress-controller ingress-nginx-controller-7fb5787978-ctk8f 1/1 Running 0 18h 100.105.181.7 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system calico-kube-controllers-66fc944d4b-xcj2p 1/1 Running 1 (29m ago) 88m 100.103.65.141 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system calico-node-9cdwg 1/1 Running 0 88m 10.10.0.7 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system calico-node-fw228 1/1 Running 0 85m 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system calico-node-qklxl 1/1 Running 0 10m 10.10.0.9 nodes-fsn1-6e1adf53318e07f1 <none> <none>
kube-system calico-node-wh8zl 1/1 Running 0 87m 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system calico-node-wlhxt 1/1 Running 0 86m 10.10.0.3 nodes-fsn1-361c1a49dec42261 <none> <none>
kube-system calico-node-zwftj 1/1 Running 0 86m 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system coredns-6d7f697665-5gmss 1/1 Running 0 88m 100.105.181.41 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system coredns-6d7f697665-7n7gd 1/1 Running 0 88m 100.105.181.13 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system coredns-autoscaler-6f7745894d-6dscd 1/1 Running 0 88m 100.105.181.60 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system etcd-manager-events-control-plane-fsn1-1-3946975372c0925c 1/1 Running 2 (22h ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system etcd-manager-events-control-plane-fsn1-2-32c8473e2e051454 1/1 Running 2 (22h ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system etcd-manager-events-control-plane-fsn1-3-42f1b452575dc956 1/1 Running 3 (22h ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system etcd-manager-main-control-plane-fsn1-1-3946975372c0925c 1/1 Running 2 (22h ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system etcd-manager-main-control-plane-fsn1-2-32c8473e2e051454 1/1 Running 2 (22h ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system etcd-manager-main-control-plane-fsn1-3-42f1b452575dc956 1/1 Running 3 (22h ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system hcloud-cloud-controller-manager-5fdb77d49b-cn2vm 1/1 Running 0 88m 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system hcloud-csi-controller-7b6cf877f9-mcvnf 5/5 Running 0 88m 100.105.181.19 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system hcloud-csi-node-g4bmv 3/3 Running 0 87m 100.103.23.75 nodes-fsn1-361c1a49dec42261 <none> <none>
kube-system hcloud-csi-node-hnbs9 3/3 Running 0 26m 100.116.56.206 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system hcloud-csi-node-hz6mn 3/3 Running 0 27m 100.103.65.144 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system hcloud-csi-node-kc2rs 3/3 Running 0 10m 100.101.101.65 nodes-fsn1-6e1adf53318e07f1 <none> <none>
kube-system hcloud-csi-node-lrtv7 3/3 Running 0 43m 100.116.242.7 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system hcloud-csi-node-nkglw 3/3 Running 0 88m 100.105.181.40 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system kops-controller-8qvl6 1/1 Running 46 (42m ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system kops-controller-f8nsh 1/1 Running 43 (49m ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system kops-controller-mwvnp 1/1 Running 5 (54m ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system kube-apiserver-control-plane-fsn1-1-3946975372c0925c 2/2 Running 10 (26m ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system kube-apiserver-control-plane-fsn1-2-32c8473e2e051454 2/2 Running 52 (67m ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system kube-apiserver-control-plane-fsn1-3-42f1b452575dc956 2/2 Running 50 (43m ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system kube-controller-manager-control-plane-fsn1-1-3946975372c0925c 1/1 Running 8 (54m ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system kube-controller-manager-control-plane-fsn1-2-32c8473e2e051454 1/1 Running 78 (42m ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system kube-controller-manager-control-plane-fsn1-3-42f1b452575dc956 1/1 Running 65 (63m ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system kube-proxy-control-plane-fsn1-1-3946975372c0925c 1/1 Running 1 (22h ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system kube-proxy-control-plane-fsn1-2-32c8473e2e051454 1/1 Running 1 (22h ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system kube-proxy-control-plane-fsn1-3-42f1b452575dc956 1/1 Running 2 (22h ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
kube-system kube-proxy-nodes-fsn1-361c1a49dec42261 1/1 Running 0 58d 10.10.0.3 nodes-fsn1-361c1a49dec42261 <none> <none>
kube-system kube-proxy-nodes-fsn1-6e1adf53318e07f1 1/1 Running 0 10m 10.10.0.9 nodes-fsn1-6e1adf53318e07f1 <none> <none>
kube-system kube-proxy-nodes-fsn1-729c4c76bc120662 1/1 Running 0 58d 10.10.0.7 nodes-fsn1-729c4c76bc120662 <none> <none>
kube-system kube-scheduler-control-plane-fsn1-1-3946975372c0925c 1/1 Running 4 (31m ago) 22h 10.10.0.4 control-plane-fsn1-1-3946975372c0925c <none> <none>
kube-system kube-scheduler-control-plane-fsn1-2-32c8473e2e051454 1/1 Running 48 (42m ago) 58d 10.10.0.5 control-plane-fsn1-2-32c8473e2e051454 <none> <none>
kube-system kube-scheduler-control-plane-fsn1-3-42f1b452575dc956 1/1 Running 45 (69m ago) 58d 10.10.0.6 control-plane-fsn1-3-42f1b452575dc956 <none> <none>
What is the status/events for the app pod (describe pod)?
Now I see the error has changed, but the pod still can't be scheduled.
kubectl -n common describe pod consul-server-2
Name: consul-server-2
Namespace: common
Priority: 0
Service Account: consul-server
Node: nodes-fsn1-6e1adf53318e07f1/10.10.0.9
Start Time: Sun, 02 Jul 2023 10:35:41 +0000
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=consul-server-76b6576bff
hasDNS=true
release=consul
statefulset.kubernetes.io/pod-name=consul-server-2
Annotations: consul.hashicorp.com/config-checksum: 0b003a5539ab09175e389b7e89105615c0394b10c54fd6893c3a084f5ce99f2e
consul.hashicorp.com/connect-inject: false
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/consul-server
Containers:
consul:
Container ID:
Image: hashicorp/consul:1.14.2
Image ID:
Ports: 8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8302/UDP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/UDP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
cp /consul/config/extra-from-values.json /consul/extra-config/extra-from-values.json
[ -n "${HOST_IP}" ] && sed -Ei "s|HOST_IP|${HOST_IP?}|g" /consul/extra-config/extra-from-values.json
[ -n "${POD_IP}" ] && sed -Ei "s|POD_IP|${POD_IP?}|g" /consul/extra-config/extra-from-values.json
[ -n "${HOSTNAME}" ] && sed -Ei "s|HOSTNAME|${HOSTNAME?}|g" /consul/extra-config/extra-from-values.json
exec /usr/local/bin/docker-entrypoint.sh consul agent \
-advertise="${ADVERTISE_IP}" \
-config-dir=/consul/config \
-config-file=/consul/extra-config/extra-from-values.json
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
ADVERTISE_IP: (v1:status.podIP)
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
CONSUL_DISABLE_PERM_MGMT: true
Mounts:
/consul/config from config (rw)
/consul/data from data-common (rw)
/consul/extra-config from extra-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djbbh (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data-common:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-common-consul-server-2
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: consul-server-config
Optional: false
extra-config:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-djbbh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned common/consul-server-2 to nodes-fsn1-6e1adf53318e07f1
Warning FailedMount 9m16s kubelet Unable to attach or mount volumes: unmounted volumes=[data-common], unattached volumes=[extra-config kube-api-access-djbbh data-common config]: timed out waiting for the condition
Warning FailedMount 4m44s (x2 over 7m1s) kubelet Unable to attach or mount volumes: unmounted volumes=[data-common], unattached volumes=[data-common config extra-config kube-api-access-djbbh]: timed out waiting for the condition
Warning FailedMount 2m29s kubelet Unable to attach or mount volumes: unmounted volumes=[data-common], unattached volumes=[config extra-config kube-api-access-djbbh data-common]: timed out waiting for the condition
Warning FailedAttachVolume 56s (x13 over 11m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-60560158-d8b4-4908-aa6f-8df0c53d810e" : rpc error: code = NotFound desc = failed to publish volume: volume not found
Warning FailedMount 15s kubelet Unable to attach or mount volumes: unmounted volumes=[data-common], unattached volumes=[kube-api-access-djbbh data-common config extra-config]: timed out waiting for the condition
What is the status/events for the PVC (describe pvc)?
I see that the PVC was apparently created some days ago (I made many tests, so it seems that at some point it worked), but from the previous logs it seems the volume is not found.
kubectl -n common get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-common-consul-server-0 Bound pvc-c5cd056a-7551-45a0-b3fe-62086acb8dbb 10Gi RWO hcloud-volumes 58d
data-common-consul-server-1 Bound pvc-85745d55-213f-4bbe-af85-6604fe9f75ca 10Gi RWO hcloud-volumes 58d
data-common-consul-server-2 Bound pvc-60560158-d8b4-4908-aa6f-8df0c53d810e 10Gi RWO hcloud-volumes 6d22h
data-mariadb-mariadb-galera-0 Bound pvc-4954ff56-ea22-4a2f-9c18-c94a53b4711f 40Gi RWO hcloud-volumes 58d
data-mariadb-mariadb-galera-1 Bound pvc-a209bb87-746c-4d2e-82b6-9b901d0940f6 40Gi RWO hcloud-volumes 58d
prometheus-grafana Bound pvc-25606845-ee48-441b-b685-656da862d248 10Gi RWO hcloud-volumes 58d
kubectl -n common describe pvc data-common-consul-server-2
Name: data-common-consul-server-2
Namespace: common
StorageClass: hcloud-volumes
Status: Bound
Volume: pvc-60560158-d8b4-4908-aa6f-8df0c53d810e
Labels: app=consul
chart=consul-helm
component=server
hasDNS=true
release=consul
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: csi.hetzner.cloud
volume.kubernetes.io/selected-node: nodes-fsn1-722add04b36be89a
volume.kubernetes.io/storage-provisioner: csi.hetzner.cloud
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: consul-server-2
Events: <none>
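Given the "volume not found" attach error, and since the volume.kubernetes.io/selected-node annotation still points at nodes-fsn1-722add04b36be89a (a node that no longer appears in the pod listing above), it might be worth cross-checking whether the volume referenced by the PV still exists on the Hetzner side, for example:
kubectl get pv pvc-60560158-d8b4-4908-aa6f-8df0c53d810e -o jsonpath='{.spec.csi.volumeHandle}'
hcloud volume list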
This comment is also out of context. How does the LB fit here?
It is just something "strange" I noticed along with this issue: as I remember, when the cluster was created, nodes were automatically added to the LB, and this didn't happen now that I scaled the cluster. I'm not sure whether it is somehow related or intended; I just reported it in case it is helpful.
Steps which lead to the issue:
1. Deploy a new cluster with:
kops create cluster --name=cluster1.fsn1.hetzner.mywebsite.com --ssh-public-key=/home/key.pub --cloud=hetzner --zones=fsn1 --image=debian-11 --networking=calico --network-cidr=10.10.0.0/16 --master-size cpx11 --master-count 3 --node-count 2 --node-size cpx21
2. Install hashicorp/consul with 2 replicas.
3. Add a new node with kops edit ig my-node, setting minSize: 3 and maxSize: 3 (a non-interactive equivalent is sketched after this list).
4. Run kops update cluster --yes.
5. helm upgrade consul with server.replicas=3.
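A non-interactive version of step 3, in case a scripted reproduction helps (the instance group name nodes-fsn1 is an assumption based on the node names in this thread):
kops get ig nodes-fsn1 -o yaml > ig.yaml
# set minSize and maxSize to 3 in ig.yaml, then apply and update:
kops replace -f ig.yaml
kops update cluster --yes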
I deleted the consul pod and PVC, scaled the cluster to three nodes again, and redeployed consul; the pods are now correctly scheduled, so I'm closing the issue. Thank you for your time.
2GB of memory for the masters is insufficient. This is probably why you are seeing so many pod restarts, which points to cluster instability. Please switch to something with 4GB of memory or more and investigate pod crashes if they still happen.
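A rough sketch of that change with kOps (the control-plane instance group names are assumptions based on the node names above; cpx21 is Hetzner's 4 GB plan):
kops edit ig control-plane-fsn1-1   # change machineType from cpx11 to cpx21; repeat for the other control-plane instance groups
kops update cluster --yes
kops rolling-update cluster --yes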
/kind bug
1. What kops version are you running? The command kops version will display this information.
1.26.2
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
3. What cloud provider are you using?
Hetzner
4. What commands did you run? What is the simplest way to reproduce this issue?
Added a new node to a 2-node cluster with kops edit ig my-node, then minSize: 3 and maxSize: 3. Applied changes with kops update cluster --yes.
Then kops validate cluster returns Your cluster cluster1.fsn1.hetzner.mywebsite.com is ready.
5. What happened after the commands executed?
No pods which use a PVC can be scheduled. In describe pod I see
6. What did you expect to happen?
Have the ability to run pods using a PVC.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?