hetznercloud / csi-driver

Kubernetes Container Storage Interface driver for Hetzner Cloud Volumes
MIT License

VolumeAttachment points to wrong node #685

Closed · gmautner closed this issue 1 month ago

gmautner commented 1 month ago

TL;DR

A Hetzner volume for a PVC previously provisioned by csi.hetzner.cloud started attaching to a different node than the one where the corresponding pod is running. Looking at the VolumeAttachment, you can see that the csi.alpha.kubernetes.io/node-id annotation refers to a node ID that two nodes report at the same time, which leads to the volume being attached to the wrong node. We can also see that the node where the pod is scheduled has a csi.volume.kubernetes.io/nodeid annotation that does not match its spec.providerID.

Also see the observation at the end of this comment regarding the use of the Cilium Egress Gateway, which might be related to the problem.

Expected behavior

The csi.volume.kubernetes.io/nodeid annotation in the Node manifest (and the csi.alpha.kubernetes.io/node-id annotation on the VolumeAttachment) should correspond to the node's actual Hetzner Cloud server ID, i.e. the numeric part of spec.providerID.
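
For illustration, on a healthy node the server ID in the CSI annotation and the numeric part of spec.providerID (hcloud://<server-id>) are the same. A hypothetical excerpt (using the ID of the affected node from this report, not copied verbatim from the cluster):

apiVersion: v1
kind: Node
metadata:
  annotations:
    csi.volume.kubernetes.io/nodeid: '{"csi.hetzner.cloud":"51254076"}'
spec:
  providerID: hcloud://51254076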

Observed behavior

Here are the manifests.

Start with the pod. nodeName: hetzner-02-88surfv3-pool-two-d916c6da54a7b84 correctly reflects where the pod is running. It mounts the PVC named data-loki-backend-2.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: a04988c856382f69bc6f8ce10b54f5ad5dc5e54afbeb6bf869e13726110c97f7
    kubectl.kubernetes.io/restartedAt: "2024-08-06T12:24:41-03:00"
  creationTimestamp: "2024-08-06T15:24:52Z"
  generateName: loki-backend-
  labels:
    app.kubernetes.io/component: backend
    app.kubernetes.io/instance: loki
    app.kubernetes.io/name: loki
    app.kubernetes.io/part-of: memberlist
    apps.kubernetes.io/pod-index: "2"
    controller-revision-hash: loki-backend-77f97846f9
    statefulset.kubernetes.io/pod-name: loki-backend-2
  name: loki-backend-2
  namespace: monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: loki-backend
    uid: e9925a03-8f1f-44f6-a45c-30c2cd2a12c4
  resourceVersion: "41546551"
  uid: 47218d48-5c1e-4edd-9b79-326fdda29659
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: backend
        topologyKey: kubernetes.io/hostname
  automountServiceAccountToken: true
  containers:
  - env:
    - name: METHOD
      value: WATCH
    - name: LABEL
      value: loki_rule
    - name: FOLDER
      value: /rules
    - name: RESOURCE
      value: both
    - name: WATCH_SERVER_TIMEOUT
      value: "60"
    - name: WATCH_CLIENT_TIMEOUT
      value: "60"
    - name: LOG_LEVEL
      value: INFO
    image: kiwigrid/k8s-sidecar:1.24.3
    imagePullPolicy: IfNotPresent
    name: loki-sc-rules
    resources:
      limits:
        cpu: "1"
        memory: 256Mi
      requests:
        cpu: 20m
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /rules
      name: sc-rules-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsz9c
      readOnly: true
  - args:
    - -config.file=/etc/loki/config/config.yaml
    - -target=backend
    - -legacy-read-mode=false
    image: docker.io/grafana/loki:3.0.0
    imagePullPolicy: IfNotPresent
    name: loki
    ports:
    - containerPort: 3100
      name: http-metrics
      protocol: TCP
    - containerPort: 9095
      name: grpc
      protocol: TCP
    - containerPort: 7946
      name: http-memberlist
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: http-metrics
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 40m
        memory: 512Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/loki/config
      name: config
    - mountPath: /etc/loki/runtime-config
      name: runtime-config
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/loki
      name: data
    - mountPath: /rules
      name: sc-rules-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsz9c
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: loki-backend-2
  nodeName: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  serviceAccount: loki
  serviceAccountName: loki
  subdomain: loki-backend-headless
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-loki-backend-2
  - emptyDir: {}
    name: tmp
  - configMap:
      defaultMode: 420
      items:
      - key: config.yaml
        path: config.yaml
      name: loki
    name: config
  - configMap:
      defaultMode: 420
      name: loki-runtime
    name: runtime-config
  - emptyDir: {}
    name: sc-rules-volume
  - name: kube-api-access-nsz9c
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-08-06T15:24:52Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-08-06T15:24:52Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-08-06T15:24:52Z"
    message: 'containers with unready status: [loki-sc-rules loki]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-08-06T15:24:52Z"
    message: 'containers with unready status: [loki-sc-rules loki]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-08-06T15:24:52Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: docker.io/grafana/loki:3.0.0
    imageID: ""
    lastState: {}
    name: loki
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  - image: kiwigrid/k8s-sidecar:1.24.3
    imageID: ""
    lastState: {}
    name: loki-sc-rules
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.255.0.3
  hostIPs:
  - ip: 10.255.0.3
  phase: Pending
  qosClass: Burstable
  startTime: "2024-08-06T15:24:52Z"

Looking at the PVC now, it looks almost normal, except that the annotation volume.kubernetes.io/selected-node has the value hetzner-02-88surfv3-pool-one-220bed1360101ef4, which is yet another node. Not sure this is an issue per se, just mentioning it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    k8up.io/backup: "false"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.hetzner.cloud
    volume.kubernetes.io/selected-node: hetzner-02-88surfv3-pool-one-220bed1360101ef4
    volume.kubernetes.io/storage-provisioner: csi.hetzner.cloud
  creationTimestamp: "2024-07-15T06:20:28Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: backend
    app.kubernetes.io/instance: loki
    app.kubernetes.io/name: loki
  name: data-loki-backend-2
  namespace: monitoring
  ownerReferences:
  - apiVersion: apps/v1
    kind: StatefulSet
    name: loki-backend
    uid: e9925a03-8f1f-44f6-a45c-30c2cd2a12c4
  resourceVersion: "21391596"
  uid: ecb0585e-25cb-4c63-a871-762645d62f41
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
  volumeMode: Filesystem
  volumeName: pvc-ecb0585e-25cb-4c63-a871-762645d62f41
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

Now look at the VolumeAttachment. This is where things get weird. The csi.alpha.kubernetes.io/node-id: "51177823" annotation points to a node ID that, as we'll see, two nodes report at the same time.

apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  annotations:
    csi.alpha.kubernetes.io/node-id: "51177823"
  creationTimestamp: "2024-08-06T15:24:59Z"
  finalizers:
  - external-attacher/csi-hetzner-cloud
  name: csi-4a1ffc2cd868e3cfb5827892f12f14471cda2584cfdf33d09549c01b9edbe97d
  resourceVersion: "41554826"
  uid: ce9da29c-2d0a-4f32-820e-4a73673693b5
spec:
  attacher: csi.hetzner.cloud
  nodeName: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
  source:
    persistentVolumeName: pvc-ecb0585e-25cb-4c63-a871-762645d62f41
status:
  attachError:
    message: 'rpc error: code = FailedPrecondition desc = failed to publish volume:
      volume is attached'
    time: "2024-08-06T15:39:20Z"
  attached: false

Let's check the spec of node hetzner-02-88surfv3-pool-two-d916c6da54a7b84 where the pod is running.

See, we have the annotation:

csi.volume.kubernetes.io/nodeid: '{"csi.hetzner.cloud":"51177823"}'

But at the same time, the value of spec.providerID is hcloud://51254076, which doesn't match.

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.255.0.3
    csi.volume.kubernetes.io/nodeid: '{"csi.hetzner.cloud":"51177823"}'
    k3s.io/node-args: '["agent","--flannel-iface","eth1","--kubelet-arg","cloud-provider=external","--kubelet-arg","volume-plugin-dir=/var/lib/kubelet/volumeplugins","--kubelet-arg","kube-reserved=cpu=50m,memory=300Mi,ephemeral-storage=1Gi","--kubelet-arg","system-reserved=cpu=250m,memory=300Mi","--node-label","k3s_upgrade=true","--node-taint","node.cilium.io/agent-not-ready:NoExecute","--selinux","true","--server","https://10.255.0.1:6443","--token","********"]'
    k3s.io/node-config-hash: PYQPCH4YGYZXIBF5V5KGH2VBRF4QC4X2WSCRCWYILRDBGXL3U63A====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/c38ba7cc1669e7d80b8156ae743932fd86f5bce3871b8a88bef531dd4e3c02b2"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-08-03T17:47:53Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: ccx43
    beta.kubernetes.io/os: linux
    csi.hetzner.cloud/location: ash
    failure-domain.beta.kubernetes.io/region: ash
    failure-domain.beta.kubernetes.io/zone: ash-dc1
    k3s_upgrade: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: ccx43
    plan.upgrade.cattle.io/k3s-agent: 0b60512d8a6929695a4d737eb94f0cb82e8fbb1babcd4176f94d073f
    topology.kubernetes.io/region: ash
    topology.kubernetes.io/zone: ash-dc1
  name: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
  resourceVersion: "41561381"
  uid: d6327732-721b-41cf-84b0-9b38180ceeac
spec:
  podCIDR: 10.42.6.0/24
  podCIDRs:
  - 10.42.6.0/24
  providerID: hcloud://51254076
status:
  addresses:
  - address: 10.255.0.3
    type: InternalIP
  - address: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
    type: Hostname
  - address: 5.161.228.191
    type: ExternalIP
  allocatable:
    cpu: 15700m
    ephemeral-storage: "349112821281"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 63678632Ki
    pods: "110"
  capacity:
    cpu: "16"
    ephemeral-storage: 359977964Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 64293032Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-08-03T17:48:17Z"
    lastTransitionTime: "2024-08-03T17:48:17Z"
    message: Cilium is running on this node
    reason: CiliumIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2024-08-06T15:50:30Z"
    lastTransitionTime: "2024-08-03T17:47:53Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-08-06T15:50:30Z"
    lastTransitionTime: "2024-08-03T17:47:53Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-08-06T15:50:30Z"
    lastTransitionTime: "2024-08-03T17:47:53Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-08-06T15:50:30Z"
    lastTransitionTime: "2024-08-03T17:48:08Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/cilium/cilium@sha256:351d6685dc6f6ffbcd5451043167cfa8842c6decf80d8c8e426a417c73fb56d4
    sizeBytes: 208222582
  - names:
    - docker.io/bitnami/kubectl@sha256:cfd03da61658004f1615e5401ba8bde3cc4ba3f87afff0ed8875c5d1b0b09e4a
    - docker.io/bitnami/kubectl:1.28.5
    sizeBytes: 80756099
  - names:
    - docker.io/grafana/promtail@sha256:d3de3da9431cfbe74a6a94555050df5257f357e827be8e63f8998d509c37af8b
    - docker.io/grafana/promtail:3.0.0
    sizeBytes: 76423491
  - names:
    - ghcr.io/aquasecurity/trivy@sha256:a8ae69a1080249817e2ee78bd1f97d0cebbbff21bc2bde495c142595dc34452f
    - ghcr.io/aquasecurity/trivy:0.50.2
    sizeBytes: 63755300
  - names:
    - docker.io/rancher/k3s-upgrade@sha256:1716c1ae84ee1b439fbac7c6aa0800703dac13c1c53a8a7ab3f6a9e1fea34b78
    - docker.io/rancher/k3s-upgrade:v1.29.6-k3s2
    sizeBytes: 63342372
  - names:
    - docker.io/hetznercloud/hcloud-csi-driver@sha256:6796ff564d403efb8696d5df9e8085c64c225db4974b8b36dd3c63aacd652fe4
    - docker.io/hetznercloud/hcloud-csi-driver:v2.7.1
    sizeBytes: 33537363
  - names:
    - docker.io/hetznercloud/hcloud-csi-driver@sha256:ca7745faf7ef9c478204382ef98cb1339176a64592331de1668c297a920564c7
    - docker.io/hetznercloud/hcloud-csi-driver:v2.8.0
    sizeBytes: 32984005
  - names:
    - docker.io/grafana/loki@sha256:757b5fadf816a1396f1fea598152947421fa49cb8b2db1ddd2a6e30fae003253
    - docker.io/grafana/loki:3.0.0
    sizeBytes: 26358865
  - names:
    - public.ecr.aws/docker/library/redis@sha256:c8bb255c3559b3e458766db810aa7b3c7af1235b204cfdb304e79ff388fe1a5a
    - public.ecr.aws/docker/library/redis:7.2.4-alpine
    sizeBytes: 18841207
  - names:
    - ghcr.io/kubereboot/kured@sha256:bab89b4e71d7007c64f445e5f13abe5453ea4d9b497944d78cd3a68ddd21f21d
    - ghcr.io/kubereboot/kured:1.16.0
    sizeBytes: 17417002
  - names:
    - docker.io/rancher/kubectl@sha256:9be095ca0bbc74e8947a1d4a0258875304b590057d858eb9738de000f88a473e
    - docker.io/rancher/kubectl:v1.25.4
    sizeBytes: 14428200
  - names:
    - docker.io/grafana/loki-canary@sha256:28d7c00588aa43d24b84fce49a8c39e11eaadf5011c3460e64c81490fcfd963d
    - docker.io/grafana/loki-canary:3.0.0
    sizeBytes: 13838844
  - names:
    - quay.io/prometheus/node-exporter@sha256:fa7fa12a57eff607176d5c363d8bb08dfbf636b36ac3cb5613a202f3c61a6631
    - quay.io/prometheus/node-exporter:v1.8.1
    sizeBytes: 12035093
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    sizeBytes: 10147874
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:2b10b24dafdc3ba94a03fc94d9df9941ca9d6a9207b927f5dfd21d59fbe05ba0
    - registry.k8s.io/sig-storage/livenessprobe:v2.9.0
    sizeBytes: 9194114
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 301463
  nodeInfo:
    architecture: amd64
    bootID: 07c343af-0d09-4f73-88db-4554bff8dc59
    containerRuntimeVersion: containerd://1.7.17-k3s1
    kernelVersion: 6.8.6-1-default
    kubeProxyVersion: v1.29.6+k3s2
    kubeletVersion: v1.29.6+k3s2
    machineID: 1e2cab7fe6ee4c4c8254a441743a2f56
    operatingSystem: linux
    osImage: openSUSE MicroOS
    systemUUID: 8cf4f628-9395-4797-aa4c-b16188f08775
  volumesAttached:
  - devicePath: ""
    name: kubernetes.io/csi/csi.hetzner.cloud^100869160
  volumesInUse:
  - kubernetes.io/csi/csi.hetzner.cloud^100869160
  - kubernetes.io/csi/csi.hetzner.cloud^101021545

The node to which the volume is actually attached is the one below. Its csi.volume.kubernetes.io/nodeid annotation and its spec.providerID do match, but it is not the node the volume is supposed to attach to:

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.0.0.101
    csi.volume.kubernetes.io/nodeid: '{"csi.hetzner.cloud":"51177823"}'
    k3s.io/node-args: '["agent","--flannel-iface","eth1","--kubelet-arg","cloud-provider=external","--kubelet-arg","volume-plugin-dir=/var/lib/kubelet/volumeplugins","--kubelet-arg","kube-reserved=cpu=50m,memory=300Mi,ephemeral-storage=1Gi","--kubelet-arg","system-reserved=cpu=250m,memory=300Mi","--node-ip","10.0.0.101","--node-label","k3s_upgrade=true","--node-label","node.kubernetes.io/role=egress","--node-name","hetzner-02-88surfv3-egress-svw","--node-taint","node.cilium.io/agent-not-ready:NoExecute","--node-taint","node.kubernetes.io/role=egress:NoSchedule","--selinux","true","--server","https://10.255.0.1:6443","--token","********"]'
    k3s.io/node-config-hash: D577OSQQ476WJINWCFELDYZOPM5RGA3ZFQHBKTPKFW42GHRZD24A====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/c38ba7cc1669e7d80b8156ae743932fd86f5bce3871b8a88bef531dd4e3c02b2"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-08-01T16:51:53Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: ccx13
    beta.kubernetes.io/os: linux
    csi.hetzner.cloud/location: ash
    failure-domain.beta.kubernetes.io/region: ash
    failure-domain.beta.kubernetes.io/zone: ash-dc1
    k3s_upgrade: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: hetzner-02-88surfv3-egress-svw
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: ccx13
    node.kubernetes.io/role: egress
    plan.upgrade.cattle.io/k3s-agent: 0b60512d8a6929695a4d737eb94f0cb82e8fbb1babcd4176f94d073f
    topology.kubernetes.io/region: ash
    topology.kubernetes.io/zone: ash-dc1
  name: hetzner-02-88surfv3-egress-svw
  resourceVersion: "41561182"
  uid: 1edc5d9f-6508-47f0-bbbe-df2bf56eea09
spec:
  podCIDR: 10.42.5.0/24
  podCIDRs:
  - 10.42.5.0/24
  providerID: hcloud://51177823
status:
  addresses:
  - address: 10.0.0.101
    type: InternalIP
  - address: hetzner-02-88surfv3-egress-svw
    type: Hostname
  - address: 178.156.130.47
    type: ExternalIP
  allocatable:
    cpu: 1700m
    ephemeral-storage: "76730315715"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7323880Ki
    pods: "110"
  capacity:
    cpu: "2"
    ephemeral-storage: 79979500Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7938280Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-08-01T16:52:15Z"
    lastTransitionTime: "2024-08-01T16:52:15Z"
    message: Cilium is running on this node
    reason: CiliumIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2024-08-06T15:50:09Z"
    lastTransitionTime: "2024-08-03T19:53:33Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-08-06T15:50:09Z"
    lastTransitionTime: "2024-08-03T19:53:33Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-08-06T15:50:09Z"
    lastTransitionTime: "2024-08-03T19:53:33Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-08-06T15:50:09Z"
    lastTransitionTime: "2024-08-03T19:53:33Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/cilium/cilium@sha256:351d6685dc6f6ffbcd5451043167cfa8842c6decf80d8c8e426a417c73fb56d4
    sizeBytes: 208222582
  - names:
    - docker.io/grafana/promtail@sha256:d3de3da9431cfbe74a6a94555050df5257f357e827be8e63f8998d509c37af8b
    - docker.io/grafana/promtail:3.0.0
    sizeBytes: 76423491
  - names:
    - ghcr.io/aquasecurity/trivy@sha256:a8ae69a1080249817e2ee78bd1f97d0cebbbff21bc2bde495c142595dc34452f
    - ghcr.io/aquasecurity/trivy:0.50.2
    sizeBytes: 63755300
  - names:
    - docker.io/rancher/k3s-upgrade@sha256:1716c1ae84ee1b439fbac7c6aa0800703dac13c1c53a8a7ab3f6a9e1fea34b78
    - docker.io/rancher/k3s-upgrade:v1.29.6-k3s2
    sizeBytes: 63342372
  - names:
    - docker.io/hetznercloud/hcloud-csi-driver@sha256:6796ff564d403efb8696d5df9e8085c64c225db4974b8b36dd3c63aacd652fe4
    - docker.io/hetznercloud/hcloud-csi-driver:v2.7.1
    sizeBytes: 33537363
  - names:
    - docker.io/hetznercloud/hcloud-csi-driver@sha256:ca7745faf7ef9c478204382ef98cb1339176a64592331de1668c297a920564c7
    - docker.io/hetznercloud/hcloud-csi-driver:v2.8.0
    sizeBytes: 32984005
  - names:
    - quay.io/cilium/operator-generic@sha256:819c7281f5a4f25ee1ce2ec4c76b6fbc69a660c68b7825e9580b1813833fa743
    sizeBytes: 26346322
  - names:
    - ghcr.io/kubereboot/kured@sha256:bab89b4e71d7007c64f445e5f13abe5453ea4d9b497944d78cd3a68ddd21f21d
    - ghcr.io/kubereboot/kured:1.16.0
    sizeBytes: 17417002
  - names:
    - docker.io/rancher/kubectl@sha256:9be095ca0bbc74e8947a1d4a0258875304b590057d858eb9738de000f88a473e
    - docker.io/rancher/kubectl:v1.25.4
    sizeBytes: 14428200
  - names:
    - docker.io/grafana/loki-canary@sha256:28d7c00588aa43d24b84fce49a8c39e11eaadf5011c3460e64c81490fcfd963d
    - docker.io/grafana/loki-canary:3.0.0
    sizeBytes: 13838844
  - names:
    - quay.io/prometheus/node-exporter@sha256:fa7fa12a57eff607176d5c363d8bb08dfbf636b36ac3cb5613a202f3c61a6631
    - quay.io/prometheus/node-exporter:v1.8.1
    sizeBytes: 12035093
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    sizeBytes: 10147874
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:2b10b24dafdc3ba94a03fc94d9df9941ca9d6a9207b927f5dfd21d59fbe05ba0
    - registry.k8s.io/sig-storage/livenessprobe:v2.9.0
    sizeBytes: 9194114
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 301463
  nodeInfo:
    architecture: amd64
    bootID: 7f4c5067-e838-4e7f-abd7-4876ea5dd584
    containerRuntimeVersion: containerd://1.7.17-k3s1
    kernelVersion: 6.10.2-1-default
    kubeProxyVersion: v1.29.6+k3s2
    kubeletVersion: v1.29.6+k3s2
    machineID: 1e2cab7fe6ee4c4c8254a441743a2f56
    operatingSystem: linux
    osImage: openSUSE MicroOS
    systemUUID: 74ebc62c-0623-4187-96e3-5bd65d8966dc

So it seems that there is a bug where the csi.volume.kubernetes.io/nodeid annotation points to a node ID belonging to another node; the VolumeAttachment follows it and the volume ends up attached to the wrong node.

I restarted the hcloud-csi DaemonSet as well as the StatefulSet that claims the volumes, but the effect is the same. I also tried deleting the VolumeAttachment, but when it is recreated it attaches to the same wrong node.

Minimal working example

This is just a StatefulSet running on k3s using the hcloud-csi storage provider.
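
A minimal sketch of such a StatefulSet (names, image and size are illustrative; the relevant parts are the hcloud-volumes StorageClass and the volumeClaimTemplates):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: csi-repro                  # illustrative name
spec:
  serviceName: csi-repro
  replicas: 3
  selector:
    matchLabels:
      app: csi-repro
  template:
    metadata:
      labels:
        app: csi-repro
    spec:
      containers:
      - name: app
        image: busybox:1.36        # any long-running image will do
        command: ["sh", "-c", "sleep infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: hcloud-volumes
      resources:
        requests:
          storage: 10Gi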

One remark: the nodes with the wrong csi.volume.kubernetes.io/nodeid annotation are part of an autoscaling group.

Log output

No response

Additional information

It is worth noting that the node whose node ID the other nodes are also reporting in their annotations is running the Cilium Egress Gateway, so that its IP address is used as the SNAT source for internet traffic.

gmautner commented 1 month ago

Culprit found. I excluded the hcloud-csi DaemonSet from the Cilium Egress Policy and, bingo, the node IDs were correctly assigned. I guess I'll recommend documenting this in the kube-hetzner Terraform provider, where this setup came from.
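
For anyone hitting the same issue, here is a hedged sketch of what such an exclusion could look like, assuming a cilium.io/v2 CiliumEgressGatewayPolicy routes all pod egress through the gateway node. The app label used to exclude the hcloud-csi pods and the excludedCIDRs entry (which needs a Cilium version that supports it) are assumptions, not the exact policy used here:

apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: internet-egress                  # illustrative name
spec:
  selectors:
  - podSelector:
      matchExpressions:
      # Keep the CSI driver pods off the egress gateway, otherwise their
      # metadata requests are SNATed through the gateway node (label is
      # an assumption; match whatever labels your hcloud-csi pods carry).
      - key: app
        operator: NotIn
        values: ["hcloud-csi"]
  destinationCIDRs:
  - 0.0.0.0/0
  # Alternatively, carve the link-local metadata address out of the
  # egress path entirely (assumes a Cilium version with excludedCIDRs).
  excludedCIDRs:
  - 169.254.169.254/32
  egressGateway:
    nodeSelector:
      matchLabels:
        node.kubernetes.io/role: egress  # label of the egress node above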

apricote commented 1 month ago

The DaemonSet pods query the metadata service for the server ID. If traffic to this IP (169.254.169.254) is being forwarded to some other node, the ID you see in Kubernetes will be wrong, and volumes will be attached to the wrong nodes.
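
To see which server ID a pod on a given node actually gets, a throwaway pod pinned to that node can query the metadata service from the pod network. A minimal sketch (the instance-id path is the documented Hetzner Cloud metadata endpoint, but double-check against the current docs; the node name is the one from this issue):

apiVersion: v1
kind: Pod
metadata:
  name: metadata-check                   # illustrative name
spec:
  nodeName: hetzner-02-88surfv3-pool-two-d916c6da54a7b84
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl:8.8.0
    # Prints the server ID the metadata service reports to this pod.
    # If egress is redirected through the gateway node, this returns the
    # gateway's ID instead of the local node's.
    command: ["curl", "-s", "http://169.254.169.254/hetzner/v1/metadata/instance-id"]

The value printed should match the numeric part of the node's spec.providerID (hcloud://51254076 for the node above); in the broken setup it would presumably return 51177823, the egress node's ID, which is exactly the mis-attribution seen in the annotations.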