kubernetes-csi / csi-driver-nfs

This driver allows Kubernetes to access NFS servers on Linux nodes.
Apache License 2.0

Single-node (WIP) cluster can't schedule controller #654

Open IngwiePhoenix opened 5 months ago

IngwiePhoenix commented 5 months ago

(Yep, I did read the template; but for some odd reason I am not seeing the signup verification email. I am pretty sure it's a layer 8 problem... so, apologies in advance!)

Hello! I am trying to bootstrap the NFS CSI driver from the Helm chart on a k3s cluster - only one node for now; I intend to grow it to a few more once I have my base config figured out. With only one node, though, this message:

kube-system   0s                     Warning   FailedScheduling                 Pod/csi-nfs-controller-59b87c6c7c-ktfh7    0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

isn't helping a whole lot. I have tried to get rid of it, but no matter what I set controller.tolerations to, I keep getting that warning.

First, here's my HelmChart and values, as kubectl apply'd to the k3s node:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nfs-csi-chart
  namespace: kube-system
spec:
  repo: https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
  chart: csi-driver-nfs
  #version: latest
  targetNamespace: kube-system
  valuesContent: |-
    serviceAccount:
      create: true # When true, service accounts will be created for you. Set to false if you want to use your own.
      # controller: csi-nfs-controller-sa # Name of Service Account to be created or used
      # node: csi-nfs-node-sa # Name of Service Account to be created or used

    rbac:
      create: true
      name: nfs

    driver:
      name: nfs.csi.k8s.io
      mountPermissions: 0

    feature:
      enableFSGroupPolicy: true
      enableInlineVolume: false
      propagateHostMountOptions: false

    # do I have to change that?; k3s on /mnt/usb/k3s but no kubelet dir
    kubeletDir: /var/lib/kubelet

    controller:
      # TODO: do i need to true them?
      runOnControlPlane: true
      runOnMaster: true
      logLevel: 5
      workingMountDir: /tmp
      defaultOnDeletePolicy: retain  # available values: delete, retain
      priorityClassName: system-cluster-critical
      # FIXME: better solution???
      tolerations: []
    node:
      name: csi-nfs-node

    # TODO: sync to backup
    externalSnapshotter:
      enabled: false
      name: snapshot-controller
      priorityClassName: system-cluster-critical
      # Create volume snapshot CRDs.
      customResourceDefinitions:
        enabled: true   #if set true, VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass CRDs will be created. Set it false, If they already exist in cluster.

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-bunker
provisioner: nfs.csi.k8s.io
parameters:
  # alt. use tailscale IP
  server: 192.168.1.2
  share: /mnt/vol1/Services/k3s
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
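
For what it's worth, a quick way to double-check what the k3s Helm controller actually deployed from this HelmChart is something along these lines (the helm-install job name is assumed from k3s's usual `helm-install-<chart>` naming; the controller deployment name is taken from the chart's defaults):

```sh
# Did the Helm controller pick up the chart, and did the install job succeed?
# (job name assumed from k3s's helm-install-<chart name> convention)
kubectl -n kube-system get helmchart nfs-csi-chart
kubectl -n kube-system logs job/helm-install-nfs-csi-chart

# What node selector and tolerations did the chart render for the controller?
# (deployment name csi-nfs-controller matches the pod name in the event above)
kubectl -n kube-system get deploy csi-nfs-controller \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}{.spec.template.spec.tolerations}{"\n"}'
```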

When I look at the generated pod that throws the error, I can see the tolerations right then and there:

  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/controlplane
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
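
Tolerations only come into play for taints, though; the FailedScheduling event above complains about the pod's node affinity/selector. So the more telling comparison is between what the pod asks for and what the node's labels offer, e.g. (pod label taken from this chart's app=csi-nfs-controller label):

```sh
# Selector the pending controller pod requires:
kubectl -n kube-system get pod -l app=csi-nfs-controller \
  -o jsonpath='{.items[0].spec.nodeSelector}{"\n"}'

# Labels the node actually has:
kubectl get node routerboi --show-labels
```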

Is there something I overlooked that would make the controller schedule properly onto my node? Looking at the node itself shows the relevant labels and taints:

Node spec:

```yaml
# kubectl get node/routerboi -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.1.3
    csi.volume.kubernetes.io/nodeid: '{"nfs.csi.k8s.io":"routerboi"}'
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-21T04:19:08+02:00"
    etcd.k3s.cattle.io/node-address: 192.168.1.3
    etcd.k3s.cattle.io/node-name: routerboi-a33ea14d
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"de:b0:64:00:55:cf"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 100.64.0.2
    flannel.alpha.coreos.com/public-ip-overwrite: 100.64.0.2
    k3s.io/encryption-config-hash: start-70fb6f5afe422f096fc74aa91ff0998185377373139914e3aeaa9d20999adf8f
    k3s.io/external-ip: 100.64.0.2
    k3s.io/hostname: cluserboi
    k3s.io/internal-ip: 192.168.1.3
    k3s.io/node-args: '["server","--log","/var/log/k3s.log","--token","********","--write-kubeconfig-mode","600","--cluster-init","true","--cluster-domain","kube.birb.it","--flannel-external-ip","true","--etcd-snapshot-compress","true","--secrets-encryption","true","--data-dir","/mnt/usb/k3s","--node-external-ip","100.64.0.2","--node-label","node-location=home","--node-name","routerboi","--default-local-storage-path","/mnt/usb/k3s-data"]'
    k3s.io/node-config-hash: 7FJHCLEHT5LLPFFY5MHTC4FNIGPUD3EZI2YWWAVNCRX4UCF2TZZA====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/mnt/usb/k3s/data/7ddd49d3724e00d95d2af069d3247eaeb6635abe80397c8d94d4053dd02ab88d"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-20T20:07:06Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: routerboi
    kubernetes.io/os: linux
    node-location: home
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: routerboi
  resourceVersion: "72651"
  uid: b4e6ff71-c631-4f20-a61f-ef578cf2749d
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  providerID: k3s://routerboi
status:
  addresses:
  - address: 192.168.1.3
    type: InternalIP
  - address: 100.64.0.2
    type: ExternalIP
  - address: cluserboi
    type: Hostname
  allocatable:
    cpu: "8"
    ephemeral-storage: "28447967825"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  capacity:
    cpu: "8"
    ephemeral-storage: 29243388Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-21T03:12:33Z"
    lastTransitionTime: "2024-04-20T20:07:16Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T22:19:01Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/klipper-helm@sha256:87db3ad354905e6d31e420476467aefcd8f37d071a8f1c8a904f4743162ae546
    - docker.io/rancher/klipper-helm:v0.8.3-build20240228
    sizeBytes: 84105730
  - names:
    - docker.io/vaultwarden/server@sha256:edb8e2bab9cbca22e555638294db9b3657ffbb6e5d149a29d7ccdb243e3c71e0
    - docker.io/vaultwarden/server:latest
    sizeBytes: 66190948
  - names:
    - registry.k8s.io/sig-storage/nfsplugin@sha256:54b97b7ec30ca185c16e8c40e84fc527a7fc5cc8e9f7ea6b857a7a67655fff54
    - registry.k8s.io/sig-storage/nfsplugin:v4.6.0
    sizeBytes: 63690685
  - names:
    - docker.io/rancher/mirrored-library-traefik@sha256:ca9c8fbe001070c546a75184e3fd7f08c3e47dfc1e89bff6fe2edd302accfaec
    - docker.io/rancher/mirrored-library-traefik:2.10.5
    sizeBytes: 40129288
  - names:
    - docker.io/rancher/mirrored-metrics-server@sha256:20b8b36f8cac9e25aa2a0ff35147b13643bfec603e7e7480886632330a3bbc59
    - docker.io/rancher/mirrored-metrics-server:v0.7.0
    sizeBytes: 17809919
  - names:
    - docker.io/rancher/local-path-provisioner@sha256:aee53cadc62bd023911e7f077877d047c5b3c269f9bba25724d558654f43cea0
    - docker.io/rancher/local-path-provisioner:v0.0.26
    sizeBytes: 15933947
  - names:
    - docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    - docker.io/rancher/mirrored-coredns-coredns:1.10.1
    sizeBytes: 14556850
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:5baeb4a6d7d517434292758928bb33efc6397368cbb48c8a4cf29496abf4e987
    - registry.k8s.io/sig-storage/livenessprobe:v2.12.0
    sizeBytes: 12635307
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:c53535af8a7f7e3164609838c4b191b42b2d81238d75c1b2a2b582ada62a9780
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.0
    sizeBytes: 10291112
  - names:
    - docker.io/rancher/klipper-lb@sha256:558dcf96bf0800d9977ef46dca18411752618cd9dd06daeb99460c0a301d0a60
    - docker.io/rancher/klipper-lb:v0.4.7
    sizeBytes: 4939041
  - names:
    - docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
    - docker.io/rancher/mirrored-library-busybox@sha256:0d2d5aa0a465e06264b1e68a78b6d2af5df564504bde485ae995f8e73430bca2
    - docker.io/library/busybox:latest
    - docker.io/rancher/mirrored-library-busybox:1.36.1
    sizeBytes: 1848702
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: 198115b5-8292-4d8d-91ef-5faf2ea60504
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.8.7-edge-rockchip-rk3588
    kubeProxyVersion: v1.29.3+k3s1
    kubeletVersion: v1.29.3+k3s1
    machineID: 28b5d8681b21493b87f17ffeb6fcb5b7
    operatingSystem: linux
    osImage: Armbian 24.5.0-trunk.446 bookworm
    systemUUID: 28b5d8681b21493b87f17ffeb6fcb5b7
```

Do you perhaps see something that I missed?

Thank you and kind regards, Ingwie

andyzhangx commented 4 months ago

have you resolved this issue?

gmatiukhin commented 3 months ago

Ran into the same issue today when setting up version 4.7.0 on my k3s cluster. I also had both controller.runOnMaster and controller.runOnControlPlane set to true.

Running kubectl describe pod -l app=csi-nfs-controller, this was the Node-Selectors part of the output:

Node-Selectors:              kubernetes.io/os=linux
                             node-role.kubernetes.io/control-plane=
                             node-role.kubernetes.io/master=

This seems to be the correct behavior according to the template.

However, my master node has the following labels:

node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true

Setting controller.runOnMaster and controller.runOnControlPlane to false and then specifying controller.nodeSelector manually like this works:

nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
  node-role.kubernetes.io/master: "true"
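
In the HelmChart manifest from the original post, that would look roughly like this inside valuesContent (assuming the node labels really carry the value "true", as they do on the k3s node shown above):

```yaml
controller:
  runOnControlPlane: false
  runOnMaster: false
  nodeSelector:
    # k3s sets these role labels to "true" rather than an empty value
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/master: "true"
```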

There's already PR #603 addressing this, but as was already mentioned in the comments there, using nodeSelector is not the best solution, since master nodes may be labeled with either "true" or an empty string "".
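
A selector that only checks for the key's existence would sidestep the "true" vs. "" ambiguity. At the plain pod-spec level (not necessarily exposed through this chart's values), that would be a node affinity along these lines:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # Matches any node carrying the label key, regardless of its value
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
```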

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale