rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Document usage of AWS cloud provider for v1.23+ #2589

Open rancher-max opened 2 years ago

rancher-max commented 2 years ago

Users should either begin migrating to the out-of-tree AWS cloud provider or set the CSIMigrationAWS feature gate to false. More testing is needed to determine the full impact, but currently if a user brings up a fresh cluster using v1.23.4+rke2r1 and sets the config option cloud-provider-name: aws, then by default RKE2 does not deploy the EBS CSI provisioner, and because CSI migration is enabled by default in v1.23 the in-tree cloud provider will not provision EBS volumes.
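
For reference, a minimal sketch of the rke2 config this refers to (the /etc/rancher/rke2/config.yaml path is the rke2 default; which components need the feature gate is worked out further down in this thread):

# /etc/rancher/rke2/config.yaml (sketch)
cloud-provider-name: aws
# keep in-tree EBS provisioning working by turning CSI migration off;
# the full set of components that need this gate is explored below
kube-controller-manager-arg: feature-gates=CSIMigrationAWS=false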

It is unclear currently what happens on an upgrade to v1.23; more testing will give us this information and allow us to make recommendations.

As in-tree cloud providers are being removed upstream in v1.24, now might be the time to encourage users to start migrating before even upgrading to v1.23.

rancher-max commented 2 years ago

It looks like setting the CSIMigrationAWS=false feature gate works, and the volume is created correctly in AWS EBS. However, when I create a pod to use that volume, it stays stuck in Pending.

The only event in the pod is: 0/4 nodes are available: 4 node(s) had volume node affinity conflict.

Logs from kube-scheduler show:

I0311 04:10:18.717375       1 trace.go:205] Trace[123574548]: "Scheduling" namespace:default,name:hello-7bcd65fcf-z8j2s (11-Mar-2022 04:10:18.580) (total time: 122ms):
Trace[123574548]: ---"Prioritizing done" 122ms (04:10:18.703)
Trace[123574548]: [122.627117ms] [122.627117ms] END
E0311 04:25:55.651051       1 scheduler.go:487] "Error selecting node for pod" err="running PreFilter plugin \"VolumeBinding\": error getting PVC \"default/ebs-claim\": could not find v1.PersistentVolumeClaim \"default/ebs-claim\"" pod="default/mypod"
E0311 04:25:55.671083       1 factory.go:225] "Error scheduling pod; retrying" err="running PreFilter plugin \"VolumeBinding\": error getting PVC \"default/ebs-claim\": could not find v1.PersistentVolumeClaim \"default/ebs-claim\"" pod="default/mypod"

The PVC ebs-claim does exist in the default namespace.

The PV associated with it has the correct availability zone (it matches both the EBS volume and all of the nodes):

Node Affinity:     
  Required Terms:  
    Term 0:        topology.kubernetes.io/zone in [us-east-2a]
                   topology.kubernetes.io/region in [us-east-2]

I tried some additional suggestions from https://stackoverflow.com/questions/51946393/kubernetes-pod-warning-1-nodes-had-volume-node-affinity-conflict, but they didn't resolve this for me.
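
For anyone else hitting this symptom, a quick way to compare the PV's node affinity against the nodes' topology labels (plain kubectl; the PV name is the one from this cluster):

# show the zone/region terms the scheduler is matching against
$ kubectl get pv pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276 -o jsonpath='{.spec.nodeAffinity.required}'
# show the corresponding labels on every node
$ kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region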

brandond commented 2 years ago

Did you create the volume manually and then try to bind it with a PVC? I usually just create a PVC for the pod and let it create the PV for itself.

rancher-max commented 2 years ago

Nope just the PVC. Here is what I deployed:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sctest
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  iopsPerGB: "10"
  fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sctest
  resources:
    requests:
      storage: 4Gi
---
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "mypod"
  labels:
    name: "frontendhttp"
spec:
  containers:
    -
      name: "myfrontend"
      image: openshift/hello-openshift
      ports:
        -
          containerPort: 80
          name: "http-server"
      volumeMounts:
        -
          mountPath: "/var/www/html"
          name: "pvol"
  volumes:
    -
      name: "pvol"
      persistentVolumeClaim:
        claimName: "ebs-claim"

brandond commented 2 years ago

Huh, that's odd. But if you run kubectl get -n default pvc ebs-claim -o yaml, it shows up?

If it's working, the PVC should of course exist, and there should also be a PV bound to the PVC.

rancher-max commented 2 years ago

Yep it's all there:

$ kubectl get -n default pvc ebs-claim -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"ebs-claim","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"4Gi"}},"storageClassName":"sctest"}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
  creationTimestamp: "2022-03-11T04:25:55Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: ebs-claim
  namespace: default
  resourceVersion: "29773"
  uid: 6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: sctest
  volumeMode: Filesystem
  volumeName: pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 4Gi
  phase: Bound

$ k get all,sc,pvc,pv,volumeattachments
NAME        READY   STATUS    RESTARTS   AGE
pod/mypod   0/1     Pending   0          13h

NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)        AGE
service/hello        LoadBalancer   10.43.46.130   a689605ae260f4485ad3bd49be3c0bd0-472934356.us-east-2.elb.amazonaws.com   80:31722/TCP   13h
service/kubernetes   ClusterIP      10.43.0.1      <none>                                                                   443/TCP        16h

NAME                                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/sctest   kubernetes.io/aws-ebs   Delete          Immediate           false                  13h

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/ebs-claim   Bound    pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276   4Gi        RWO            sctest         13h

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276   4Gi        RWO            Delete           Bound    default/ebs-claim   sctest                  13h

brandond commented 2 years ago

Can you dump the nodes and PVs as YAML as well?

rancher-max commented 2 years ago

$ k get nodes,pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      etcd.rke2.cattle.io/node-address: 172.31.10.149
      etcd.rke2.cattle.io/node-name: ip-172-31-10-149.us-east-2.compute.internal-0213ded2
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"fe:0e:58:f2:6a:71"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.10.149
      node.alpha.kubernetes.io/ttl: "0"
      projectcalico.org/IPv4Address: 172.31.10.149/20
      projectcalico.org/IPv4IPIPTunnelAddr: 10.42.1.1
      rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
      rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-10-149.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
      rke2.io/node-config-hash: 652PDUSYNM7EFWOH6JTHSYVNDA7TS223GEUFW3OLXJFDFBHMMF7A====
      rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2022-03-11T01:09:00Z"
    finalizers:
    - wrangler.cattle.io/node
    - wrangler.cattle.io/managed-etcd-controller
    - wrangler.cattle.io/cisnetworkpolicy-node
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t3.medium
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: us-east-2
      failure-domain.beta.kubernetes.io/zone: us-east-2a
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: ip-172-31-10-149.us-east-2.compute.internal
      kubernetes.io/os: linux
      node-role.kubernetes.io/control-plane: "true"
      node-role.kubernetes.io/etcd: "true"
      node-role.kubernetes.io/master: "true"
      node.kubernetes.io/instance-type: t3.medium
      topology.kubernetes.io/region: us-east-2
      topology.kubernetes.io/zone: us-east-2a
    name: ip-172-31-10-149.us-east-2.compute.internal
    resourceVersion: "145219"
    uid: 72718669-985b-4df6-984f-272ce1cb31ba
  spec:
    podCIDR: 10.42.1.0/24
    podCIDRs:
    - 10.42.1.0/24
    providerID: aws:///us-east-2a/i-03f5829de5496d775
  status:
    addresses:
    - address: 172.31.10.149
      type: InternalIP
    - address: <redacted>
      type: ExternalIP
    - address: ip-172-31-10-149.us-east-2.compute.internal
      type: Hostname
    - address: ip-172-31-10-149.us-east-2.compute.internal
      type: InternalDNS
    - address: <redacted>
      type: ExternalDNS
    allocatable:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: "20389121418"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    capacity:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: 20959212Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: "2022-03-11T01:09:45Z"
      lastTransitionTime: "2022-03-11T01:09:45Z"
      message: Flannel is running on this node
      reason: FlannelIsUp
      status: "False"
      type: NetworkUnavailable
    - lastHeartbeatTime: "2022-03-11T18:11:07Z"
      lastTransitionTime: "2022-03-11T01:09:00Z"
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: "2022-03-11T18:11:07Z"
      lastTransitionTime: "2022-03-11T01:09:00Z"
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: "2022-03-11T18:11:07Z"
      lastTransitionTime: "2022-03-11T01:09:00Z"
      message: kubelet has sufficient PID available
      reason: KubeletHasSufficientPID
      status: "False"
      type: PIDPressure
    - lastHeartbeatTime: "2022-03-11T18:11:07Z"
      lastTransitionTime: "2022-03-11T01:09:31Z"
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
      - docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
      sizeBytes: 232186821
    - names:
      - docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
      - docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
      sizeBytes: 223545879
    - names:
      - docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
      - docker.io/rancher/hardened-calico:v3.21.4-build20220208
      sizeBytes: 198509698
    - names:
      - docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
      - docker.io/rancher/hardened-flannel:v0.16.1-build20220119
      sizeBytes: 97290927
    - names:
      - docker.io/rancher/hardened-coredns@sha256:55ed3a4871383cd9fe9d38e0a57b97135fe4369f953a52b254d1eeef36756365
      - docker.io/rancher/hardened-coredns:v1.8.5-build20211119
      sizeBytes: 50744176
    - names:
      - docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
      - docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
      sizeBytes: 49055065
    - names:
      - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
      - docker.io/rancher/pause:3.6
      sizeBytes: 299396
    nodeInfo:
      architecture: amd64
      bootID: da2852e8-d58c-4e18-9615-1619b678fc2a
      containerRuntimeVersion: containerd://1.5.9-k3s1
      kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
      kubeProxyVersion: v1.23.4+rke2r1
      kubeletVersion: v1.23.4+rke2r1
      machineID: 006336e0740647d6ab66a3143b4851e3
      operatingSystem: linux
      osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
      systemUUID: ec229f98-6bd9-6bdb-1eab-e0ec51a9d865
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"0a:88:ec:0b:1f:98"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.12.67
      node.alpha.kubernetes.io/ttl: "0"
      projectcalico.org/IPv4Address: 172.31.12.67/20
      projectcalico.org/IPv4IPIPTunnelAddr: 10.42.3.1
      rke2.io/node-args: '["agent","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-12-67.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true"]'
      rke2.io/node-config-hash: 3J2VFTLHJYJK5LKAZ2GDA5OAPS5UCP2UVO3EVGZQRJ4TNESRO2UQ====
      rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2022-03-11T01:14:36Z"
    finalizers:
    - wrangler.cattle.io/node
    - wrangler.cattle.io/managed-etcd-controller
    - wrangler.cattle.io/cisnetworkpolicy-node
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t3.medium
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: us-east-2
      failure-domain.beta.kubernetes.io/zone: us-east-2a
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: ip-172-31-12-67.us-east-2.compute.internal
      kubernetes.io/os: linux
      node.kubernetes.io/instance-type: t3.medium
      topology.kubernetes.io/region: us-east-2
      topology.kubernetes.io/zone: us-east-2a
    name: ip-172-31-12-67.us-east-2.compute.internal
    resourceVersion: "145488"
    uid: 49fc3a6f-56cc-4dc7-95ca-c2527a6f60d6
  spec:
    podCIDR: 10.42.3.0/24
    podCIDRs:
    - 10.42.3.0/24
    providerID: aws:///us-east-2a/i-022bed525a5397d10
  status:
    addresses:
    - address: 172.31.12.67
      type: InternalIP
    - address: <redacted>
      type: ExternalIP
    - address: ip-172-31-12-67.us-east-2.compute.internal
      type: Hostname
    - address: ip-172-31-12-67.us-east-2.compute.internal
      type: InternalDNS
    - address: <redacted>
      type: ExternalDNS
    allocatable:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: "20389121418"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    capacity:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: 20959212Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: "2022-03-11T01:15:37Z"
      lastTransitionTime: "2022-03-11T01:15:37Z"
      message: Flannel is running on this node
      reason: FlannelIsUp
      status: "False"
      type: NetworkUnavailable
    - lastHeartbeatTime: "2022-03-11T18:13:03Z"
      lastTransitionTime: "2022-03-11T01:14:36Z"
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: "2022-03-11T18:13:03Z"
      lastTransitionTime: "2022-03-11T01:14:36Z"
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: "2022-03-11T18:13:03Z"
      lastTransitionTime: "2022-03-11T01:14:36Z"
      message: kubelet has sufficient PID available
      reason: KubeletHasSufficientPID
      status: "False"
      type: PIDPressure
    - lastHeartbeatTime: "2022-03-11T18:13:03Z"
      lastTransitionTime: "2022-03-11T01:15:16Z"
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
      - docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
      sizeBytes: 232186821
    - names:
      - docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
      - docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
      sizeBytes: 223545879
    - names:
      - docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
      - docker.io/rancher/hardened-calico:v3.21.4-build20220208
      sizeBytes: 198509698
    - names:
      - docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
      - docker.io/rancher/hardened-flannel:v0.16.1-build20220119
      sizeBytes: 97290927
    - names:
      - docker.io/ranchertest/mytestcontainer@sha256:7e418465981575a9abef4ee16a80c562a2d2d171e591c1475c38347ef3ec2a72
      - docker.io/ranchertest/mytestcontainer:unprivileged
      sizeBytes: 75437038
    - names:
      - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
      - docker.io/rancher/pause:3.6
      sizeBytes: 299396
    nodeInfo:
      architecture: amd64
      bootID: 8220a0bb-93a1-498e-8e06-16c4d4a9b4cf
      containerRuntimeVersion: containerd://1.5.9-k3s1
      kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
      kubeProxyVersion: v1.23.4+rke2r1
      kubeletVersion: v1.23.4+rke2r1
      machineID: 006336e0740647d6ab66a3143b4851e3
      operatingSystem: linux
      osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
      systemUUID: ec21c076-9c44-1dd1-e2b3-b63cb6fad7d6
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      etcd.rke2.cattle.io/node-address: 172.31.15.92
      etcd.rke2.cattle.io/node-name: ip-172-31-15-92.us-east-2.compute.internal-fbffe900
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"8e:a9:9e:a6:28:43"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.15.92
      node.alpha.kubernetes.io/ttl: "0"
      projectcalico.org/IPv4Address: 172.31.15.92/20
      projectcalico.org/IPv4IPIPTunnelAddr: 10.42.2.1
      rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
      rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-15-92.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
      rke2.io/node-config-hash: MODXA5SEKWX26NZRH5GSBOC6EJLDOTXUW2JUD6ENM7FY35E4TKEA====
      rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2022-03-11T01:09:26Z"
    finalizers:
    - wrangler.cattle.io/node
    - wrangler.cattle.io/managed-etcd-controller
    - wrangler.cattle.io/cisnetworkpolicy-node
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t3.medium
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: us-east-2
      failure-domain.beta.kubernetes.io/zone: us-east-2a
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: ip-172-31-15-92.us-east-2.compute.internal
      kubernetes.io/os: linux
      node-role.kubernetes.io/control-plane: "true"
      node-role.kubernetes.io/etcd: "true"
      node-role.kubernetes.io/master: "true"
      node.kubernetes.io/instance-type: t3.medium
      topology.kubernetes.io/region: us-east-2
      topology.kubernetes.io/zone: us-east-2a
    name: ip-172-31-15-92.us-east-2.compute.internal
    resourceVersion: "145446"
    uid: 058da6ce-435f-4663-b60f-6be445e8758c
  spec:
    podCIDR: 10.42.2.0/24
    podCIDRs:
    - 10.42.2.0/24
    providerID: aws:///us-east-2a/i-0eecd25629fe35667
  status:
    addresses:
    - address: 172.31.15.92
      type: InternalIP
    - address: <redacted>
      type: ExternalIP
    - address: ip-172-31-15-92.us-east-2.compute.internal
      type: Hostname
    - address: ip-172-31-15-92.us-east-2.compute.internal
      type: InternalDNS
    - address: <redacted>
      type: ExternalDNS
    allocatable:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: "20389121418"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    capacity:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: 20959212Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: "2022-03-11T01:10:48Z"
      lastTransitionTime: "2022-03-11T01:10:48Z"
      message: Flannel is running on this node
      reason: FlannelIsUp
      status: "False"
      type: NetworkUnavailable
    - lastHeartbeatTime: "2022-03-11T18:12:45Z"
      lastTransitionTime: "2022-03-11T01:09:26Z"
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: "2022-03-11T18:12:45Z"
      lastTransitionTime: "2022-03-11T01:09:26Z"
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: "2022-03-11T18:12:45Z"
      lastTransitionTime: "2022-03-11T01:09:26Z"
      message: kubelet has sufficient PID available
      reason: KubeletHasSufficientPID
      status: "False"
      type: PIDPressure
    - lastHeartbeatTime: "2022-03-11T18:12:45Z"
      lastTransitionTime: "2022-03-11T01:10:37Z"
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
      - docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
      sizeBytes: 232186821
    - names:
      - docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
      - docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
      sizeBytes: 223545879
    - names:
      - docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
      - docker.io/rancher/hardened-calico:v3.21.4-build20220208
      sizeBytes: 198509698
    - names:
      - docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
      - docker.io/rancher/hardened-flannel:v0.16.1-build20220119
      sizeBytes: 97290927
    - names:
      - docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
      - docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
      sizeBytes: 49055065
    - names:
      - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
      - docker.io/rancher/pause:3.6
      sizeBytes: 299396
    nodeInfo:
      architecture: amd64
      bootID: d9918162-9738-4385-abf4-c76325813bfd
      containerRuntimeVersion: containerd://1.5.9-k3s1
      kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
      kubeProxyVersion: v1.23.4+rke2r1
      kubeletVersion: v1.23.4+rke2r1
      machineID: 006336e0740647d6ab66a3143b4851e3
      operatingSystem: linux
      osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
      systemUUID: ec2470c6-8a37-bf45-9755-24592ea9ced1
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      etcd.rke2.cattle.io/node-address: 172.31.2.220
      etcd.rke2.cattle.io/node-name: ip-172-31-2-220.us-east-2.compute.internal-48d3e66f
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"ae:d4:df:82:84:05"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.2.220
      node.alpha.kubernetes.io/ttl: "0"
      projectcalico.org/IPv4Address: 172.31.2.220/20
      projectcalico.org/IPv4IPIPTunnelAddr: 10.42.0.1
      rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
      rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--node-name","ip-172-31-2-220.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
      rke2.io/node-config-hash: 53EF6IUX5NCWD5UV46AVD5JN7IT7NWX5PKDKG6FDLRYKV7QRENTQ====
      rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2022-03-11T01:01:34Z"
    finalizers:
    - wrangler.cattle.io/node
    - wrangler.cattle.io/managed-etcd-controller
    - wrangler.cattle.io/cisnetworkpolicy-node
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t3.medium
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: us-east-2
      failure-domain.beta.kubernetes.io/zone: us-east-2a
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: ip-172-31-2-220.us-east-2.compute.internal
      kubernetes.io/os: linux
      node-role.kubernetes.io/control-plane: "true"
      node-role.kubernetes.io/etcd: "true"
      node-role.kubernetes.io/master: "true"
      node.kubernetes.io/instance-type: t3.medium
      topology.kubernetes.io/region: us-east-2
      topology.kubernetes.io/zone: us-east-2a
    name: ip-172-31-2-220.us-east-2.compute.internal
    resourceVersion: "145688"
    uid: 27ea98ee-3308-4476-9b69-5eab3620579e
  spec:
    podCIDR: 10.42.0.0/24
    podCIDRs:
    - 10.42.0.0/24
    providerID: aws:///us-east-2a/i-0b60ba08142a9557a
  status:
    addresses:
    - address: 172.31.2.220
      type: InternalIP
    - address: <redacted>
      type: ExternalIP
    - address: ip-172-31-2-220.us-east-2.compute.internal
      type: Hostname
    - address: ip-172-31-2-220.us-east-2.compute.internal
      type: InternalDNS
    - address: <redacted>
      type: ExternalDNS
    allocatable:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: "20389121418"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    capacity:
      attachable-volumes-aws-ebs: "25"
      cpu: "2"
      ephemeral-storage: 20959212Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3764784Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: "2022-03-11T01:02:35Z"
      lastTransitionTime: "2022-03-11T01:02:35Z"
      message: Flannel is running on this node
      reason: FlannelIsUp
      status: "False"
      type: NetworkUnavailable
    - lastHeartbeatTime: "2022-03-11T18:14:29Z"
      lastTransitionTime: "2022-03-11T01:01:34Z"
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: "2022-03-11T18:14:29Z"
      lastTransitionTime: "2022-03-11T01:01:34Z"
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: "2022-03-11T18:14:29Z"
      lastTransitionTime: "2022-03-11T01:01:34Z"
      message: kubelet has sufficient PID available
      reason: KubeletHasSufficientPID
      status: "False"
      type: PIDPressure
    - lastHeartbeatTime: "2022-03-11T18:14:29Z"
      lastTransitionTime: "2022-03-11T01:02:37Z"
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
      - docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
      sizeBytes: 232186821
    - names:
      - docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
      - docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
      sizeBytes: 223545879
    - names:
      - docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
      - docker.io/rancher/hardened-calico:v3.21.4-build20220208
      sizeBytes: 198509698
    - names:
      - docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
      - docker.io/rancher/hardened-flannel:v0.16.1-build20220119
      sizeBytes: 97290927
    - names:
      - docker.io/rancher/klipper-helm@sha256:1d31345264c7acf55e95327d0bf14262a71014dd1be31e8ab54adaf0926a385f
      - docker.io/rancher/klipper-helm:v0.6.7-build20211110
      sizeBytes: 84453872
    - names:
      - docker.io/rancher/hardened-coredns@sha256:55ed3a4871383cd9fe9d38e0a57b97135fe4369f953a52b254d1eeef36756365
      - docker.io/rancher/hardened-coredns:v1.8.5-build20211119
      sizeBytes: 50744176
    - names:
      - docker.io/rancher/hardened-k8s-metrics-server@sha256:2aeab35db572d3e6b769a0991c2d2b332c0acee2898b799ab3169ee62208bc89
      - docker.io/rancher/hardened-k8s-metrics-server:v0.5.0-build20211119
      sizeBytes: 49698028
    - names:
      - docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
      - docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
      sizeBytes: 49055065
    - names:
      - docker.io/rancher/hardened-cluster-autoscaler@sha256:7cc3ec1030240a8b69d1185611c1f89cf357cddae642e8cc082e1a49ebc3611d
      - docker.io/rancher/hardened-cluster-autoscaler:v1.8.5-build20211119
      sizeBytes: 43568033
    - names:
      - docker.io/rancher/mirrored-ingress-nginx-kube-webhook-certgen@sha256:52dc63ad0160c9ae201daaff7d9bc8defb0e8a529cc2cfe5baf9d8e0b198d4a8
      - docker.io/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.0
      sizeBytes: 18592673
    - names:
      - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
      - docker.io/rancher/pause:3.6
      sizeBytes: 299396
    nodeInfo:
      architecture: amd64
      bootID: 418a9788-d554-4acd-8f68-c4583635b172
      containerRuntimeVersion: containerd://1.5.9-k3s1
      kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
      kubeProxyVersion: v1.23.4+rke2r1
      kubeletVersion: v1.23.4+rke2r1
      machineID: 006336e0740647d6ab66a3143b4851e3
      operatingSystem: linux
      osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
      systemUUID: ec25551c-62f6-6689-a207-1da8c049fd90
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      kubernetes.io/createdby: aws-ebs-dynamic-provisioner
      pv.kubernetes.io/bound-by-controller: "yes"
      pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
    creationTimestamp: "2022-03-11T04:26:01Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      topology.kubernetes.io/region: us-east-2
      topology.kubernetes.io/zone: us-east-2a
    name: pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
    resourceVersion: "29770"
    uid: ae832f24-5be1-46e9-b71f-855a3c0cfcab
  spec:
    accessModes:
    - ReadWriteOnce
    awsElasticBlockStore:
      fsType: ext4
      volumeID: aws://us-east-2a/vol-00ff815fe43b7d99a
    capacity:
      storage: 4Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: ebs-claim
      namespace: default
      resourceVersion: "29752"
      uid: 6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east-2a
          - key: topology.kubernetes.io/region
            operator: In
            values:
            - us-east-2
    persistentVolumeReclaimPolicy: Delete
    storageClassName: sctest
    volumeMode: Filesystem
  status:
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

brandond commented 2 years ago

Just out of curiosity, what happens if you add a worker node? I know that the ELB controller used to do dumb things like refusing to use endpoints that were on nodes with the master/control-plane role, so I kinda wonder if they're doing something similar for attaching the EBS volumes.

rancher-max commented 2 years ago

One of those is already a worker node:

$ k get nodes
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-10-149.us-east-2.compute.internal   Ready    control-plane,etcd,master   17h   v1.23.4+rke2r1
ip-172-31-12-67.us-east-2.compute.internal    Ready    <none>                      17h   v1.23.4+rke2r1
ip-172-31-15-92.us-east-2.compute.internal    Ready    control-plane,etcd,master   17h   v1.23.4+rke2r1
ip-172-31-2-220.us-east-2.compute.internal    Ready    control-plane,etcd,master   17h   v1.23.4+rke2r1

brandond commented 2 years ago

OK, missed that, sorry.

What components did you set the feature gate on? I suspect it probably needs to be set on the scheduler, controller-manager, cloud-controller-manager, and kubelet.

rancher-max commented 2 years ago

Argh, that's probably it. Pulling from the node output: "--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false" -- I only set it on the controller manager.

brandond commented 2 years ago

I can't imagine the apiserver also wanting it, but you might try adding it to all the component args just to see if that fixes it.

rancher-max commented 2 years ago

Yeah, I'll add it to all of them, and if that fixes it I will remove one at a time until we get the minimum viable setting 👍

rancher-max commented 2 years ago

I was able to get this working by setting the feature gate on all of the possible components, placing this in the config.yaml of each server node:

kube-apiserver-arg: feature-gates=CSIMigrationAWS=false
etcd-arg: feature-gates=CSIMigrationAWS=false
kube-controller-manager-arg: feature-gates=CSIMigrationAWS=false
kube-scheduler-arg: feature-gates=CSIMigrationAWS=false
kubelet-arg: feature-gates=CSIMigrationAWS=false
kube-proxy-arg: feature-gates=CSIMigrationAWS=false

I was also able to get this to work by setting just a few of these. This was the full config.yaml on each server node:

cloud-provider-name: aws
profile: "cis-1.6"
selinux: true
kube-apiserver-arg: feature-gates=CSIMigrationAWS=false
kube-controller-manager-arg: feature-gates=CSIMigrationAWS=false
kubelet-arg: feature-gates=CSIMigrationAWS=false

And this is what I set on each agent node:

cloud-provider-name: aws
profile: "cis-1.6"
selinux: true
kubelet-arg: feature-gates=CSIMigrationAWS=false

I feel it might be safer to set it on all components, but we can probably document just this minimal set.
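
As a rough verification sketch after changing config.yaml (service names are the rke2 defaults; the test objects are the ones posted earlier in this thread, and the manifest filename is just a placeholder):

# apply the new config on each node
$ sudo systemctl restart rke2-server       # rke2-agent on agent nodes
# re-run the earlier test and confirm the pod actually schedules
$ kubectl apply -f ebs-test.yaml           # placeholder for the StorageClass/PVC/Pod manifests above
$ kubectl get pvc ebs-claim
$ kubectl get pod mypod -o wide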

dkeightley commented 2 years ago

If it helps, I recently created an example that installs the AWS CCM on RKE2. The example is for a single server node, but it could be trimmed down to join an agent node.
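
For readers who can't follow that example, a rough sketch of installing the out-of-tree provider from the upstream cloud-provider-aws chart (the repo URL and chart name are the upstream ones; I believe the nodes also need the kubelet running with --cloud-provider=external, so treat this as an outline rather than a recipe):

$ helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
$ helm repo update
# install the cloud controller manager into kube-system
$ helm upgrade --install aws-cloud-controller-manager \
    aws-cloud-controller-manager/aws-cloud-controller-manager \
    --namespace kube-system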

It would make for a nice UX to provide an RKE2 config option that deploys the chart and adjusts the component arguments.

rust84 commented 1 year ago

The flag is locked to true as of 1.25: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#feature-11

The cluster will fail to come up if you try to set it to false. So we are stuck on 1.24 until we get support for installing the out-of-tree cloud provider?

brandond commented 1 year ago

> So we are stuck on 1.24 until we get support for installing the out-of-tree cloud provider?

The out-of-tree cloud provider should work when installed from its upstream helm chart. You should be able to do this using a HelmChart resource with the Bootstrap field set to true. I believe there are upstream docs available that cover migrating from in-tree to out-of-tree providers.
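
A rough sketch of what that could look like, using the helm.cattle.io HelmChart CRD that rke2 ships (the chart repo/name are the upstream cloud-provider-aws ones; valuesContent is illustrative and should be checked against the upstream chart's supported values):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: aws-cloud-controller-manager
  namespace: kube-system
spec:
  bootstrap: true                  # deploy even before the cloud provider has initialized the nodes
  repo: https://kubernetes.github.io/cloud-provider-aws
  chart: aws-cloud-controller-manager
  targetNamespace: kube-system
  valuesContent: |-
    # illustrative values; verify against the upstream chart
    args:
      - --v=2
      - --cloud-provider=aws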

brandond commented 1 year ago

We need to document how to install this chart via a HelmChart resource. We might also evaluate whether it would be trivial to repackage it in rke2-charts and wire up some of the existing cloud-provider logic to automatically install the chart, similar to what we do for the rancher-vsphere cloud-provider value:

https://github.com/rancher/rke2/blob/33caf61cf1fc71cca522eaeac1d3541b5f3c417c/pkg/cli/cmds/root.go#L212

theturtle32 commented 1 year ago

As a new user evaluating Rancher and K8s, this has been catching me out, and I've lost hours pulling out my hair trying to understand why the deprecated (and, as it turns out, actually removed) in-tree Amazon cloud provider is the only option I'm presented with when spinning up a new cluster using RKE (1 or 2) on EC2. There's no mention anywhere that I would have to install the cloud provider from a helm chart, or even that that's how cloud providers are installed. I had no idea until reading these last two comments that installing an out-of-tree cloud provider was a relatively trivial matter of a helm chart.

But it absolutely should be something that's hand-held via the UI, and honestly, something that's assumed you'd want by default for a new RKE2 deploy on EC2.