rancher-max opened this issue 2 years ago
It looks like enabling the feature gate works: the volume is created in AWS EBS correctly. However, when I create a pod to use that volume, it stays stuck in Pending.
The only event on the pod is: 0/4 nodes are available: 4 node(s) had volume node affinity conflict.
The kube-scheduler logs show:
I0311 04:10:18.717375 1 trace.go:205] Trace[123574548]: "Scheduling" namespace:default,name:hello-7bcd65fcf-z8j2s (11-Mar-2022 04:10:18.580) (total time: 122ms):
Trace[123574548]: ---"Prioritizing done" 122ms (04:10:18.703)
Trace[123574548]: [122.627117ms] [122.627117ms] END
E0311 04:25:55.651051 1 scheduler.go:487] "Error selecting node for pod" err="running PreFilter plugin \"VolumeBinding\": error getting PVC \"default/ebs-claim\": could not find v1.PersistentVolumeClaim \"default/ebs-claim\"" pod="default/mypod"
E0311 04:25:55.671083 1 factory.go:225] "Error scheduling pod; retrying" err="running PreFilter plugin \"VolumeBinding\": error getting PVC \"default/ebs-claim\": could not find v1.PersistentVolumeClaim \"default/ebs-claim\"" pod="default/mypod"
The PVC ebs-claim does exist in the default namespace.
The PV bound to it has the correct availability zone (matching both the EBS volume and all of the nodes):
Node Affinity:
Required Terms:
Term 0: topology.kubernetes.io/zone in [us-east-2a]
topology.kubernetes.io/region in [us-east-2]
I tried some additional suggestions from https://stackoverflow.com/questions/51946393/kubernetes-pod-warning-1-nodes-had-volume-node-affinity-conflict, but none of them resolved this for me.
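For reference, a quick way to cross-check the node zone labels against the PV's node affinity would be something like the following (the PV name is the one from this cluster; the kubectl flags are standard):
$ kubectl get nodes -L topology.kubernetes.io/zone,topology.kubernetes.io/region
$ kubectl get pv pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276 -o jsonpath='{.spec.nodeAffinity.required}'
If the zone/region columns on every node match the PV's required terms, as they do here, the conflict is coming from somewhere other than the labels themselves.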
Did you create the volume manually and then try to bind it with a PVC? I usually just create a PVC for the pod and let it create the PV for itself.
Nope, just the PVC. Here is what I deployed:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: sctest
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
iopsPerGB: "10"
fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ebs-claim
spec:
accessModes:
- ReadWriteOnce
storageClassName: sctest
resources:
requests:
storage: 4Gi
apiVersion: "v1"
kind: "Pod"
metadata:
name: "mypod"
labels:
name: "frontendhttp"
spec:
containers:
-
name: "myfrontend"
image: openshift/hello-openshift
ports:
-
containerPort: 80
name: "http-server"
volumeMounts:
-
mountPath: "/var/www/html"
name: "pvol"
volumes:
-
name: "pvol"
persistentVolumeClaim:
claimName: "ebs-claim"
Huh, that's odd. But if you do kubectl get -n default pvc ebs-claim -o yaml, it shows up?
If it's working, the PVC should of course exist, and there should also be a PV bound to the PVC.
Yep, it's all there:
$ kubectl get -n default pvc ebs-claim -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"ebs-claim","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"4Gi"}},"storageClassName":"sctest"}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
volume.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
creationTimestamp: "2022-03-11T04:25:55Z"
finalizers:
- kubernetes.io/pvc-protection
name: ebs-claim
namespace: default
resourceVersion: "29773"
uid: 6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
storageClassName: sctest
volumeMode: Filesystem
volumeName: pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 4Gi
phase: Bound
$ k get all,sc,pvc,pv,volumeattachments
NAME READY STATUS RESTARTS AGE
pod/mypod 0/1 Pending 0 13h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hello LoadBalancer 10.43.46.130 a689605ae260f4485ad3bd49be3c0bd0-472934356.us-east-2.elb.amazonaws.com 80:31722/TCP 13h
service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 16h
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/sctest kubernetes.io/aws-ebs Delete Immediate false 13h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/ebs-claim Bound pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276 4Gi RWO sctest 13h
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276 4Gi RWO Delete Bound default/ebs-claim sctest 13h
Can you dump the nodes and PVs as YAML as well?
$ k get nodes,pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations:
etcd.rke2.cattle.io/node-address: 172.31.10.149
etcd.rke2.cattle.io/node-name: ip-172-31-10-149.us-east-2.compute.internal-0213ded2
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"fe:0e:58:f2:6a:71"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.10.149
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 172.31.10.149/20
projectcalico.org/IPv4IPIPTunnelAddr: 10.42.1.1
rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-10-149.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
rke2.io/node-config-hash: 652PDUSYNM7EFWOH6JTHSYVNDA7TS223GEUFW3OLXJFDFBHMMF7A====
rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-03-11T01:09:00Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/cisnetworkpolicy-node
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: t3.medium
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-east-2
failure-domain.beta.kubernetes.io/zone: us-east-2a
kubernetes.io/arch: amd64
kubernetes.io/hostname: ip-172-31-10-149.us-east-2.compute.internal
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: t3.medium
topology.kubernetes.io/region: us-east-2
topology.kubernetes.io/zone: us-east-2a
name: ip-172-31-10-149.us-east-2.compute.internal
resourceVersion: "145219"
uid: 72718669-985b-4df6-984f-272ce1cb31ba
spec:
podCIDR: 10.42.1.0/24
podCIDRs:
- 10.42.1.0/24
providerID: aws:///us-east-2a/i-03f5829de5496d775
status:
addresses:
- address: 172.31.10.149
type: InternalIP
- address: <redacted>
type: ExternalIP
- address: ip-172-31-10-149.us-east-2.compute.internal
type: Hostname
- address: ip-172-31-10-149.us-east-2.compute.internal
type: InternalDNS
- address: <redacted>
type: ExternalDNS
allocatable:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: "20389121418"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
capacity:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: 20959212Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-03-11T01:09:45Z"
lastTransitionTime: "2022-03-11T01:09:45Z"
message: Flannel is running on this node
reason: FlannelIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-03-11T18:11:07Z"
lastTransitionTime: "2022-03-11T01:09:00Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-03-11T18:11:07Z"
lastTransitionTime: "2022-03-11T01:09:00Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-03-11T18:11:07Z"
lastTransitionTime: "2022-03-11T01:09:00Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-03-11T18:11:07Z"
lastTransitionTime: "2022-03-11T01:09:31Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
- docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
sizeBytes: 232186821
- names:
- docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
- docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
sizeBytes: 223545879
- names:
- docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
- docker.io/rancher/hardened-calico:v3.21.4-build20220208
sizeBytes: 198509698
- names:
- docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
- docker.io/rancher/hardened-flannel:v0.16.1-build20220119
sizeBytes: 97290927
- names:
- docker.io/rancher/hardened-coredns@sha256:55ed3a4871383cd9fe9d38e0a57b97135fe4369f953a52b254d1eeef36756365
- docker.io/rancher/hardened-coredns:v1.8.5-build20211119
sizeBytes: 50744176
- names:
- docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
- docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
sizeBytes: 49055065
- names:
- docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
- docker.io/rancher/pause:3.6
sizeBytes: 299396
nodeInfo:
architecture: amd64
bootID: da2852e8-d58c-4e18-9615-1619b678fc2a
containerRuntimeVersion: containerd://1.5.9-k3s1
kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
kubeProxyVersion: v1.23.4+rke2r1
kubeletVersion: v1.23.4+rke2r1
machineID: 006336e0740647d6ab66a3143b4851e3
operatingSystem: linux
osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
systemUUID: ec229f98-6bd9-6bdb-1eab-e0ec51a9d865
- apiVersion: v1
kind: Node
metadata:
annotations:
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"0a:88:ec:0b:1f:98"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.12.67
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 172.31.12.67/20
projectcalico.org/IPv4IPIPTunnelAddr: 10.42.3.1
rke2.io/node-args: '["agent","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-12-67.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true"]'
rke2.io/node-config-hash: 3J2VFTLHJYJK5LKAZ2GDA5OAPS5UCP2UVO3EVGZQRJ4TNESRO2UQ====
rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-03-11T01:14:36Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/cisnetworkpolicy-node
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: t3.medium
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-east-2
failure-domain.beta.kubernetes.io/zone: us-east-2a
kubernetes.io/arch: amd64
kubernetes.io/hostname: ip-172-31-12-67.us-east-2.compute.internal
kubernetes.io/os: linux
node.kubernetes.io/instance-type: t3.medium
topology.kubernetes.io/region: us-east-2
topology.kubernetes.io/zone: us-east-2a
name: ip-172-31-12-67.us-east-2.compute.internal
resourceVersion: "145488"
uid: 49fc3a6f-56cc-4dc7-95ca-c2527a6f60d6
spec:
podCIDR: 10.42.3.0/24
podCIDRs:
- 10.42.3.0/24
providerID: aws:///us-east-2a/i-022bed525a5397d10
status:
addresses:
- address: 172.31.12.67
type: InternalIP
- address: <redacted>
type: ExternalIP
- address: ip-172-31-12-67.us-east-2.compute.internal
type: Hostname
- address: ip-172-31-12-67.us-east-2.compute.internal
type: InternalDNS
- address: <redacted>
type: ExternalDNS
allocatable:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: "20389121418"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
capacity:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: 20959212Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-03-11T01:15:37Z"
lastTransitionTime: "2022-03-11T01:15:37Z"
message: Flannel is running on this node
reason: FlannelIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-03-11T18:13:03Z"
lastTransitionTime: "2022-03-11T01:14:36Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-03-11T18:13:03Z"
lastTransitionTime: "2022-03-11T01:14:36Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-03-11T18:13:03Z"
lastTransitionTime: "2022-03-11T01:14:36Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-03-11T18:13:03Z"
lastTransitionTime: "2022-03-11T01:15:16Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
- docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
sizeBytes: 232186821
- names:
- docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
- docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
sizeBytes: 223545879
- names:
- docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
- docker.io/rancher/hardened-calico:v3.21.4-build20220208
sizeBytes: 198509698
- names:
- docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
- docker.io/rancher/hardened-flannel:v0.16.1-build20220119
sizeBytes: 97290927
- names:
- docker.io/ranchertest/mytestcontainer@sha256:7e418465981575a9abef4ee16a80c562a2d2d171e591c1475c38347ef3ec2a72
- docker.io/ranchertest/mytestcontainer:unprivileged
sizeBytes: 75437038
- names:
- docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
- docker.io/rancher/pause:3.6
sizeBytes: 299396
nodeInfo:
architecture: amd64
bootID: 8220a0bb-93a1-498e-8e06-16c4d4a9b4cf
containerRuntimeVersion: containerd://1.5.9-k3s1
kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
kubeProxyVersion: v1.23.4+rke2r1
kubeletVersion: v1.23.4+rke2r1
machineID: 006336e0740647d6ab66a3143b4851e3
operatingSystem: linux
osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
systemUUID: ec21c076-9c44-1dd1-e2b3-b63cb6fad7d6
- apiVersion: v1
kind: Node
metadata:
annotations:
etcd.rke2.cattle.io/node-address: 172.31.15.92
etcd.rke2.cattle.io/node-name: ip-172-31-15-92.us-east-2.compute.internal-fbffe900
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"8e:a9:9e:a6:28:43"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.15.92
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 172.31.15.92/20
projectcalico.org/IPv4IPIPTunnelAddr: 10.42.2.1
rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--server","https://<redacted>:9345","--token","********","--node-name","ip-172-31-15-92.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
rke2.io/node-config-hash: MODXA5SEKWX26NZRH5GSBOC6EJLDOTXUW2JUD6ENM7FY35E4TKEA====
rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-03-11T01:09:26Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/cisnetworkpolicy-node
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: t3.medium
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-east-2
failure-domain.beta.kubernetes.io/zone: us-east-2a
kubernetes.io/arch: amd64
kubernetes.io/hostname: ip-172-31-15-92.us-east-2.compute.internal
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: t3.medium
topology.kubernetes.io/region: us-east-2
topology.kubernetes.io/zone: us-east-2a
name: ip-172-31-15-92.us-east-2.compute.internal
resourceVersion: "145446"
uid: 058da6ce-435f-4663-b60f-6be445e8758c
spec:
podCIDR: 10.42.2.0/24
podCIDRs:
- 10.42.2.0/24
providerID: aws:///us-east-2a/i-0eecd25629fe35667
status:
addresses:
- address: 172.31.15.92
type: InternalIP
- address: <redacted>
type: ExternalIP
- address: ip-172-31-15-92.us-east-2.compute.internal
type: Hostname
- address: ip-172-31-15-92.us-east-2.compute.internal
type: InternalDNS
- address: <redacted>
type: ExternalDNS
allocatable:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: "20389121418"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
capacity:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: 20959212Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-03-11T01:10:48Z"
lastTransitionTime: "2022-03-11T01:10:48Z"
message: Flannel is running on this node
reason: FlannelIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-03-11T18:12:45Z"
lastTransitionTime: "2022-03-11T01:09:26Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-03-11T18:12:45Z"
lastTransitionTime: "2022-03-11T01:09:26Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-03-11T18:12:45Z"
lastTransitionTime: "2022-03-11T01:09:26Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-03-11T18:12:45Z"
lastTransitionTime: "2022-03-11T01:10:37Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
- docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
sizeBytes: 232186821
- names:
- docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
- docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
sizeBytes: 223545879
- names:
- docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
- docker.io/rancher/hardened-calico:v3.21.4-build20220208
sizeBytes: 198509698
- names:
- docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
- docker.io/rancher/hardened-flannel:v0.16.1-build20220119
sizeBytes: 97290927
- names:
- docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
- docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
sizeBytes: 49055065
- names:
- docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
- docker.io/rancher/pause:3.6
sizeBytes: 299396
nodeInfo:
architecture: amd64
bootID: d9918162-9738-4385-abf4-c76325813bfd
containerRuntimeVersion: containerd://1.5.9-k3s1
kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
kubeProxyVersion: v1.23.4+rke2r1
kubeletVersion: v1.23.4+rke2r1
machineID: 006336e0740647d6ab66a3143b4851e3
operatingSystem: linux
osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
systemUUID: ec2470c6-8a37-bf45-9755-24592ea9ced1
- apiVersion: v1
kind: Node
metadata:
annotations:
etcd.rke2.cattle.io/node-address: 172.31.2.220
etcd.rke2.cattle.io/node-name: ip-172-31-2-220.us-east-2.compute.internal-48d3e66f
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"ae:d4:df:82:84:05"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.2.220
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 172.31.2.220/20
projectcalico.org/IPv4IPIPTunnelAddr: 10.42.0.1
rke2.io/encryption-config-hash: start-5a0c83ff4a9e0af5841422818a5fd2192fe28509a2a2d90957ac5004c0d27d10
rke2.io/node-args: '["server","--write-kubeconfig-mode","0644","--tls-san","<redacted>","--node-name","ip-172-31-2-220.us-east-2.compute.internal","--cloud-provider-name","aws","--profile","cis-1.6","--selinux","true","--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"]'
rke2.io/node-config-hash: 53EF6IUX5NCWD5UV46AVD5JN7IT7NWX5PKDKG6FDLRYKV7QRENTQ====
rke2.io/node-env: '{"RKE2_SELINUX":"true"}'
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-03-11T01:01:34Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/cisnetworkpolicy-node
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: t3.medium
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-east-2
failure-domain.beta.kubernetes.io/zone: us-east-2a
kubernetes.io/arch: amd64
kubernetes.io/hostname: ip-172-31-2-220.us-east-2.compute.internal
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: t3.medium
topology.kubernetes.io/region: us-east-2
topology.kubernetes.io/zone: us-east-2a
name: ip-172-31-2-220.us-east-2.compute.internal
resourceVersion: "145688"
uid: 27ea98ee-3308-4476-9b69-5eab3620579e
spec:
podCIDR: 10.42.0.0/24
podCIDRs:
- 10.42.0.0/24
providerID: aws:///us-east-2a/i-0b60ba08142a9557a
status:
addresses:
- address: 172.31.2.220
type: InternalIP
- address: <redacted>
type: ExternalIP
- address: ip-172-31-2-220.us-east-2.compute.internal
type: Hostname
- address: ip-172-31-2-220.us-east-2.compute.internal
type: InternalDNS
- address: <redacted>
type: ExternalDNS
allocatable:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: "20389121418"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
capacity:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: 20959212Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 3764784Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-03-11T01:02:35Z"
lastTransitionTime: "2022-03-11T01:02:35Z"
message: Flannel is running on this node
reason: FlannelIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-03-11T18:14:29Z"
lastTransitionTime: "2022-03-11T01:01:34Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-03-11T18:14:29Z"
lastTransitionTime: "2022-03-11T01:01:34Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-03-11T18:14:29Z"
lastTransitionTime: "2022-03-11T01:01:34Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-03-11T18:14:29Z"
lastTransitionTime: "2022-03-11T01:02:37Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- docker.io/rancher/nginx-ingress-controller@sha256:8df436f5ca2748311468c4aa14d55f3ef2cc7811bda56c9bae6ab43dc132b80b
- docker.io/rancher/nginx-ingress-controller:nginx-1.0.2-hardened2
sizeBytes: 232186821
- names:
- docker.io/rancher/hardened-kubernetes@sha256:14288ba19b762f471a88e1d78779f7653e785032d99464bf0f5d57c0f4ceec21
- docker.io/rancher/hardened-kubernetes:v1.23.4-rke2r1-build20220217
sizeBytes: 223545879
- names:
- docker.io/rancher/hardened-calico@sha256:69fc28d2398a747fc15019e606b45bbc2ccc2d03343b0b7cefc4328d2842ddac
- docker.io/rancher/hardened-calico:v3.21.4-build20220208
sizeBytes: 198509698
- names:
- docker.io/rancher/hardened-flannel@sha256:f62122114ca136dcccd042e1149264eda4e901b61a0d956b1549afb98786c382
- docker.io/rancher/hardened-flannel:v0.16.1-build20220119
sizeBytes: 97290927
- names:
- docker.io/rancher/klipper-helm@sha256:1d31345264c7acf55e95327d0bf14262a71014dd1be31e8ab54adaf0926a385f
- docker.io/rancher/klipper-helm:v0.6.7-build20211110
sizeBytes: 84453872
- names:
- docker.io/rancher/hardened-coredns@sha256:55ed3a4871383cd9fe9d38e0a57b97135fe4369f953a52b254d1eeef36756365
- docker.io/rancher/hardened-coredns:v1.8.5-build20211119
sizeBytes: 50744176
- names:
- docker.io/rancher/hardened-k8s-metrics-server@sha256:2aeab35db572d3e6b769a0991c2d2b332c0acee2898b799ab3169ee62208bc89
- docker.io/rancher/hardened-k8s-metrics-server:v0.5.0-build20211119
sizeBytes: 49698028
- names:
- docker.io/rancher/hardened-etcd@sha256:5ce7ea0dd355d9d5f6b9d6d4c1e3453a438bf608792f2f5733e8355eafdb8da8
- docker.io/rancher/hardened-etcd:v3.5.1-k3s1-build20220112
sizeBytes: 49055065
- names:
- docker.io/rancher/hardened-cluster-autoscaler@sha256:7cc3ec1030240a8b69d1185611c1f89cf357cddae642e8cc082e1a49ebc3611d
- docker.io/rancher/hardened-cluster-autoscaler:v1.8.5-build20211119
sizeBytes: 43568033
- names:
- docker.io/rancher/mirrored-ingress-nginx-kube-webhook-certgen@sha256:52dc63ad0160c9ae201daaff7d9bc8defb0e8a529cc2cfe5baf9d8e0b198d4a8
- docker.io/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.0
sizeBytes: 18592673
- names:
- docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
- docker.io/rancher/pause:3.6
sizeBytes: 299396
nodeInfo:
architecture: amd64
bootID: 418a9788-d554-4acd-8f68-c4583635b172
containerRuntimeVersion: containerd://1.5.9-k3s1
kernelVersion: 4.18.0-348.12.2.el8_5.x86_64
kubeProxyVersion: v1.23.4+rke2r1
kubeletVersion: v1.23.4+rke2r1
machineID: 006336e0740647d6ab66a3143b4851e3
operatingSystem: linux
osImage: Red Hat Enterprise Linux 8.5 (Ootpa)
systemUUID: ec25551c-62f6-6689-a207-1da8c049fd90
- apiVersion: v1
kind: PersistentVolume
metadata:
annotations:
kubernetes.io/createdby: aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: "yes"
pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
creationTimestamp: "2022-03-11T04:26:01Z"
finalizers:
- kubernetes.io/pv-protection
labels:
topology.kubernetes.io/region: us-east-2
topology.kubernetes.io/zone: us-east-2a
name: pvc-6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
resourceVersion: "29770"
uid: ae832f24-5be1-46e9-b71f-855a3c0cfcab
spec:
accessModes:
- ReadWriteOnce
awsElasticBlockStore:
fsType: ext4
volumeID: aws://us-east-2a/vol-00ff815fe43b7d99a
capacity:
storage: 4Gi
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: ebs-claim
namespace: default
resourceVersion: "29752"
uid: 6ecacdd4-d0bb-4a72-a076-4f2a72d9a276
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- us-east-2a
- key: topology.kubernetes.io/region
operator: In
values:
- us-east-2
persistentVolumeReclaimPolicy: Delete
storageClassName: sctest
volumeMode: Filesystem
status:
phase: Bound
kind: List
metadata:
resourceVersion: ""
selfLink: ""
Just out of curiosity, what happens if you add a worker node? I know that the ELB controller used to do dumb things like refusing to use endpoints that were on nodes with the master/control-plane role, so I kinda wonder if they're doing something similar for attaching the EBS volumes.
One of those is already a worker node:
$ k get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-10-149.us-east-2.compute.internal Ready control-plane,etcd,master 17h v1.23.4+rke2r1
ip-172-31-12-67.us-east-2.compute.internal Ready <none> 17h v1.23.4+rke2r1
ip-172-31-15-92.us-east-2.compute.internal Ready control-plane,etcd,master 17h v1.23.4+rke2r1
ip-172-31-2-220.us-east-2.compute.internal Ready control-plane,etcd,master 17h v1.23.4+rke2r1
OK, missed that, sorry.
What components did you set the feature gate on? I suspect it needs to be set on the scheduler, controller-manager, cloud-controller-manager, and kubelet.
That's probably it. Pulling from the node output: "--kube-controller-manager-arg","feature-gates=CSIMigrationAWS=false"
-- I only set it on the controller manager.
I can't imagine the apiserver also needing it, but you might try adding it to all of the component args just to see if that fixes it.
Yeah I'll add to all of them and if that fixes it I will remove one at a time until we get the minimum viable setting 👍
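A quick way to confirm which components actually picked up the gate on an RKE2 node is to inspect the control-plane static pod manifests and the kubelet process args (the manifest path assumes a default RKE2 install):
$ grep feature-gates /var/lib/rancher/rke2/agent/pod-manifests/*.yaml
$ ps -ef | grep kubelet | grep -o 'feature-gates=[^ ]*'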
I was able to get this working by setting the feature gate on all of the possible components, placing this in the config.yaml of each server node:
kube-apiserver-arg: feature-gates=CSIMigrationAWS=false
etcd-arg: feature-gates=CSIMigrationAWS=false
kube-controller-manager-arg: feature-gates=CSIMigrationAWS=false
kube-scheduler-arg: feature-gates=CSIMigrationAWS=false
kubelet-arg: feature-gates=CSIMigrationAWS=false
kube-proxy-arg: feature-gates=CSIMigrationAWS=false
I was also able to get this to work by just setting a few of these. This was the full config.yaml on each server node:
cloud-provider-name: aws
profile: "cis-1.6"
selinux: true
kube-apiserver-arg: feature-gates=CSIMigrationAWS=false
kube-controller-manager-arg: feature-gates=CSIMigrationAWS=false
kubelet-arg: feature-gates=CSIMigrationAWS=false
And this is what I set on each agent node:
cloud-provider-name: aws
profile: "cis-1.6"
selinux: true
kubelet-arg: feature-gates=CSIMigrationAWS=false
I feel it might be safer to set it on all components, but we can probably document the minimal set.
If it helps, I recently created an example that installs the AWS CCM on RKE2. The example is for a single server node but could be adapted to also join an agent node.
It would be a nice UX to provide an RKE2 config option to deploy the chart and adjust the component arguments.
The flag is locked to true as of 1.25: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#feature-11
The cluster will fail to come up if you try to set it to false. So are we stuck on 1.24 until we get support for installing the out-of-tree cloud provider?
So are we stuck on 1.24 until we get support for installing the out-of-tree cloud provider?
The out-of-tree cloud provider should work when installed from its upstream helm chart. You should be able to do this using a HelmChart resource with the Bootstrap field set to true. I believe there are upstream docs available that cover migrating from in-tree to out-of-tree providers.
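For reference, a minimal sketch of what such a HelmChart resource might look like. The repo URL and chart name are taken from the upstream cloud-provider-aws project; the values shown are illustrative assumptions, not a tested configuration:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: aws-cloud-controller-manager
  namespace: kube-system
spec:
  bootstrap: true   # mark as needed for cluster bootstrap so it deploys early
  repo: https://kubernetes.github.io/cloud-provider-aws
  chart: aws-cloud-controller-manager
  targetNamespace: kube-system
  valuesContent: |-
    # illustrative args; check the upstream chart's values.yaml
    args:
      - --v=2
      - --cloud-provider=aws
      - --configure-cloud-routes=false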
We need to document how to install this chart via a HelmChart resource. We might also evaluate whether it would be trivial to repackage it in rke2-charts and wire up some of the existing cloud-provider logic to install the chart automatically, similar to what we do for the rancher-vsphere cloud-provider value.
As a new user evaluating Rancher and K8s, this has been catching me out, and I've lost hours pulling out my hair trying to understand why the deprecated (but actually removed) in-tree Amazon cloud provider is the only option I'm presented with when spinning up a new cluster using RKE (1 or 2) on EC2. There's no mention anywhere that I will have to install the cloud provider from a Helm chart, or even that that's how cloud providers are installed. I had no idea until reading the last two comments that installing an out-of-tree cloud provider was a relatively trivial matter of a Helm chart.
But it absolutely should be something that's hand-held via the UI, and honestly, something that's assumed you would want by default for a new RKE2 deploy on EC2.
Users should either begin migrating to the out-of-tree AWS cloud provider or set the CSIMigrationAWS feature gate to false. More testing is needed to determine the full impact, but currently, if a user brings up a fresh cluster using v1.23.4+rke2r1 and sets the config option cloud-provider-name: aws, RKE2 does not deploy the EBS CSI provisioner by default; with CSIMigrationAWS enabled by default, the in-tree cloud provider hands EBS operations off to the missing CSI driver, so EBS volumes will not be provisioned. It is currently unclear what happens on an upgrade to v1.23; more testing will give us this information and allow us to make recommendations.
As in-tree cloud providers are being removed upstream in v1.24, now might be the time to encourage users to start migrating before even upgrading to v1.23.
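For fresh clusters, the per-node RKE2 config for the out-of-tree path would presumably reduce to something like the sketch below, with the CCM then installed via a HelmChart as sketched above. Note that cloud-provider-name: external is my understanding of how RKE2 passes --cloud-provider=external to its components; verify against the RKE2 docs:
# config.yaml on each node: hand all cloud-provider duties to the
# external CCM instead of the removed in-tree provider
cloud-provider-name: external
profile: "cis-1.6"
selinux: true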