kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0
998 stars 800 forks source link

CSI driver miscounting ENIs on instance? #2064

Closed pvlkov closed 5 months ago

pvlkov commented 5 months ago

/kind bug

What happened? After repeated attachment errors of EBS-backed PVCs in our EKS cluster with the error message of AttachVolume.Attach failed for volume "pvc-xxxx" : rpc error: code = Internal desc = Could not attach volume "vol-xxxx" to node "i-xxxx": attachment of disk "vol-xxxx" failed, expected device to be attached but was attaching, we decided to investigate why the error kept occuring. Since we have a custom node template with an additional EBS volume mounted as a data volume (to /var/lib/containerd) we assumed, that the attachment limits might be wrong somehow.

We found that CSINode reports the number of allocatable volumes to be 25, which is not correct for our nodes.

# kubectl describe csinodes.storage.k8s.io
...
Name:               ip-redacted.ec2.internal
...
Spec:
  Drivers:
    ebs.csi.aws.com:
      ...
      Allocatables:
        Count:        25
      Topology Keys:  [kubernetes.io/os topology.ebs.csi.aws.com/zone topology.kubernetes.io/zone]

Since we have 2 ENIs per node + 2 EBS volumes already attached on a t3.2xlarge type instance, the reported number should be 28 - 2 (ENI) - 2(EBS) = 24.

Digging through the node driver code, we had a look at how the limits are actually calculated in getVolumeLimits() https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/v1.31.0/pkg/driver/node.go#L767

Baseline is set to 28 (since t3 is a Nitro instance type). reservedVolumeAttachments is unset in our case (-1), therefore it is calculated from instance metadata counting all volumes containing the "ebs" string in the name + 1 for the root instance. This is correct, as metadata of the instance reports

[root@ip-redacted ~]# curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/block-device-mapping
ami
ebs1

Therefore this is 2 (the same effect can be achieved by setting the cli flag for reserved-volume-attachments to 2)

Next, the number of available attachments is reduced by the number of ENIs and the number of NVME Instance store volumes for the instance type https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/v1.31.0/pkg/driver/node.go#L798 The latter is zero for t3.2xlarge. However the former seems to NOT be 2, as would be expected from instance metadata.

[root@ip-redacted ~]# curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/network/interfaces/macs
12:a7:xx:xx:xx:xx/
12:c0:xx:xx:xx:xx/[root@ip-redacted ~]#

because if it was, we would get 26 as the number for availableAttachments, which once more reduced by a reservedVolumeAttachments value of 2 should lead to a final total number of 24.

Taking a look at how the number of ENIs is calculated here https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/v1.31.0/pkg/cloud/metadata/ec2.go#L78 it seems like it just counts the number of newline characters in the response from the metadata server, which as you can see from the above output is only 1 (there is no newline at the end). Funnily enough, looking at git blame, this USED to be accounted for in earlier driver versions, but was then removed for some reason.

What you expected to happen? CSINode object reports the correct number of attachable EBS volumes

How to reproduce it (as minimally and precisely as possible)? I cannot test right now, but if my assumption is correct, CSINode should report an allocatable number which is always one more than it should be, i.e. for a normal Node with 1 EBS volume and 2 ENI interfaces it should report 26 instead of 25.

Anything else we need to know?:

Environment

AndrewSirenko commented 5 months ago

Thank you @pvlkov for diving deep on this.

I'll check in with the team, this lack of ENI + 1 does look like it could be the regression.

Will have more news later today.

pvlkov commented 5 months ago

Thank you for the quick response.

For now as a workaround we account for the missing ENI interface by manually setting the reservedVolumeAttachments to 3, i.e. one more than is actually necessary.

AndrewSirenko commented 5 months ago

This will be fixed in EBS CSI Driver v1.32.0 and we may back-port this to v1.31.1.

Thank you for root causing this issue @pvlkov!

prad9192 commented 5 days ago

Also, I am noticing that the allocatableCount does not take into account Dynamic network interfaces attached by the VPC CNI add on.

As an example we have a r6 instances, based on the docs it support 28 maximum attachments.

At node bootstrap we have 2 ENI + 1 root volume for the node, this makes the allocatable count = 25.

 apiVersion: storage.k8s.io/v1
    kind: CSINode
    metadata:
      annotations:
        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
      creationTimestamp: "2024-10-21T09:02:47Z"
      name: xxxxx
      ownerReferences:
        - apiVersion: v1
          kind: Node
          name: xxxxx
          uid: 2da9c59d-1ac2-42bd-9e06-4ec01127153e
    spec:
      drivers:
        - allocatable:
            count: 25
          name: ebs.csi.aws.com
          nodeID: i-xxxx
          topologyKeys:
            - kubernetes.io/os
            - topology.ebs.csi.aws.com/zone
            - topology.kubernetes.io/zone

As and when workloads come up there is need for more network interfaces and this increases to > 2 ENI, but the allocatable count continues to stay at 25