linode / linode-blockstorage-csi-driver

Container Storage Interface (CSI) Driver for Linode Block Storage
Apache License 2.0
64 stars 54 forks source link

pkg/linode-bs: Do not persist volume attachments across boots #167

Closed nesv closed 3 months ago

nesv commented 4 months ago

When attaching a block storage volume to a Linode, by default, the Linode API will attempt to persist the attachment across instance reboots by adding the volume ID to the currently-booted instance configuration. This will limit the number of volumes that can be attached to 8, as instance configurations currently only allow attaching local disks and block storage volumes to sda..sdh.

This change explicitly disables persisting volume attachments across instance boots, when a block storage volume is attached to a running Linode instance. It also adds a pre-attachment check to see if another volume can be attached based on the amount of memory of the host.

The maximum number of volumes that can be attached to a single Linode instance is clamped between 8 and 64, with the allowed number of volume attachments matching the amount of memory allocated to the Linode, in gigabytes. In other words, Linodes with <= 8GiB of RAM can have a maximum of 8 volumes attached; Linodes with 16GiB of RAM can have 16 volumes attached; Linodes with > 64GiB of RAM are limited to 64 volume attachments.

General:

Pull Request Guidelines:

  1. [x] Does your submission pass tests?
  2. [x] Have you added tests?
  3. [x] Are you addressing a single feature in this PR?
  4. [x] Are your commits atomic, addressing one change per commit?
  5. [x] Are you following the conventions of the language?
  6. [x] Have you saved your large formatting changes for a different PR, so we can focus on your work?
  7. [x] Have you explained your rationale for why this feature is needed?
  8. [ ] Have you linked your PR to an open issue
nesv commented 4 months ago

Note for reviewers

I will squash out the fixup commits prior to merge.

nesv commented 4 months ago

Changes LGTM - I think an e2e test checking a Linode that can attach >8 vols would be great both to serve as validation of the idea and as a smoke test of vbin

@avestuk Absolutely. I'll get that added.

nesv commented 3 months ago

Changes LGTM - I think an e2e test checking a Linode that can attach >8 vols would be great both to serve as validation of the idea and as a smoke test of vbin

@avestuk Absolutely. I'll get that added.

@avestuk If you don't mind, I would like to punt on adding e2e tests for this for the time being. I do have some command-line output to offer, though:

Listing pods on a g6-standard-20

NAME             READY   STATUS              RESTARTS   AGE
redis-large-0    1/1     Running             0          102m
redis-large-1    1/1     Running             0          102m
redis-large-10   1/1     Running             0          98m
redis-large-11   1/1     Running             0          98m
redis-large-12   1/1     Running             0          98m
redis-large-13   1/1     Running             0          97m
redis-large-14   1/1     Running             0          97m
redis-large-15   1/1     Running             0          97m
redis-large-16   1/1     Running             0          19m
redis-large-17   1/1     Running             0          18m
redis-large-18   1/1     Running             0          18m
redis-large-19   1/1     Running             0          18m
redis-large-2    1/1     Running             0          101m
redis-large-20   1/1     Running             0          17m
redis-large-21   1/1     Running             0          17m
redis-large-22   1/1     Running             0          17m
redis-large-23   1/1     Running             0          17m
redis-large-24   1/1     Running             0          16m
redis-large-25   1/1     Running             0          16m
redis-large-26   1/1     Running             0          15m
redis-large-27   1/1     Running             0          15m
redis-large-28   1/1     Running             0          15m
redis-large-29   1/1     Running             0          14m
redis-large-3    1/1     Running             0          101m
redis-large-30   1/1     Running             0          14m
redis-large-31   1/1     Running             0          14m
redis-large-32   1/1     Running             0          13m
redis-large-33   1/1     Running             0          13m
redis-large-34   1/1     Running             0          13m
redis-large-35   1/1     Running             0          13m
redis-large-36   1/1     Running             0          12m
redis-large-37   1/1     Running             0          12m
redis-large-38   1/1     Running             0          12m
redis-large-39   1/1     Running             0          11m
redis-large-4    1/1     Running             0          101m
redis-large-40   1/1     Running             0          11m
redis-large-41   1/1     Running             0          10m
redis-large-42   1/1     Running             0          10m
redis-large-43   1/1     Running             0          9m52s
redis-large-44   1/1     Running             0          9m36s
redis-large-45   1/1     Running             0          9m18s
redis-large-46   1/1     Running             0          8m51s
redis-large-47   1/1     Running             0          8m24s
redis-large-48   1/1     Running             0          7m59s
redis-large-49   1/1     Running             0          7m33s
redis-large-5    1/1     Running             0          100m
redis-large-50   1/1     Running             0          6m46s
redis-large-51   1/1     Running             0          6m22s
redis-large-52   1/1     Running             0          6m4s
redis-large-53   1/1     Running             0          5m45s
redis-large-54   1/1     Running             0          5m19s
redis-large-55   1/1     Running             0          4m55s
redis-large-56   1/1     Running             0          4m28s
redis-large-57   1/1     Running             0          4m11s
redis-large-58   1/1     Running             0          3m53s
redis-large-59   1/1     Running             0          3m37s
redis-large-6    1/1     Running             0          100m
redis-large-60   1/1     Running             0          3m19s
redis-large-61   1/1     Running             0          3m1s
redis-large-62   1/1     Running             0          2m37s
redis-large-63   0/1     ContainerCreating   0          2m18s
redis-large-7    1/1     Running             0          99m
redis-large-8    1/1     Running             0          99m
redis-large-9    1/1     Running             0          99m

All of these pods were given a required node affinity to target the g6-standard-20 instance.

That pesky redis-large-63 pod

...it's in ContainerCreating, and eventually moved into the Pending phase. The reason being the instance already had 64 volumes attached (1 local instance disk + 63 block storage volumes). Output has been snipped for brevity:

Name:             redis-large-63
---8<---
Status:           Pending
---8<---
Events:
  Type     Reason              Age                   From                     Message
  ----     ------              ----                  ----                     -------
---8<---
  Warning  FailedAttachVolume  72s (x10 over 5m27s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-138c422977ad49b0" : rpc error: code = ResourceExhausted desc = max number of volumes (64) already attached to instance

The PVCs

For giggles, here is the list of persistent volume claims:

NAME                  STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS           VOLUMEATTRIBUTESCLASS   AGE
data-redis-large-0    Bound    pvc-078f3de65c844ec6   10Gi       RWOP           linode-block-storage   <unset>                 105m
data-redis-large-1    Bound    pvc-f7d359089d5f452f   10Gi       RWOP           linode-block-storage   <unset>                 104m
data-redis-large-10   Bound    pvc-2fde1037cf1b449e   10Gi       RWOP           linode-block-storage   <unset>                 101m
data-redis-large-11   Bound    pvc-94b8e52f868a4988   10Gi       RWOP           linode-block-storage   <unset>                 100m
data-redis-large-12   Bound    pvc-ac81815d4e894c5a   10Gi       RWOP           linode-block-storage   <unset>                 100m
data-redis-large-13   Bound    pvc-2761d76374f04036   10Gi       RWOP           linode-block-storage   <unset>                 100m
data-redis-large-14   Bound    pvc-05c057cc5c744464   10Gi       RWOP           linode-block-storage   <unset>                 100m
data-redis-large-15   Bound    pvc-8065729a3ee04c4c   10Gi       RWOP           linode-block-storage   <unset>                 99m
data-redis-large-16   Bound    pvc-f6a5be3a597d411f   10Gi       RWOP           linode-block-storage   <unset>                 21m
data-redis-large-17   Bound    pvc-5c2db348c79f4888   10Gi       RWOP           linode-block-storage   <unset>                 21m
data-redis-large-18   Bound    pvc-9c03f60feeb94d8f   10Gi       RWOP           linode-block-storage   <unset>                 20m
data-redis-large-19   Bound    pvc-96eee3e9020d4eb7   10Gi       RWOP           linode-block-storage   <unset>                 20m
data-redis-large-2    Bound    pvc-5b2b9cae59cc4661   10Gi       RWOP           linode-block-storage   <unset>                 104m
data-redis-large-20   Bound    pvc-cad2d86cab284d0a   10Gi       RWOP           linode-block-storage   <unset>                 20m
data-redis-large-21   Bound    pvc-e388731288bb4ba8   10Gi       RWOP           linode-block-storage   <unset>                 20m
data-redis-large-22   Bound    pvc-14babb14ac4e4658   10Gi       RWOP           linode-block-storage   <unset>                 19m
data-redis-large-23   Bound    pvc-40f39b94b4ac4716   10Gi       RWOP           linode-block-storage   <unset>                 19m
data-redis-large-24   Bound    pvc-8350e85412ba405e   10Gi       RWOP           linode-block-storage   <unset>                 19m
data-redis-large-25   Bound    pvc-e79f43cf09ce47c1   10Gi       RWOP           linode-block-storage   <unset>                 18m
data-redis-large-26   Bound    pvc-287905878d044852   10Gi       RWOP           linode-block-storage   <unset>                 18m
data-redis-large-27   Bound    pvc-19ec7066e10c431d   10Gi       RWOP           linode-block-storage   <unset>                 17m
data-redis-large-28   Bound    pvc-2f151e4fb0c04aaa   10Gi       RWOP           linode-block-storage   <unset>                 17m
data-redis-large-29   Bound    pvc-8318cb50fc024d86   10Gi       RWOP           linode-block-storage   <unset>                 17m
data-redis-large-3    Bound    pvc-9ac1fd7616974532   10Gi       RWOP           linode-block-storage   <unset>                 103m
data-redis-large-30   Bound    pvc-5b8738645d6c4841   10Gi       RWOP           linode-block-storage   <unset>                 16m
data-redis-large-31   Bound    pvc-4f1726d3eecf4a28   10Gi       RWOP           linode-block-storage   <unset>                 16m
data-redis-large-32   Bound    pvc-1588ae019c704147   10Gi       RWOP           linode-block-storage   <unset>                 16m
data-redis-large-33   Bound    pvc-b653c9d3a28745d1   10Gi       RWOP           linode-block-storage   <unset>                 16m
data-redis-large-34   Bound    pvc-5b1992637fc441ab   10Gi       RWOP           linode-block-storage   <unset>                 15m
data-redis-large-35   Bound    pvc-65a7095027fa4061   10Gi       RWOP           linode-block-storage   <unset>                 15m
data-redis-large-36   Bound    pvc-6bd57aa29ca04b3a   10Gi       RWOP           linode-block-storage   <unset>                 15m
data-redis-large-37   Bound    pvc-66ed6de400094a27   10Gi       RWOP           linode-block-storage   <unset>                 14m
data-redis-large-38   Bound    pvc-ba561d6c4e044ad8   10Gi       RWOP           linode-block-storage   <unset>                 14m
data-redis-large-39   Bound    pvc-e13bf84d947a4eac   10Gi       RWOP           linode-block-storage   <unset>                 14m
data-redis-large-4    Bound    pvc-77696d62944a4919   10Gi       RWOP           linode-block-storage   <unset>                 103m
data-redis-large-40   Bound    pvc-3ad31fd2574a400b   10Gi       RWOP           linode-block-storage   <unset>                 13m
data-redis-large-41   Bound    pvc-477b6f1f45c245ea   10Gi       RWOP           linode-block-storage   <unset>                 13m
data-redis-large-42   Bound    pvc-388d40861ad540fb   10Gi       RWOP           linode-block-storage   <unset>                 12m
data-redis-large-43   Bound    pvc-a8414d126ec34c4c   10Gi       RWOP           linode-block-storage   <unset>                 12m
data-redis-large-44   Bound    pvc-f82dd026e5ce45ff   10Gi       RWOP           linode-block-storage   <unset>                 12m
data-redis-large-45   Bound    pvc-823d31dcda544738   10Gi       RWOP           linode-block-storage   <unset>                 11m
data-redis-large-46   Bound    pvc-cb5d04c47e1c4b93   10Gi       RWOP           linode-block-storage   <unset>                 11m
data-redis-large-47   Bound    pvc-2011b60e730d40a3   10Gi       RWOP           linode-block-storage   <unset>                 10m
data-redis-large-48   Bound    pvc-d2efc7dd404d4774   10Gi       RWOP           linode-block-storage   <unset>                 10m
data-redis-large-49   Bound    pvc-9f0595bfdd254548   10Gi       RWOP           linode-block-storage   <unset>                 9m59s
data-redis-large-5    Bound    pvc-aca894247a6c46fe   10Gi       RWOP           linode-block-storage   <unset>                 103m
data-redis-large-50   Bound    pvc-60c1d4dd56104c9c   10Gi       RWOP           linode-block-storage   <unset>                 9m12s
data-redis-large-51   Bound    pvc-550e86e592d340bc   10Gi       RWOP           linode-block-storage   <unset>                 8m48s
data-redis-large-52   Bound    pvc-9c89f1792e0a413e   10Gi       RWOP           linode-block-storage   <unset>                 8m30s
data-redis-large-53   Bound    pvc-f003f04fcce440ad   10Gi       RWOP           linode-block-storage   <unset>                 8m11s
data-redis-large-54   Bound    pvc-f8295eb7a877428f   10Gi       RWOP           linode-block-storage   <unset>                 7m45s
data-redis-large-55   Bound    pvc-6a85c2f3da404e60   10Gi       RWOP           linode-block-storage   <unset>                 7m21s
data-redis-large-56   Bound    pvc-0ef25c6db1b045dd   10Gi       RWOP           linode-block-storage   <unset>                 6m54s
data-redis-large-57   Bound    pvc-640fc8c733e74ad4   10Gi       RWOP           linode-block-storage   <unset>                 6m37s
data-redis-large-58   Bound    pvc-ce6be776f5f04643   10Gi       RWOP           linode-block-storage   <unset>                 6m19s
data-redis-large-59   Bound    pvc-38ca8d0e0f8d44d3   10Gi       RWOP           linode-block-storage   <unset>                 6m3s
data-redis-large-6    Bound    pvc-20dcf0cdd90a44de   10Gi       RWOP           linode-block-storage   <unset>                 102m
data-redis-large-60   Bound    pvc-4771fb385d254a22   10Gi       RWOP           linode-block-storage   <unset>                 5m45s
data-redis-large-61   Bound    pvc-62f1a918aede4eba   10Gi       RWOP           linode-block-storage   <unset>                 5m27s
data-redis-large-62   Bound    pvc-33d40b5b54ca4aa6   10Gi       RWOP           linode-block-storage   <unset>                 5m3s
data-redis-large-63   Bound    pvc-138c422977ad49b0   10Gi       RWOP           linode-block-storage   <unset>                 4m44s
data-redis-large-7    Bound    pvc-0f87785faf804e94   10Gi       RWOP           linode-block-storage   <unset>                 102m
data-redis-large-8    Bound    pvc-6d95a53e43944776   10Gi       RWOP           linode-block-storage   <unset>                 102m
data-redis-large-9    Bound    pvc-7fa0cbb89c8a4758   10Gi       RWOP           linode-block-storage   <unset>                 101m

The block storage volume backing data-redis-large-63 does get created, and bound, but as stated in the above snippet, the pod will remain pending since the CSI driver will not attach a 65th disk to the instance.

nesv commented 3 months ago

The ci (stable) job is failing with this message:

go: updates to go.mod needed; to update it:
    go mod tidy

...but this pull request already includes and updated and tidied go.mod and go.lock.

nesv commented 3 months ago

Commits have been squashed, and rebased onto main.

srust commented 3 months ago

Out of curiosity can you reboot a node with attached volumes and have the pods recover when the nodes come back, and see the CSI driver re-attaching the volumes?

avestuk commented 3 months ago

@nesv Sure e2e tests can be handled in another PR. I do agree with @srust that doing a node reboot and seeing reattach would be excellent reassurance.

nesv commented 3 months ago

@srust @avestuk Here is a starting listing of pods and persistent volumes:

NAME                READY   STATUS    RESTARTS   AGE   IP          NODE                            NOMINATED NODE   READINESS GATES
pod/redis-small-0   1/1     Running   0          9h    10.2.1.11   lke180282-261343-185784930000   <none>           <none>
pod/redis-small-1   1/1     Running   0          9h    10.2.1.5    lke180282-261343-185784930000   <none>           <none>
pod/redis-small-2   1/1     Running   0          9h    10.2.1.12   lke180282-261343-185784930000   <none>           <none>
pod/redis-small-3   1/1     Running   0          9h    10.2.1.6    lke180282-261343-185784930000   <none>           <none>
pod/redis-small-4   1/1     Running   0          9h    10.2.1.13   lke180282-261343-185784930000   <none>           <none>
pod/redis-small-5   1/1     Running   0          9h    10.2.1.14   lke180282-261343-185784930000   <none>           <none>
pod/redis-small-6   1/1     Running   0          9h    10.2.1.7    lke180282-261343-185784930000   <none>           <none>

NAME                                    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS           VOLUMEATTRIBUTESCLASS   REASON   AGE   VOLUMEMODE
persistentvolume/pvc-2b2672e9536c4928   10Gi       RWOP           Delete           Bound    default/data-redis-small-4   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-427890f51c4048bd   10Gi       RWOP           Delete           Bound    default/data-redis-small-2   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-44daff53e8fb4b4c   10Gi       RWOP           Delete           Bound    default/data-redis-small-5   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-6c848b06fa1e4f86   10Gi       RWOP           Delete           Bound    default/data-redis-small-1   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-a1ef3459cfcc40ea   10Gi       RWOP           Delete           Bound    default/data-redis-small-0   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-b66551a6dc204904   10Gi       RWOP           Delete           Bound    default/data-redis-small-3   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-fb4acbdb06d34ac7   10Gi       RWOP           Delete           Bound    default/data-redis-small-6   linode-block-storage   <unset>                          21h   Filesystem

After recycling the node, then deleting it to remove it from the worker pool:

NAME                READY   STATUS    RESTARTS   AGE     IP           NODE                            NOMINATED NODE   READINESS GATES
pod/redis-small-0   1/1     Running   0          3m50s   10.2.0.133   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-1   1/1     Running   0          2m12s   10.2.0.134   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-2   1/1     Running   0          2m4s    10.2.0.135   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-3   1/1     Running   0          112s    10.2.0.136   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-4   1/1     Running   0          97s     10.2.0.137   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-5   1/1     Running   0          88s     10.2.0.138   lke180282-261343-415684180000   <none>           <none>
pod/redis-small-6   1/1     Running   0          78s     10.2.0.139   lke180282-261343-415684180000   <none>           <none>

NAME                                    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS           VOLUMEATTRIBUTESCLASS   REASON   AGE   VOLUMEMODE
persistentvolume/pvc-2b2672e9536c4928   10Gi       RWOP           Delete           Bound    default/data-redis-small-4   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-427890f51c4048bd   10Gi       RWOP           Delete           Bound    default/data-redis-small-2   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-44daff53e8fb4b4c   10Gi       RWOP           Delete           Bound    default/data-redis-small-5   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-6c848b06fa1e4f86   10Gi       RWOP           Delete           Bound    default/data-redis-small-1   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-a1ef3459cfcc40ea   10Gi       RWOP           Delete           Bound    default/data-redis-small-0   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-b66551a6dc204904   10Gi       RWOP           Delete           Bound    default/data-redis-small-3   linode-block-storage   <unset>                          21h   Filesystem
persistentvolume/pvc-fb4acbdb06d34ac7   10Gi       RWOP           Delete           Bound    default/data-redis-small-6   linode-block-storage   <unset>                          21h   Filesystem

And the events that came in through Cloud Manager:

Screenshot from 2024-05-24 09-47-16

nesv commented 3 months ago

I would like to rebase this pull request's branch onto main after #168 is merged, in order to fix the CI failures.

nesv commented 3 months ago

Rebased onto main to pull in the commits from #168.

nesv commented 3 months ago

I was going down a rabbit hole trying to clean some things up alongside the functional changes in this branch. I have backed most of those changes out, and kept the changes in this branch more tightly-scoped to the task at-hand.

Unfortunately, one of the things that cannot be done in this branch, is go.mod cannot be modified. I also realized that I was pulling in a whole new version of k8s.io/utils just for the k8s.io/utils/ptr.To function, which is complete overkill when all I needed was a pointer to a boolean value.