kubernetes-csi / external-resizer

Sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers controller-side expansion operations against a CSI endpoint
Apache License 2.0

Add Modify Volume Support #351

Closed (sunnylovestiramisu closed 8 months ago)

sunnylovestiramisu commented 11 months ago

/kind feature

What this PR does / why we need it: Implement logic of this diagram

Which issue(s) this PR fixes:

Fixes https://github.com/kubernetes-csi/external-resizer/issues/314

Special notes for your reviewer:

Testing Steps

Positive Test Cases

  1. Bring up a cluster with kubetest, using a k8s 1.29 build that includes the VAC change
    KUBE_FEATURE_GATES="VolumeAttributesClass=true" kubetest --up --runtime-config=api/all=true
  2. Build the container image of external-resizer with logs
  3. Set up csi-driver-host-path with the resizer image built in step 2 as in this branch
  4. Follow deploy-1.17-and-later.md runbook to deploy hostpath driver in the cluster
  5. Create csi-pvc, csi-storageclass, csi-volumeattributesclass, and csi-app
    # csi-volumeattributesclass.yaml
    apiVersion: storage.k8s.io/v1alpha1
    kind: VolumeAttributesClass
    metadata:
      name: silver
    driverName: hostpath.csi.k8s.io
    parameters:
      provisioned-iops: "3000"
  6. Modify the pvc with kubectl edit pvc csi-pvc and add volumeAttributesClassName: silver to the pvc
  7. Verify modify volume is successful
csi-driver-host-path git:(testModifyVolume) ✗ k describe pvc
Name:          csi-pvc
Namespace:     default
StorageClass:  csi-hostpath-sc
Status:        Bound
Volume:        pvc-561aca0d-a67b-4cb1-81c3-791b5c654423
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       my-csi-app
Events:
  Type    Reason                  Age                From                                                                           Message
  ----    ------                  ----               ----                                                                           -------
  Normal  ExternalProvisioning    49s (x2 over 49s)  persistentvolume-controller                                                    Waiting for a volume to be created either by the external provisioner 'hostpath.csi.k8s.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning            49s                hostpath.csi.k8s.io_csi-hostpathplugin-0_a9a3cc54-1f3f-4f2c-9304-0cfe731945b6  External provisioner is provisioning volume for claim "default/csi-pvc"
  Normal  ProvisioningSucceeded   49s                hostpath.csi.k8s.io_csi-hostpathplugin-0_a9a3cc54-1f3f-4f2c-9304-0cfe731945b6  Successfully provisioned volume pvc-561aca0d-a67b-4cb1-81c3-791b5c654423
  Normal  VolumeModify            25s                external-resizer hostpath.csi.k8s.io                                           external resizer is modifying volume csi-pvc
  Normal  VolumeModifySuccessful  25s                external-resizer hostpath.csi.k8s.io                                           external resizer modified volume csi-pvc successfully
  8. Create a new VAC named gold and apply
    apiVersion: storage.k8s.io/v1alpha1
    kind: VolumeAttributesClass
    metadata:
      name: gold
    driverName: hostpath.csi.k8s.io
    parameters:
      provisioned-iops: "8000"
  9. Modify the pvc with kubectl edit pvc csi-pvc and add volumeAttributesClassName: gold to the pvc
  10. Verify it is successful:
csi-driver-host-path git:(testModifyVolume) ✗ k describe pvc                                         
Name:          csi-pvc
Namespace:     default
StorageClass:  csi-hostpath-sc
Status:        Bound
Volume:        pvc-561aca0d-a67b-4cb1-81c3-791b5c654423
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       my-csi-app
Events:
  Type    Reason                  Age                  From                                                                           Message
  ----    ------                  ----                 ----                                                                           -------
  Normal  ExternalProvisioning    42m (x2 over 42m)    persistentvolume-controller                                                    Waiting for a volume to be created either by the external provisioner 'hostpath.csi.k8s.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning            42m                  hostpath.csi.k8s.io_csi-hostpathplugin-0_a9a3cc54-1f3f-4f2c-9304-0cfe731945b6  External provisioner is provisioning volume for claim "default/csi-pvc"
  Normal  ProvisioningSucceeded   42m                  hostpath.csi.k8s.io_csi-hostpathplugin-0_a9a3cc54-1f3f-4f2c-9304-0cfe731945b6  Successfully provisioned volume pvc-561aca0d-a67b-4cb1-81c3-791b5c654423
  Normal  VolumeModify            3m42s (x2 over 42m)  external-resizer hostpath.csi.k8s.io                                           external resizer is modifying volume csi-pvc
  Normal  VolumeModifySuccessful  3m42s (x2 over 42m)  external-resizer hostpath.csi.k8s.io                                           external resizer modified volume csi-pvc successfully

csi-driver-host-path git:(testModifyVolume) ✗ k get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
csi-pvc   Bound    pvc-561aca0d-a67b-4cb1-81c3-791b5c654423   1Gi        RWO            csi-hostpath-sc   gold                    42m
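
The `kubectl edit` steps above can also be done non-interactively with a JSON merge patch. The following is a minimal sketch, not part of this PR: `pvcPatch`, `pvcSpecPatch`, and `buildVACPatch` are hypothetical helpers introduced only for illustration. They build the patch body for `spec.volumeAttributesClassName` and print a `kubectl patch` command to run by hand against the test cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pvcPatch mirrors only the part of a PersistentVolumeClaim that the test
// steps change. It is a hypothetical helper for this sketch, not a client-go type.
type pvcPatch struct {
	Spec pvcSpecPatch `json:"spec"`
}

type pvcSpecPatch struct {
	VolumeAttributesClassName string `json:"volumeAttributesClassName"`
}

// buildVACPatch returns a JSON merge patch that sets
// spec.volumeAttributesClassName to the given class name.
func buildVACPatch(vacName string) (string, error) {
	b, err := json.Marshal(pvcPatch{Spec: pvcSpecPatch{VolumeAttributesClassName: vacName}})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// silver and gold are the two classes used in the positive test cases.
	for _, name := range []string{"silver", "gold"} {
		patch, err := buildVACPatch(name)
		if err != nil {
			panic(err)
		}
		// The command is only printed; nothing is sent to a cluster here.
		fmt.Printf("kubectl patch pvc csi-pvc --type merge -p '%s'\n", patch)
	}
}
```

Applying the printed command for `silver` and then for `gold` should lead to the same pair of VolumeModify events as the manual edits, assuming the cluster and driver are set up as in steps 1 to 5.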

Negative Test Cases

  1. With the feature gate enabled in k8s and enable-controller-modify-volume set to false in the hostpath driver, the PVC reports the error:
csi-driver-host-path git:(testModifyVolume) ✗ k describe pvc
Name:          csi-pvc
Namespace:     default
StorageClass:  csi-hostpath-sc
Status:        Bound
Volume:        pvc-4bb1d1e5-f3d2-47ee-af83-05ee0430de61
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       my-csi-app
Conditions:
  Type                Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                ------  -----------------                 ------------------                ------  -------
  ModifyVolumeError   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 24 Jan 2024 00:00:17 +0000           ModifyVolume failed with error: rpc error: code = Unimplemented desc = unknown method ControllerModifyVolume for service csi.v1.Controller. Waiting for retry.
Events:
  Type     Reason                 Age                    From                                                                           Message
  ----     ------                 ----                   ----                                                                           -------
  Normal   Provisioning           7m13s                  hostpath.csi.k8s.io_csi-hostpathplugin-0_cd7172db-b406-43bf-8278-c5740f369279  External provisioner is provisioning volume for claim "default/csi-pvc"
  Normal   ExternalProvisioning   7m13s (x2 over 7m13s)  persistentvolume-controller                                                    Waiting for a volume to be created either by the external provisioner 'hostpath.csi.k8s.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal   ProvisioningSucceeded  7m13s                  hostpath.csi.k8s.io_csi-hostpathplugin-0_cd7172db-b406-43bf-8278-c5740f369279  Successfully provisioned volume pvc-4bb1d1e5-f3d2-47ee-af83-05ee0430de61
  Normal   VolumeModify           2m27s (x9 over 6m42s)  external-resizer hostpath.csi.k8s.io                                           external resizer is modifying volume csi-pvc
  Warning  VolumeModifyFailed     2m27s (x9 over 6m42s)  external-resizer hostpath.csi.k8s.io                                           rpc error: code = Unimplemented desc = unknown method ControllerModifyVolume
  2. The feature gate is disabled in both k8s and external-resizer (the feature gate setting is removed in this commit), then csi-driver-host-path is started

Verify in the log via k logs csi-hostpathplugin-0 csi-resizer. The log prints:

I0124 19:48:04.652450       1 main.go:108] "Version" version="66ff3280dd401e4ed84dbce9b981a08ca69f22cc"
I0124 19:48:04.652552       1 feature_gate.go:249] feature gates: &{map[]}
I0124 19:48:04.654874       1 connection.go:215] Connecting to unix:///csi/csi.sock
I0124 19:48:04.656828       1 common.go:138] Probing CSI driver for readiness
I0124 19:48:04.656856       1 connection.go:244] GRPC call: /csi.v1.Identity/Probe
I0124 19:48:04.656864       1 connection.go:245] GRPC request: {}
I0124 19:48:04.660521       1 connection.go:251] GRPC response: {}
I0124 19:48:04.660597       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.660642       1 connection.go:244] GRPC call: /csi.v1.Identity/GetPluginInfo
I0124 19:48:04.660726       1 connection.go:245] GRPC request: {}
I0124 19:48:04.661739       1 connection.go:251] GRPC response: {"name":"hostpath.csi.k8s.io","vendor_version":"61b168ca726c91a9e025a8af8533e18453639f71"}
I0124 19:48:04.661758       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.661777       1 main.go:161] "CSI driver name" driverName="hostpath.csi.k8s.io"
I0124 19:48:04.661793       1 connection.go:244] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0124 19:48:04.661801       1 connection.go:245] GRPC request: {}
I0124 19:48:04.663366       1 connection.go:251] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":3}}},{"Type":{"Service":{"type":2}}}]}
I0124 19:48:04.663439       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.663521       1 connection.go:244] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0124 19:48:04.663558       1 connection.go:245] GRPC request: {}
I0124 19:48:04.665109       1 connection.go:251] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":14}}}]}
I0124 19:48:04.665159       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.665211       1 connection.go:244] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0124 19:48:04.665257       1 connection.go:245] GRPC request: {}
I0124 19:48:04.666352       1 connection.go:251] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":14}}}]}
I0124 19:48:04.666426       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.666509       1 main.go:185] "===== Creating csiModifier ====="
I0124 19:48:04.666593       1 csi_modifier.go:39] "===== Creating NewModifierFromClient ====="
I0124 19:48:04.666646       1 connection.go:244] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0124 19:48:04.666686       1 connection.go:245] GRPC request: {}
I0124 19:48:04.667733       1 connection.go:251] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":14}}}]}
I0124 19:48:04.667816       1 connection.go:252] GRPC error: <nil>
I0124 19:48:04.668037       1 controller.go:120] "Register Pod informer for resizer" controller="hostpath.csi.k8s.io"
I0124 19:48:04.668150       1 main.go:218] "===== Check if VolumeAttributesClass is Enabled ====="
I0124 19:48:04.668262       1 controller.go:243] "Starting external resizer" controller="hostpath.csi.k8s.io"
I0124 19:48:04.668459       1 reflector.go:289] Starting reflector *v1.PersistentVolumeClaim (10m0s) from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.668485       1 reflector.go:325] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.668462       1 reflector.go:289] Starting reflector *v1.Pod (10m0s) from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.668574       1 reflector.go:325] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.668764       1 reflector.go:289] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.668818       1 reflector.go:325] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.679786       1 reflector.go:351] Caches populated for *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.680732       1 reflector.go:351] Caches populated for *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:159
I0124 19:48:04.705037       1 reflector.go:351] Caches populated for *v1.Pod from k8s.io/client-go/informers/factory.go:159

The ===== Check if VolumeAttributesClass is Enabled ===== line shows that VAC is not enabled and that the resizer is up and running without initializing the modify controller. Please see the log print-out setting here.
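
When reading a log like the one above, it can help to check whether the driver advertises the modify-volume capability at all. The sketch below parses the JSON body logged for `ControllerGetCapabilities` and looks for one RPC type. It assumes that the numeric value 14 corresponds to `MODIFY_VOLUME` in the CSI spec version in use; that mapping is not stated in this PR and should be verified. `capabilitiesResponse` and `advertisesRPC` are hypothetical names for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcModifyVolume is the value ASSUMED here for the MODIFY_VOLUME controller
// RPC capability; check it against the CSI spec before relying on it.
const rpcModifyVolume = 14

// capabilitiesResponse matches the JSON shape printed in the resizer log
// for the ControllerGetCapabilities response.
type capabilitiesResponse struct {
	Capabilities []struct {
		Type struct {
			Rpc struct {
				Type int `json:"type"`
			} `json:"Rpc"`
		} `json:"Type"`
	} `json:"capabilities"`
}

// advertisesRPC reports whether the logged response lists the given RPC type.
func advertisesRPC(logged string, want int) (bool, error) {
	var resp capabilitiesResponse
	if err := json.Unmarshal([]byte(logged), &resp); err != nil {
		return false, err
	}
	for _, c := range resp.Capabilities {
		if c.Type.Rpc.Type == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Response body copied from the log above.
	logged := `{"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":14}}}]}`
	ok, err := advertisesRPC(logged, rpcModifyVolume)
	if err != nil {
		panic(err)
	}
	fmt.Println("driver lists RPC type 14:", ok)
}
```

Under that assumption, the log would indicate a driver that lists the capability while the resizer's feature gate map is empty, which is consistent with the modify controller not being initialized in this test case.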

Does this PR introduce a user-facing change?:

Add Modify Volume Support for VolumeAttributesClass. This requires:
1. The VolumeAttributesClass feature gate and API to be enabled in the Kubernetes cluster
2. The VolumeAttributesClass feature gate to be enabled in external-resizer

sunnylovestiramisu commented 10 months ago

make test succeeded locally, but the presubmit jobs failed with:

go: go.mod file indicates go 1.21, but maximum version supported by tidy is 1.20
ERROR: vendor check failed.
make: *** [release-tools/build.make:279: test-vendor] Error 1

sunnylovestiramisu commented 10 months ago

/assign @gnufied

carlory commented 10 months ago

The current implementation of the VolumeAttributesClass feature re-uses the workflow of the volume resize controller. We have to take care of the following issues:

I am not sure if this is the best approach. I would like to discuss this topic here.

The volume resize controller should be used only for volume resize. I think that the VolumeAttributesClass feature should be implemented as a separate workflow. In other words, implement a new controller for the VolumeAttributesClass feature. The new controller code can be placed in the external-resizer repository to avoid introducing a new sidecar container.

sunnylovestiramisu commented 10 months ago

"The volume resize controller should be used only for volume resize" <-- We are putting it here because we do not want to add another sidecar to maintain, so we do not want to have a new controller. This has been discussed a few times during design sessions. If there are new points you want to make, we can revisit.

About having a new controller, I am neutral on this. Let's see what others say.

Okay talked with Michelle, her words are "we are trying to consolidate components. It is too much of maintenance burden to manage so many. kube-controller-manager contains many controllers all in one process."

sunnylovestiramisu commented 9 months ago

Have we done an e2e test (and by e2e here I mean manual testing) of this PR?

No, we haven't. We need a CSI driver that actually implements this feature to test end to end, and right now no CSI driver implements it.

gnufied commented 9 months ago

Have we done an e2e test (and by e2e here I mean manual testing) of this PR?

No, we haven't. We need a CSI driver that actually implements this feature to test end to end, and right now no CSI driver implements it.

I am afraid we shouldn't merge this feature without testing the entire workflow with either a mock driver or something similar.

sunnylovestiramisu commented 9 months ago

@gnufied we have mock driver unit tests in each sidecar though, just not together. What did you do for other resizer features in the past?

sunnylovestiramisu commented 9 months ago

Okay I talked with Michelle, the end to end testing does not need to be in an actual provider's driver, we can use csi-driver-host-path.

Let's add the test cases in: https://github.com/kubernetes-csi/csi-driver-host-path/issues/479

gnufied commented 9 months ago

What did you do for other resizer features in the past?

We tested the entire workflow e2e using either the mock or the hostpath driver. It would be very unusual to ship a feature without working through the entire workflow.

gnufied commented 9 months ago

Okay talked with Michelle, her words are "we are trying to consolidate components. It is too much of maintenance burden to manage so many. kube-controller-manager contains many controllers all in one process."

I am sort of leaning towards what @carlory proposed here. While having an entirely new sidecar would indeed be bad, having just a simple control loop is not so bad. We already have examples of simple control loops in KCM, such as the one for PVC protection. @msau42 what do you think?

Currently, putting both VAC modification and volume resizing in one loop means they kind of trample over each other's state. For example, if there is an error while modifying a volume, then resizing can't happen either, because control will return from the loop. We will avoid these problems if we choose to keep a separate controller.

gnufied commented 9 months ago

Also, things like the retry rate for resizing could be different from the retry rate for volume modification. All these things are hard to control in a single control loop for the same PVC.
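
The concern about one loop handling both operations can be illustrated with a small sketch. This is not the resizer's code: `claim`, `modify`, `resize`, `syncCombined`, and `syncSeparate` are hypothetical names, and the stubs are hard-coded so that modification always fails and resizing always succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// claim is a hypothetical stand-in for a PVC with pending work.
type claim struct {
	name        string
	needsModify bool
	needsResize bool
}

var errModify = errors.New("ModifyVolume failed")

// Stubs for this sketch: modify always fails, resize always succeeds.
func modify(c claim) error { return errModify }
func resize(c claim) error { return nil }

// syncCombined handles both operations in one pass and returns on the first
// error, so a modify failure means the resize is never attempted.
func syncCombined(c claim) (bool, error) {
	if c.needsModify {
		if err := modify(c); err != nil {
			return false, err
		}
	}
	if c.needsResize {
		if err := resize(c); err != nil {
			return false, err
		}
		return true, nil
	}
	return false, nil
}

// syncSeparate treats the two operations independently, as two separate
// controllers would: each reports its own error and neither blocks the other.
func syncSeparate(c claim) (resized bool, modifyErr, resizeErr error) {
	if c.needsModify {
		modifyErr = modify(c)
	}
	if c.needsResize {
		resizeErr = resize(c)
		resized = resizeErr == nil
	}
	return resized, modifyErr, resizeErr
}

func main() {
	c := claim{name: "csi-pvc", needsModify: true, needsResize: true}

	resized, err := syncCombined(c)
	fmt.Printf("combined: resized=%v err=%v\n", resized, err)

	resized, mErr, rErr := syncSeparate(c)
	fmt.Printf("separate: resized=%v modifyErr=%v resizeErr=%v\n", resized, mErr, rErr)
}
```

With separate paths, each operation could also carry its own retry interval, which is the second point raised above; the sketch leaves retries out to stay short.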

msau42 commented 8 months ago

Also add in the release note that this requires the feature gate in resizer to be enabled, and the feature gate and api to be enabled in the k8s cluster.

msau42 commented 8 months ago

Also add the feature here: https://github.com/kubernetes-csi/external-resizer?tab=readme-ov-file#feature-status

sunnylovestiramisu commented 8 months ago

Changed the release note and also added the feature to README.md

msau42 commented 8 months ago

Thanks for the release note, can you add the specific feature gate and API that have to be enabled?

sunnylovestiramisu commented 8 months ago

I added VolumeAttributesClass feature gate to the description.

sunnylovestiramisu commented 8 months ago

Also retested the code end to end:

k describe pvc
Name:          csi-pvc
Namespace:     default
StorageClass:  csi-hostpath-sc
Status:        Bound
Volume:        pvc-eb5c043c-d60f-4b38-92b5-9d581eb97c7e
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       my-csi-app
Events:
  Type    Reason                  Age    From                                                                           Message
  ----    ------                  ----   ----                                                                           -------
  Normal  ExternalProvisioning    6m16s  persistentvolume-controller                                                    Waiting for a volume to be created either by the external provisioner 'hostpath.csi.k8s.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning            6m16s  hostpath.csi.k8s.io_csi-hostpathplugin-0_49e258d5-2317-4bba-94e8-24fc5c72b4aa  External provisioner is provisioning volume for claim "default/csi-pvc"
  Normal  ProvisioningSucceeded   6m16s  hostpath.csi.k8s.io_csi-hostpathplugin-0_49e258d5-2317-4bba-94e8-24fc5c72b4aa  Successfully provisioned volume pvc-eb5c043c-d60f-4b38-92b5-9d581eb97c7e
  Normal  VolumeModify            99s    external-resizer hostpath.csi.k8s.io                                           external resizer is modifying volume csi-pvc with vac silver
  Normal  VolumeModifySuccessful  99s    external-resizer hostpath.csi.k8s.io                                           external resizer modified volume csi-pvc with vac silver successfully
  Normal  VolumeModify            4s     external-resizer hostpath.csi.k8s.io                                           external resizer is modifying volume csi-pvc with vac gold
  Normal  VolumeModifySuccessful  4s     external-resizer hostpath.csi.k8s.io                                           external resizer modified volume csi-pvc with vac gold successfully

gnufied commented 8 months ago

Can you squash the commits?

sunnylovestiramisu commented 8 months ago

@gnufied Rebased and combined all the commits into one :)

gnufied commented 8 months ago

/lgtm

gnufied commented 8 months ago

/approve

k8s-ci-robot commented 8 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, sunnylovestiramisu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubernetes-csi/external-resizer/blob/master/OWNERS)~~ [gnufied]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment