kubernetes-csi / external-snapshotter

Sidecar container that watches Kubernetes Snapshot CRD objects and triggers CreateSnapshot/DeleteSnapshot against a CSI endpoint.
Apache License 2.0

Avoid panicking when snapshotting a non-CSI PV #1067

Closed leonardoce closed 5 months ago

leonardoce commented 5 months ago

What type of PR is this?

/kind api-change /kind bug /kind cleanup /kind design /kind documentation /kind failing-test /kind feature /kind flake

What this PR does / why we need it:

This PR prevents the external-snapshotter controller from panicking when the user requests a VolumeGroupSnapshot that includes non-CSI PVs.

In my tests, master was failing with:

goroutine 124 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000f8800?})
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x17ff940?, 0x29eba50?})
    /Users/leonardo.cecchi/.asdf/installs/golang/1.21.9/go/src/runtime/panic.go:914 +0x21f
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).createGroupSnapshotContent(0xc00031ad00, 0xc0008883c0)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/groupsnapshot_controller_helper.go:757 +0x2db
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).syncUnreadyGroupSnapshot(0xc0000f8ff0?, 0xc0008883c0)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/groupsnapshot_controller_helper.go:445 +0x105b
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).syncGroupSnapshot(0xc00031ad00, 0xc0008883c0)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/groupsnapshot_controller_helper.go:319 +0x6c5
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).updateGroupSnapshot(0xc00031ad00, 0xc0008883c0)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/groupsnapshot_controller_helper.go:247 +0x2a7
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).syncGroupSnapshotByKey(0xc00031ad00, {0xc000058480, 0x1e})
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:702 +0xb87
github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).groupSnapshotWorker(0xc00031ad00)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:647 +0xed
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x1cc8900, 0xc0000f7290}, 0x1, 0xc0000ae540)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x0, 0x0, 0x0?, 0x0?)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/kubernetes-csi/external-snapshotter/v7/pkg/common-controller.(*csiSnapshotCommonController).Run in goroutine 48
    /Users/leonardo.cecchi/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:234 +0x729

With this PR, the controller no longer panics, and an event is recorded on the volumegroupsnapshot telling the user what happened:

➜ k get events |grep volumegroupsnapshot
13s         Warning   CreateGroupSnapshotContentFailed     volumegroupsnapshot/new-groupsnapshot-demo                 Cannot snapshot a non-CSI volume: pvc-de499b35-6856-4c52-98fc-74f3caa6ac27
42s         Warning   GroupSnapshotContentCreationFailed   volumegroupsnapshot/new-groupsnapshot-demo                 failed to create group snapshot content with error cannot snapshot a non-CSI volume for group snapshot default/new-groupsnapshot-demo: pvc-de499b35-6856-4c52-98fc-74f3caa6ac27
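
For readers who want to see the shape of the guard, here is a minimal, self-contained sketch. It is not the code from this PR: the types are simplified stand-ins for the Kubernetes `PersistentVolume` API types, and `collectVolumeHandles` and `ErrNonCSIVolume` are hypothetical names chosen for illustration. The assumption is that the panic came from dereferencing the PV's CSI source, which is nil for a non-CSI volume, so the sketch checks it and returns an error instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the Kubernetes PersistentVolume types.
// Only the fields needed for this illustration are modelled.
type CSIPersistentVolumeSource struct {
	Driver       string
	VolumeHandle string
}

type PersistentVolumeSpec struct {
	// CSI is nil for volumes that are not backed by a CSI driver.
	CSI *CSIPersistentVolumeSource
}

type PersistentVolume struct {
	Name string
	Spec PersistentVolumeSpec
}

// ErrNonCSIVolume is a hypothetical sentinel error used by this sketch.
var ErrNonCSIVolume = errors.New("cannot snapshot a non-CSI volume")

// collectVolumeHandles is a hypothetical helper. It returns the CSI volume
// handle of every PV, or an error naming the first PV without a CSI source.
// Without the nil check, pv.Spec.CSI.VolumeHandle would panic for such a PV.
func collectVolumeHandles(groupSnapshotKey string, pvs []*PersistentVolume) ([]string, error) {
	handles := make([]string, 0, len(pvs))
	for _, pv := range pvs {
		if pv == nil {
			return nil, errors.New("nil persistent volume")
		}
		if pv.Spec.CSI == nil {
			return nil, fmt.Errorf("%w for group snapshot %s: %s",
				ErrNonCSIVolume, groupSnapshotKey, pv.Name)
		}
		handles = append(handles, pv.Spec.CSI.VolumeHandle)
	}
	return handles, nil
}

func main() {
	pvs := []*PersistentVolume{
		{Name: "pvc-csi", Spec: PersistentVolumeSpec{
			CSI: &CSIPersistentVolumeSource{Driver: "example.csi.driver", VolumeHandle: "vol-1"},
		}},
		{Name: "pvc-hostpath"}, // no CSI source
	}
	if _, err := collectVolumeHandles("default/new-groupsnapshot-demo", pvs); err != nil {
		// In the controller, this is where a Warning event would be recorded.
		fmt.Println("Warning:", err)
	}
}
```

Returning an error rather than skipping the volume means the whole group snapshot fails visibly, which matches the events shown above.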

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Prevent snapshot controller from panicking when requesting a VolumeGroupSnapshot of a non-CSI volume.

k8s-ci-robot commented 5 months ago

Hi @leonardoce. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

nixpanic commented 5 months ago

Nice catch!

/lgtm /ok-to-test

nixpanic commented 5 months ago

/unassign /assign xing-yang

xing-yang commented 5 months ago

In the release note, can you change "external-snapshotter" to "snapshot controller"?

leonardoce commented 5 months ago

> In the release note, can you change "external-snapshotter" to "snapshot controller"?

@xing-yang done. Thank you!

xing-yang commented 5 months ago

/lgtm /approve

k8s-ci-robot commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leonardoce, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-csi/external-snapshotter/blob/master/OWNERS)~~ [xing-yang]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment