vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

EnableCSI feature flag causes velero to crash loop if snapshot API group missing #5862

Closed rnarenpujari closed 1 year ago

rnarenpujari commented 1 year ago

What steps did you take and what happened:

If you install Velero with the EnableCSI feature flag on a cluster that doesn't have the volume snapshot CRDs installed, Velero exits with the following fatal error, causing the pod to crash loop.

time="2023-02-04T04:46:27Z" level=fatal msg="The 'EnableCSI' feature flag was specified, but CSI API group [snapshot.storage.k8s.io/v1] was not found." logSource="pkg/cmd/server/server.go:589"

What did you expect to happen:

Perhaps not technically a bug, but it may be more graceful to fail at backup/restore time if the prerequisites are not met, rather than crash looping at install time.

The following information will help us better understand what's going on:

N/A

Anything else you would like to add:

N/A

Environment:


allenxu404 commented 1 year ago

Makes sense. But if we want to fix this issue thoroughly, just checking for the volume snapshot CRDs might not be enough, since the prerequisites for taking CSI snapshots can differ between providers. It's tricky to cover every scenario in a validation function.

shubham-pampattiwar commented 1 year ago

Whenever you specify the EnableCSI flag at install time, the Velero controller adds the velero-plugin-for-csi plugin, which takes CSI volume snapshots. If the CSI API group is not present in the cluster, the plugin cannot take CSI snapshots and so cannot function as expected, hence the error. But yes, this should be handled gracefully. @blackpiglet Any thoughts?

reasonerjt commented 1 year ago

Yep, per discussion we want to print a warning and delay the error so the velero pod won't fail to start.

blackpiglet commented 1 year ago

Fixed in PR #5969. Closing.

reasonerjt commented 1 year ago

In my test, the velero pod still restarts if the EnableCSI feature flag is set but CSI is not enabled on the target cluster. Maybe we need to check whether there's a log.Fatal in the library?

time="2023-03-30T08:22:10Z" level=info msg="1 feature flags enabled [EnableCSI]" logSource="pkg/cmd/server/server.go:190"
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/crd-remap-version
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/pod
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/pv
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/service-account
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/add-pv-from-pvc
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/add-pvc-from-pod
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/admission-webhook-configuration
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/apiservice
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/change-image-name
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/change-pvc-node-selector
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/change-storage-class
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/cluster-role-bindings
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/crd-preserve-fields
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/init-restore-hook
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/job
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/pod
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/pod-volume-restore
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/role-bindings
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/secret
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/service
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/service-account
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/aws
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/aws
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-pvc-backupper
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshot-backupper
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshotclass-backupper
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=BackupItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshotcontent-backupper
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-pvc-restorer
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshot-restorer
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshotclass-restorer
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshotcontent-restorer
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=DeleteItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshot-delete
time="2023-03-30T08:22:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-csi kind=DeleteItemAction logSource="pkg/plugin/clientmgmt/process/registry.go:101" name=velero.io/csi-volumesnapshotcontent-delete
time="2023-03-30T08:22:10Z" level=info msg="Metrics server is starting to listen" addr=":8080" logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:108" logger=controller-runtime.metrics
An error occurred: the server could not find the requested resource
time="2023-03-30T08:22:10Z" level=warning msg="The 'EnableCSI' feature flag was specified, but CSI API group [snapshot.storage.k8s.io/v1] was not found." logSource="pkg/cmd/server/server.go:587"