vmware-samples / cloud-native-storage-self-service-manager

Cloud Native Storage (CNS) Manager is a diagnostic and self-service tool that helps detect and auto-remediate some known issues in the storage control plane.
Apache License 2.0

Missing RBAC permissions for cnsmanager-sa #6

Open owwweiha opened 1 year ago

owwweiha commented 1 year ago

Describe the bug

When using the get-kubeconfig.sh script, the created cnsmanager-sa lacks some permissions:

2023-03-28T11:17:56.643Z ERROR VolumeMigrationJobController.controller.volumemigrationjob-controller volumemigrationjob/volumemigrationjob_controller.go:216 failed to get volume migration tasks for the job {"name": "volumemigrationjob-2b352bca-cd5a-11ed-8421-1e896c6caec5", "namespace": "cns-manager", "TraceId": "ac7f0efe-bea9-48c0-9630-77b7bfcfa475", "error": "volumemigrationtasks.cnsmanager.cns.vmware.com is forbidden: User \"system:serviceaccount:default:cnsmanager-sa\" cannot list resource \"volumemigrationtasks\" in API group \"cnsmanager.cns.vmware.com\" in the namespace \"cns-manager\""}

E0328 11:18:19.819033 1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"volumemigrationjob-3d5c91be-cd5a-11ed-8421-1e896c6caec5.17509130cb8b2e2a", GenerateName:"", Namespace:"cns-manager", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(time.Location)(nil)}}, DeletionTimestamp:(v1.Time)(nil), DeletionGracePeriodSeconds:(int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"VolumeMigrationJob", Namespace:"cns-manager", Name:"volumemigrationjob-3d5c91be-cd5a-11ed-8421-1e896c6caec5", UID:"b3f4cf99-fdde-4bec-b2e6-9383e35f9240", APIVersion:"cnsmanager.cns.vmware.com/v1alpha1", ResourceVersion:"243516", FieldPath:""}, Reason:"VolumeMigrationJobCompleted", Message:"All tasks finished. Volume migration job is complete", Source:v1.EventSource{Component:"cnsmanager.cns.vmware.com", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc100d11eedcf202a, ext:6201947942605, loc:(time.Location)(0x36647a0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc100d11eedcf202a, ext:6201947942605, loc:(time.Location)(0x36647a0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(time.Location)(nil)}}, Series:(v1.EventSeries)(nil), Action:"", Related:(v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:default:cnsmanager-sa" cannot create resource "events" in API group "" in the namespace "cns-manager"' (will not retry!)

Adding this to the ClusterRole makes the error messages disappear:

- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["cnsmanager.cns.vmware.com"]
  resources: ["volumemigrationtasks"]
  verbs: ["get", "list"]

Reproduction steps

  1. Deploy cns-manager according to documentation
  2. Run a migration
  3. Get pod logs and see the ERROR messages

Expected behavior

All required RBAC permissions should be included in the scripts that create them; it shouldn't be necessary to add any manually afterwards.

Additional context

No response

gohilankit commented 1 year ago

@owwweiha Can you check the service account for your cns-manager deployment?

The get-kubeconfig.sh script creates the cnsmanager-sa service account, a ClusterRole, and a ClusterRoleBinding on the remote Kubernetes cluster (with the vSphere CSI driver) that CNS Manager will be managing. Its purpose is to create a service account on the remote cluster with the minimum necessary privileges, and then generate a kubeconfig with that service account which can be used for cluster registration.
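For readers unfamiliar with this pattern: a kubeconfig built around a service account authenticates with a plain bearer token, so it needs no auth-provider plugin. The sketch below is illustrative only (cluster name, endpoint, and placeholder values are assumptions, not the actual output of get-kubeconfig.sh):

```yaml
# Hypothetical sketch of a token-based kubeconfig for cnsmanager-sa.
# All names and values here are illustrative placeholders.
apiVersion: v1
kind: Config
clusters:
- name: workload-cluster                 # assumed cluster name
  cluster:
    server: https://10.0.0.1:6443        # assumed API server endpoint
    certificate-authority-data: <base64-encoded-CA>   # placeholder
contexts:
- name: cnsmanager-sa@workload-cluster
  context:
    cluster: workload-cluster
    user: cnsmanager-sa
current-context: cnsmanager-sa@workload-cluster
users:
- name: cnsmanager-sa
  user:
    token: <service-account-bearer-token>   # placeholder
```

Because authentication is just this static token, such a kubeconfig works from inside the cns-manager pod without any OIDC plugin support.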

There's a different service account for the cns-manager deployment itself (assuming you're using the basicauth deployment) - https://github.com/vmware-samples/cloud-native-storage-self-service-manager/blob/main/deploy/basic-auth/deploy-template.yaml#L1 . This service account is bound to a ClusterRole which has all the necessary permissions.

I'm not sure why a service account (cnsmanager-sa) that is supposed to manage resource access on the remote Kubernetes cluster (with the vSphere CSI driver) is trying to access resources meant for the cns-manager application. You may have CNS Manager deployed on the same Kubernetes cluster, but even then there should be two different service accounts, each bound to a different ClusterRole.
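To make the two-identity setup described above concrete, the resources involved look roughly like this (a sketch only; the service account names mirror the ones discussed in this thread, while the role and binding names are illustrative, not taken from the repo):

```yaml
# Hypothetical sketch: two separate identities, even on a single cluster.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cns-manager              # used by the cns-manager deployment itself
  namespace: cns-manager
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cnsmanager-sa            # created by get-kubeconfig.sh for cluster registration
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cns-manager-binding      # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cns-manager-role         # broad permissions for the application
subjects:
- kind: ServiceAccount
  name: cns-manager
  namespace: cns-manager
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cnsmanager-sa-binding    # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cnsmanager-sa-role       # minimal privileges for registration
subjects:
- kind: ServiceAccount
  name: cnsmanager-sa
  namespace: default
```

If the application pod ends up running as cnsmanager-sa (as in the error messages above), the wrong binding applies and the permission errors follow.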

owwweiha commented 1 year ago

Hi @gohilankit,

thank you for your response. You're right... I used the get-kubeconfig script to generate the sv_kubeconfig, which is obviously wrong. The reason I did this is that we're using the OIDC auth provider. When using my admin kubeconfig containing OIDC, I get:

2023-04-12T08:47:51.579Z ERROR Main volumemigrationjob/volumemigrationjob_controller.go:79 KubeClient creation failed {"error": "no Auth Provider found for name \"oidc\""}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/controller/volumemigrationjob.Add
    /go/src/pkg/cnsoperator/controller/volumemigrationjob/volumemigrationjob_controller.go:79
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/controller.AddToManager
    /go/src/pkg/cnsoperator/controller/controller.go:32
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/manager.InitCnsManagerOperator
    /go/src/pkg/cnsoperator/manager/init.go:92
main.initCnsManagerOperator.func1
    /go/src/main.go:64
2023-04-12T08:47:51.579Z ERROR Main manager/init.go:93 failed to setup controller for Cns manager operator {"error": "no Auth Provider found for name \"oidc\""}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/manager.InitCnsManagerOperator
    /go/src/pkg/cnsoperator/manager/init.go:93
main.initCnsManagerOperator.func1
    /go/src/main.go:64
2023-04-12T08:47:51.579Z ERROR Main src/main.go:65 Error initializing Cns manager Operator {"error": "no Auth Provider found for name \"oidc\""}
main.initCnsManagerOperator.func1
    /go/src/main.go:65

I modified the kubeconfig to use the cns-manager SA now, so there is no issue with volumemigrationtasks anymore. But I think the events permission is missing from the ClusterRole for the cns-manager SA:

'events is forbidden: User "system:serviceaccount:cns-manager:cns-manager" cannot create resource "events" in API group "" in the namespace "cns-manager"' (will not retry!)

Nevertheless, using the cns-manager SA only works once it has been created (which is done by the deployment script), so passing an sv_kubeconfig containing the cns-manager SA to the first run of deploy.sh won't work because the ServiceAccount doesn't exist yet. Any chance of getting this working with OIDC as the auth provider?
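For context on the chicken-and-egg problem above: the "no Auth Provider found for name \"oidc\"" error comes from client-go binaries built without the in-tree OIDC auth provider, so any kubeconfig handed to cns-manager has to use token- or cert-based auth. One possible workaround (my assumption, not something the project documents) is to mint a long-lived token for a pre-existing service account and build a token-based kubeconfig from it. On Kubernetes v1.24+ that can be done with a Secret like this sketch (names are illustrative):

```yaml
# Hypothetical sketch: request a long-lived token for an existing SA,
# then paste the resulting token into a kubeconfig in place of the
# OIDC auth-provider stanza.
apiVersion: v1
kind: Secret
metadata:
  name: my-admin-sa-token          # illustrative name
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: my-admin-sa   # assumed pre-existing SA
type: kubernetes.io/service-account-token
```

Once the Secret is populated by the token controller, its `token` field can be used as a bearer token in the kubeconfig passed to deploy.sh.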

By the way, we're using TKGI. Maybe something is different here? For example, the deployment uses the psp:vmware-system-privileged ClusterRole (https://github.com/vmware-samples/cloud-native-storage-self-service-manager/blob/main/deploy/basic-auth/deploy-template.yaml#L15), which does not exist in TKGI. I created a ClusterRole that uses the pks-privileged PSP instead.
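For anyone hitting the same gap on TKGI, the replacement ClusterRole might look roughly like this (a sketch under the assumption that pks-privileged is the PSP available on the cluster; the role name is illustrative):

```yaml
# Hypothetical TKGI stand-in for the missing psp:vmware-system-privileged
# ClusterRole: grants "use" on the pks-privileged PodSecurityPolicy instead.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:pks-privileged         # illustrative name
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["pks-privileged"]
  verbs: ["use"]
```

The existing RoleBinding in deploy-template.yaml would then reference this ClusterRole instead of psp:vmware-system-privileged.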

owwweiha commented 1 year ago

Closing this issue was a mistake, sorry!

owwweiha commented 1 year ago

Any news on this? It would be great to be able to use a valid kubeconfig file during the installation process (deploy.sh + basicauth) without modifying it. Right now, a fresh (and totally valid) kubeconfig with my admin user gives me:

2023-07-10T08:19:35.135Z ERROR OrphanVolumeMonitoring ov/monitoring.go:403 Failed to create kube client. {"TraceId": "3784cc71-5a17-4c04-acfb-aa3afb983c3e", "ClusterID": "sv_kubeconfig", "error": "no Auth Provider found for name \"oidc\""}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/ov.getPVsInRegisteredClusters
    /go/src/pkg/ov/monitoring.go:403
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/ov.updateOrphanVolumeCache
    /go/src/pkg/ov/monitoring.go:106
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/ov.InitOrphanVolumeMonitoring.func1
    /go/src/pkg/ov/monitoring.go:61
reflect.Value.call
    /usr/local/go/src/reflect/value.go:556
reflect.Value.Call
    /usr/local/go/src/reflect/value.go:339
github.com/go-co-op/gocron.callJobFuncWithParams
    /go/pkg/mod/github.com/go-co-op/gocron@v1.6.2/gocron.go:76
github.com/go-co-op/gocron.(*executor).start.func1.1
    /go/pkg/mod/github.com/go-co-op/gocron@v1.6.2/executor.go:90
golang.org/x/sync/singleflight.(*Group).doCall.func2
    /go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/singleflight/singleflight.go:193
golang.org/x/sync/singleflight.(*Group).doCall
    /go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/singleflight/singleflight.go:195
golang.org/x/sync/singleflight.(*Group).Do
    /go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/singleflight/singleflight.go:108
github.com/go-co-op/gocron.(*executor).start.func1
    /go/pkg/mod/github.com/go-co-op/gocron@v1.6.2/executor.go:82