knative / serving-operator

DEPRECATED: Development continues in https://github.com/knative/operator/
Apache License 2.0
39 stars 45 forks source link

Restrict RBAC for Operator #291

Closed Cynocracy closed 4 years ago

Cynocracy commented 4 years ago

Fixes #282

374 is a base of this, you can solely review the YAML here

(and not the reconciler change) though you're welcome to review that as well.

Proposed Changes

By using the escalate verb on clusterroles, we allow the Operator to have the ability to create any clusterrole. This prevents us from needing to grant it cluster-admin.

Additionally, by having the Operator create an aggregated clusterrole for itself (aggregating all clusteroles created by Knative Serving), we pick up any permissions that the upstream dictate as necessary by any component.

Any roles that are not created by upstream, but which are bound as part of install are added with the bind verb to allow the Operator to bind them without necessarily having the permissions they grant.

Some other permissions are needed to support controller infra and are explicitly included, as well as some specific to the Operator itself. An attempt at grouping these by sections is included, though may be slightly out of sync (in particular: Controller infra / Specific to this Operator may overlap).

Overall, this should keep the Operator roughly operating only on Knative Serving resources (or things which can be operated on by upstream).

Release Note

Change RBAC to minimize permissions of Operator service account.
Cynocracy commented 4 years ago

/retest

(seems to be running OK locally)

Cynocracy commented 4 years ago

Oh, I see the issue.

Cynocracy commented 4 years ago

/retest

:(

Cynocracy commented 4 years ago

Found out how to read the logs:

{"ts":"2020-02-12T01:09:32.279Z","knative.dev/controller":"knativeserving-controller","msg":"Reconcile succeeded. Time taken: 11.99198359s.","commit":"0b4d6f8","level":"info","logger":"serving_operator.knativeserving-controller","knative.dev/key":"operator-tests/knative-serving","caller":"controller…

Hmmmm

/retest

so that I can poke around more :)

Cynocracy commented 4 years ago

raceid":"fd80c601-0e62-4773-bdd6-1ecb60dbbf78","knative.dev/key":"operator-tests/knative-serving"} {"level":"info","ts":"2020-02-13T01:16:24.454Z","logger":"serving_operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:\"KnativeServing\", Namespace:\"operator-tests\", Name:\"knative-serving\", UID:\"5e9ab652-b2da-431a-86cb-211efa5cbd70\", APIVersion:\"operator.knative.dev/v1alpha1\", ResourceVersion:\"4761\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' apiservices.apiregistration.k8s.io \"v1beta1.custom.metrics.k8s.io\" is forbidden: User \"system:serviceaccount:default:knative-serving-operator\" cannot update resource \"apiservices\" in API group \"apiregistration.k8s.io\" at the cluster scope","commit":"b29f72a","knative.dev/controller":"knativeserving-controller"}

OK, so that was required.

Cynocracy commented 4 years ago

This is ready for a preliminary review.

Let me know if more detail is necessary.

Cynocracy commented 4 years ago

I think splitting it into multiple files makes sense, especially since some of the sections will be drop-in equivalents between the serving and eventing operator.

I'm working on another WIP for that repo, let me take a stab at splitting once I have that working (it will be a good proof of concept that the general idea of 'escalate' + 'bind' maps well).

Cynocracy commented 4 years ago

/retest

Cynocracy commented 4 years ago

{"level":"info","ts":"2020-03-06T23:30:27.273Z","logger":"serving-operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:\"KnativeServing\", Namespace:\"knative-serving\", Name:\"knative-serving\", UID:\"cbacee1e-5ed9-4bac-a8ba-d65b0ba9225a\", APIVersion:\"operator.knative.dev/v1alpha1\", ResourceVersion:\"2288\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' serviceaccounts \"controller\" is forbidden: User \"system:serviceaccount:default:knative-serving-operator\" cannot get resource \"serviceaccounts\" in API group \"\" in the namespace \"knative-serving\"","commit":"1419ff1","knative.dev/controller":"knativeserving-controller"}

Cynocracy commented 4 years ago

Wait.. it should have * in knative-serving.

trshafer commented 4 years ago

Wait.. it should have * in knative-serving.

Are there more error messages or does it need GET serviceaccounts in knative-serving?

Cynocracy commented 4 years ago

I believe the issue has been fixed (the issue was that the role and the rolebinding need to be in the relevant namespace, and the subjects needed to point to the default namespace).

This opens up a question I need to resolve though: do we need to support multiple KS installations in the same cluster, or can we assume the default namespace is used?

Cynocracy commented 4 years ago

OK, the error this time: deleting fails, because we delete the role that gives us the ability to delete 'admissionregistration.k8s.io' resources before we delete those resources.

Going to take a stab at saving the deletion of the roles/rolebindings until last as we need to 'unbootstrap' ourselves from having these permissions.

Cynocracy commented 4 years ago

/retest

Cynocracy commented 4 years ago

/assign @jcrossley3

Cynocracy commented 4 years ago

WIP as eventing change still is, and I'd like these to go in concurrently if at all.

Cynocracy commented 4 years ago

/unassign @jcrossley3

Cynocracy commented 4 years ago

/assign @jcrossley3

Assuming my last minute diff-tweaking didn't break this, this should be ready to review :)

The only remaining diff between the role here and in eventing are the replacement of some labels, a few explicit 'bind's (which now need to be added to roles the Operator binds to but does not create), and the resource deletion section at the end.

Cynocracy commented 4 years ago

One more comment re: the last deltas.

In the future, if we move to a job based cleanup, deletion roles can be defined one-off. If the Operator repos and/or controllers are to merge, this dramatically simplifies the collapse of the roles needed.

Cynocracy commented 4 years ago

Commits now squashed (squished?) into a single one :)

Thanks for the tip jcrossley3

jcrossley3 commented 4 years ago

Runs fine on OpenShift, so... /lgtm

Cynocracy commented 4 years ago

To double check the diff, I did

  1. ko apply config
  2. kubectl api-resources -oname | xargs -I% kubectl get -A % -oname | sort | uniq > /tmp/before
  3. Create the KS resource
  4. Wait for it to finish
  5. Delete the KS resource
  6. kubectl api-resources -oname | xargs -I% kubectl get -A % -oname | sort | uniq > /tmp/after 7 diff /tmp/before /tmp/after

Which spat out

https://pastebin.com/tugjzdeU

The only oddness is the activator pod, which seems to be hanging around.

Cynocracy commented 4 years ago

https://pastebin.com/bjuPParB

Before, no activator leak, curious.

Cynocracy commented 4 years ago

A second attempt at a diff produced https://pastebin.com/N0w4R9QZ

So it's entirely possible I generated the last one while the pod was going down (I only diff the names, so who knows :)

Either way, this is ready for an approval, IMO.

trshafer commented 4 years ago

/approve

knative-prow-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cynocracy, trshafer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/knative/serving-operator/blob/master/OWNERS)~~ [trshafer] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment