Closed Cynocracy closed 4 years ago
/retest
(seems to be running OK locally)
Oh, I see the issue.
/retest
:(
Found out how to read the logs:
{"ts":"2020-02-12T01:09:32.279Z","knative.dev/controller":"knativeserving-controller","msg":"Reconcile succeeded. Time taken: 11.99198359s.","commit":"0b4d6f8","level":"info","logger":"serving_operator.knativeserving-controller","knative.dev/key":"operator-tests/knative-serving","caller":"controller…
Hmmmm
/retest
so that I can poke around more :)
raceid":"fd80c601-0e62-4773-bdd6-1ecb60dbbf78","knative.dev/key":"operator-tests/knative-serving"} {"level":"info","ts":"2020-02-13T01:16:24.454Z","logger":"serving_operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:\"KnativeServing\", Namespace:\"operator-tests\", Name:\"knative-serving\", UID:\"5e9ab652-b2da-431a-86cb-211efa5cbd70\", APIVersion:\"operator.knative.dev/v1alpha1\", ResourceVersion:\"4761\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' apiservices.apiregistration.k8s.io \"v1beta1.custom.metrics.k8s.io\" is forbidden: User \"system:serviceaccount:default:knative-serving-operator\" cannot update resource \"apiservices\" in API group \"apiregistration.k8s.io\" at the cluster scope","commit":"b29f72a","knative.dev/controller":"knativeserving-controller"}
OK, so that was required.
This is ready for a preliminary review.
Let me know if more detail is necessary.
I think splitting it into multiple files makes sense, especially since some of the sections will be drop-in equivalents between the serving and eventing operator.
I'm working on another WIP for that repo, let me take a stab at splitting once I have that working (it will be a good proof of concept that the general idea of 'escalate' + 'bind' maps well).
/retest
{"level":"info","ts":"2020-03-06T23:30:27.273Z","logger":"serving-operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:\"KnativeServing\", Namespace:\"knative-serving\", Name:\"knative-serving\", UID:\"cbacee1e-5ed9-4bac-a8ba-d65b0ba9225a\", APIVersion:\"operator.knative.dev/v1alpha1\", ResourceVersion:\"2288\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' serviceaccounts \"controller\" is forbidden: User \"system:serviceaccount:default:knative-serving-operator\" cannot get resource \"serviceaccounts\" in API group \"\" in the namespace \"knative-serving\"","commit":"1419ff1","knative.dev/controller":"knativeserving-controller"}
Wait.. it should have * in knative-serving.
Wait.. it should have * in knative-serving.
Are there more error messages or does it need GET serviceaccounts in knative-serving?
I believe the issue has been fixed (the issue was that the role and the rolebinding need to be in the relevant namespace, and the subjects needed to point to the default namespace).
This opens up a question I need to resolve though: do we need to support multiple KS installations in the same cluster, or can we assume the default namespace is used?
OK, the error this time: deleting fails, because we delete the role that gives us the ability to delete 'admissionregistration.k8s.io' resources before we delete those resources.
Going to take a stab at saving the deletion of the roles/rolebindings until last as we need to 'unbootstrap' ourselves from having these permissions.
/retest
/assign @jcrossley3
WIP as eventing change still is, and I'd like these to go in concurrently if at all.
/unassign @jcrossley3
/assign @jcrossley3
Assuming my last minute diff-tweaking didn't break this, this should be ready to review :)
The only remaining diff between the role here and in eventing are the replacement of some labels, a few explicit 'bind's (which now need to be added to roles the Operator binds to but does not create), and the resource deletion section at the end.
One more comment re: the last deltas.
In the future, if we move to a job based cleanup, deletion roles can be defined one-off. If the Operator repos and/or controllers are to merge, this dramatically simplifies the collapse of the roles needed.
Commits now squashed (squished?) into a single one :)
Thanks for the tip jcrossley3
Runs fine on OpenShift, so... /lgtm
To double check the diff, I did
Which spat out
The only oddness is the activator pod, which seems to be hanging around.
Before, no activator leak, curious.
A second attempt at a diff produced https://pastebin.com/N0w4R9QZ
So it's entirely possible I generated the last one while the pod was going down (I only diff the names, so who knows :)
Either way, this is ready for an approval, IMO.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Cynocracy, trshafer
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Fixes #282
374 is a base of this, you can solely review the YAML here
(and not the reconciler change) though you're welcome to review that as well.
Proposed Changes
By using the
escalate
verb on clusterroles, we allow the Operator to have the ability to create any clusterrole. This prevents us from needing to grant itcluster-admin
.Additionally, by having the Operator create an aggregated clusterrole for itself (aggregating all clusteroles created by Knative Serving), we pick up any permissions that the upstream dictate as necessary by any component.
Any roles that are not created by upstream, but which are bound as part of install are added with the
bind
verb to allow the Operator to bind them without necessarily having the permissions they grant.Some other permissions are needed to support controller infra and are explicitly included, as well as some specific to the Operator itself. An attempt at grouping these by sections is included, though may be slightly out of sync (in particular: Controller infra / Specific to this Operator may overlap).
Overall, this should keep the Operator roughly operating only on Knative Serving resources (or things which can be operated on by upstream).
Release Note