Closed CecileRobertMichon closed 5 months ago
Hi @CecileRobertMichon. Thanks for your PR.
I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Hi @CecileRobertMichon! Nice catch, we missed those log lines so far.
The reason for the error is that we use get
and create
for a common namespace for leases, which was added not too long ago. By default controller-runtime adds every resource we get
to its cache, which then needs list
and watch
.
We would prefer though to not extend permissions in this case. Listing namespaces can reveal information about the cluster which we would'n have without that permission.
Instead we would prefer to disable caching for namespaces. That is possible with the ClientDisableCacheFor
option when creating the manager in main.go
, see https://sdk.operatorframework.io/docs/faqs/#after-deploying-my-operator-i-see-errors-like-failed-to-watch-external-type.
Do you mind updating the PR to use that option? Otherwise we can add it to our backlog.
Thanks!
Hi @slintes, thanks for the reply and detailed guidance. I've adjusted the PR. Note that the ClientDisableCacheFor
no longer exists in controller-runtime, it was deprecated and later removed so https://sdk.operatorframework.io/docs/faqs/#after-deploying-my-operator-i-see-errors-like-failed-to-watch-external-type is outdated.
@slintes while we're here, I'm also seeing:
E0402 04:46:21.482510 1 event.go:346] "Server rejected event (will not retry!)" err="events is forbidden: User \"system:serviceaccount:node-healthcheck:node-healthcheck-controller-manager\" cannot create resource \"events\" in API group \"\" in the namespace \"default\""
Happy to open a separate PR for that one if there is a desirable fix.
Hi @CecileRobertMichon. Thanks for the PR update!
We remembered why you run into the missing namespace permission while we missed it: for some reason, the Operator Lifecycle Manager (OLM), which is the only supported way to install the operator, is adding a cluster role with that permission. So is it safe to assume you don't use OLM?
The missing event permission is related but different: here the permission exists in OLM's deployment configuration (the ClusterServiceVersion manifest, see https://github.com/medik8s/node-healthcheck-operator/blob/6e935de8cc8fd61e56706667499697858002f2d9/bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml#L569. It's coming from here: https://github.com/medik8s/node-healthcheck-operator/blob/main/config/rbac/leader_election_role.yaml#L31. Somehow you miss this in your deployment.
/ok-to-test
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: CecileRobertMichon, slintes
The full list of commands accepted by this bot can be found here.
The pull request process is described here
@slintes correct, I'm attempting to install the operator without going through OLM to manage its lifecycle. So far it's worked well, aside from these few quirks.
Why we need this PR
The
node-healthcheck-manager-role
ClusterRole is missing permissions to list and watch namespaces, causing the following errors in the manager logs from controller-runtime:This PR disables the cache option for Namespaces according to https://sdk.operatorframework.io/docs/faqs/#after-deploying-my-operator-i-see-errors-like-failed-to-watch-external-type.
Changes made
Which issue(s) this PR fixes
ECOPROJECT-1940
Test plan