Closed viccuad closed 2 years ago
This bug seems to be pretty similar to https://github.com/kubewarden/kubewarden-controller/issues/115. I guess they are the same issue. Don't you agree? If so, I believe we can close the bug that I've opened as a duplicate of this one.
argh, yes, it's the same one. Sorry for opening a new one instead of commenting on that one. Given that here we have 2 examples, I'm happy closing that one as duplicate of this one.
Done! ;)
As the closed bug was on the Development board and you are working on this, I'm adding this one to the board as well. ;)
It looks like the watch is called twice on an update. The first time it gets called with the old object, and then it gets called with the new one.
When we do an update or a delete (which is an update, because it just adds the deletion timestamp), the watch is first called with the old object, then with the new one. When it is called with the old object, we modify the status from active to pending (https://github.com/kubewarden/kubewarden-controller/blob/main/controllers/policies/policyserver_controller.go#L185). Then, when it is called with the new object, it fails with the error `the object has been modified; please apply your changes to the latest version and try again`.
https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/handler@v0.10.0#EnqueueRequestsFromMapFunc
I would suggest moving https://github.com/kubewarden/kubewarden-controller/blob/main/controllers/policies/policyserver_controller.go#L185-L186 inside the PolicyServer reconciliation loop
I'm not sure that's possible. Inside the reconciliation loop one has access to the list of policies associated with the PolicyServer, but doesn't know which one is new, or which one was updated with new settings, etc. The best we could do is to mark all policies as "pending" when a PolicyServer reconcile loop starts, but I think that's wrong.
Hence, marking the policy as pending if there's a new change in it in the watch, and waiting for the PolicyServer reconcile loop to mark it as active seems a better approach.
Looking at https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/handler@v0.10.0, I see no useful options for the handler function, so I'm sticking with handler.EnqueueRequestsFromMapFunc().
So: 2 objects per update, 1 old, 1 new. I see no way to skip the old one, which gets evaluated first. Any update to it invalidates the watch call for the new object, and triggers 2 more watch calls with the updated object.
After thinking about it, I see no advantage in having 2 watches, and I prefer not to, to reduce the number of pointers to old and new objects.
Given that any update invalidates the second watch call, I tried to minimize updates.
The watch sets the `PolicyStatus` to `unscheduled`, updating the object and invalidating the next watch call. I tried to default the `PolicyStatus` to `unscheduled` on the CRD definition to reduce the amount of updates, and at least not have that first one:
1. `// +kubebuilder:default:=unscheduled` for enums doesn't take effect. See upstream issue https://github.com/kubernetes-sigs/controller-tools/issues/622
2. Tried to craft a kustomize patch, but with strategicMerge it will substitute the whole `versions` list:
# The following patch sets the ClusterAdmissionPolicy status to unscheduled.
# This is a workaround for a bug with:
# `// +kubebuilder:default:=unscheduled`
# as it doesn't take effect.
# See upstream issue https://github.com/kubernetes-sigs/controller-tools/issues/622
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: clusteradmissionpolicies.policies.kubewarden.io
spec:
versions:
- $patch: merge
- additionalPrinterColumns:
schema:
openAPIV3Schema:
properties:
status:
properties:
policyStatus:
default: unscheduled
The patch replaces the whole `versions` list, which I don't think can be changed.

Two cases:
None of those updates can be postponed. At minimum we have 2 updates (points 1 and 2) in the case of policies with a PolicyServer. Setting the PolicyStatus to pending can't be done in the reconcile loop for only the changed policies.
I can't see any other way besides creating a reconciler for ClusterAdmissionPolicies. It would:
In the watch, we could then only trigger the PolicyServer reconciler for those policies that have the correct status and are associated with a PolicyServer. This way, the watch doesn't do any update of the object, and we can use the marked statuses in the policies to gate calls to reconcile.
We might have race conditions if we add a `ClusterAdmissionPolicy` reconciler, as both would be updating the policy status. We do not know when a `ClusterAdmissionPolicy` becomes active in the `ClusterAdmissionPolicy` reconciler, so we would still need to do this in the `PolicyServer` reconciler.
We check in the `PolicyServer` reconciler loop whether any `ClusterAdmissionPolicy` has changed; if so, we do a rollout of a new policy server. Could we check there which policy has changed and mark it as pending? I think that was done here https://github.com/kubewarden/kubewarden-controller/blob/200eb77938def4757defb88d43772b3129dc65c2/internal/pkg/admission/policy-server-configmap.go#L64 and also in the webhook reconciliation.
I think we should always update the `ClusterAdmissionPolicy` status in the `PolicyServer` reconciler loop if possible. The watch should just trigger the `PolicyServer` reconciler loop, or handle orphan policies that don't have a `PolicyServer`.
Or try to find a way to sync both reconcilers, so we don't have race conditions. I don't know if it's possible to always run them in a specific order.
Changes to ClusterAdmissionPolicies, such as performing a Delete, trigger the watch of policyserver_controller.go twice (or so it seems). This doesn't break the functionality, but it makes the operations fail on the second run, printing innocuous, albeit bad-looking errors.
Example 1
Instantiate a ClusterAdmissionPolicy into the default PolicyServer. Wait for it to be `active`. Then delete it. Notice that there is an error: the policy gets `Delete`d. The change in the object triggers the watch in policyserver_controller.go, exiting through https://github.com/viccuad/kubewarden-controller/blob/a0b9e9397b0d8f66ac17cdebe26ab773a20a4ced/controllers/policies/policyserver_controller.go#L200-L206. No change has been performed on the policy inside the watch; we go directly to the PolicyServer reconciler loop, which will take care of de-registering the policy webhooks and removing the Finalizer. Still, we get an error.

Example 2
Instantiate a ClusterAdmissionPolicy that specifies a nonexistent PolicyServer. Wait for it to be unschedulable. Then, delete it (with the code from PR https://github.com/kubewarden/kubewarden-controller/pull/135). Notice that there are 2 errors:
Error 1 seems analogous to example 1. Error 2, after debugging, seems to appear while going through https://github.com/viccuad/kubewarden-controller/blob/a0b9e9397b0d8f66ac17cdebe26ab773a20a4ced/controllers/policies/policyserver_controller.go#L181-L187. There, we are just removing the Finalizer. Which to me indicates that we run once and remove the finalizer, which triggers both the watch on policyserver_controller.go and the watch of the Kubernetes controller responsible for garbage collection. The garbage collection happens, fully removing the policy object from etcd. Our watch then runs, sees a policy object with a DeletionTimestamp and unschedulable status, tries to remove the Finalizer, and tada, there's no object.