Closed mresvanis closed 2 years ago
Thanks for the PR @mresvanis , could you do so against https://github.com/kubernetes-sigs/node-feature-discovery-operator ?
/uncc dagrayvid
/assign
Thanks for the PR @mresvanis , could you do so against https://github.com/kubernetes-sigs/node-feature-discovery-operator ?
@ArangoGutierrez thank you for your comment, but if I'm not mistaken the upstream project currently doesn't have this issue. Specifically, it doesn't set up any predicates for the NFD CR events.
/test all
Thanks for this PR @mresvanis , can you test that this change doesn't add unwanted logs to the controller? @lenahorsley
@mresvanis, the reconciliation loop always returns Requeue=True upon status update when not all the components of the CR has been fully reconciled. Can you point what is the exact scenario that causes the reconciliation loop to stop? Maybe we can fix those specific cases in the reconciliation loop instead of in the predicate?
@yevgeny-shnaidman the case I bumped into, when Requeue=false
, is when adding the finalizer (so it's a .metadata
update that leads to the update event not being requeued).
As you suggested, I can change the line above to Requeue=true
, instead of removing the predicate. Whatever makes more sense in your view.
IMHO having this predicate is indeed an optimization, which could lead to further issues down the road and I'm not so sure that it's worth it. But I don't have enough experience with the NFD operator to be able to support this opinion deeply.
@ArangoGutierrez
Thanks for this PR @mresvanis , can you test that this change doesn't add unwanted logs to the controller?
We are currently running a fork of the Cluster NFD operator with this PR included in production and as far as I can tell, we haven't seen more logs than the usual ones.
@yevgeny-shnaidman the case I bumped into, when
Requeue=false
, is when adding the finalizer (so it's a.metadata
update that leads to the update event not being requeued).As you suggested, I can change the line above to
Requeue=true
, instead of removing the predicate. Whatever makes more sense in your view. Please do, I think it is better to change the reconcile loop instead of the predicate. Just wondering why we have not seen such cases before...IMHO having this predicate is indeed an optimization, which could lead to further issues down the road and I'm not so sure that it's worth it. But I don't have enough experience with the NFD operator to be able to support this opinion deeply.
@ArangoGutierrez
Thanks for this PR @mresvanis , can you test that this change doesn't add unwanted logs to the controller?
We are currently running a fork of the Cluster NFD operator with this PR included in production and as far as I can tell, we haven't seen more logs than the usual ones.
/lgtm
@mresvanis: all tests passed!
Full PR test history. Your PR dashboard.
/hold
/unhold
4.11 Build version: 4.11.0-0.nightly-2022-07-26-041421 NFD version: quay.io/lenahorsley/nfd-operator-bundle:4.11.20220726-pr265 Kube version: v1.24.0+9546431
4.12 Build version: 4.12.0-0.nightly-2022-07-26-082655 NFD version: quay.io/lenahorsley/nfd-operator-bundle:4.12.20220726-pr265 Kube version: v1.24.0+1ff0731
Tests performed (AWS 3 master/3 worker nodes):
Note: for both bundle versions, NFD had ONE master node and THREE worker nodes
All tests were executed successfully.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ArangoGutierrez, mresvanis, yevgeny-shnaidman
The full list of commands accepted by this bot can be found here.
The pull request process is described here
This PR removes the update predicate that checks for the change of the
.metadata.generation
field of the NFD CR, in order to be able to fully reconcile a created NFD CR after its finalizer and status is updated.Context
According to the Kubernetes documentation,
.metadata
or.status
updates do not bump the.metadata.generation
of an object. This leads to the NFD CR not being fully reconciled after its creation, i.e. after the addition of its finalizer and the update of its.status
, there is no additional update reconciliation event queued. The cause is the update predicate used to check for the difference between the.metadata.generation
field of the old and new NFD objects.kubectl
vs Kubernetes Go clientThis issue cannot be reproduced using
kubectl
(not sure yet why that is), but it is reproducible by using a Kubernetes Go client (e.g. by creating a new NFD CR via anothercontroller-runtime
-based operator).Resources
Signed-off-by: Michail Resvanis mresvani@redhat.com