Closed squeedee closed 6 months ago
We were able to find a more "stable" form of identity
Can you describe what identity you started out with and what you settled on eventually? I reckon it would help to exemplify the issue.
// Non-deterministic IDs will result in the rapid deletion and creation of child resources.
to mean deterministic for the data passed to this
reconcile
Deterministic within the context of a single reconcile request, but it would be odd for the value to change between different reconcile requests.
IdentifyChild
is called for every known and desired child during the reconcile. The result is used to determine if a desired child needs to be created because it doesn't exist, update an existing resource, or delete a no longer desired resource.
Deterministic within the context of a single reconcile request, but it would be odd for the value to change between different reconcile requests.
During such time the "knownChild" and it's counterpart on cluster can diverge.
We see the known child, we replace it with an update, but the key for the "update" does not match because some status on cluster changed. I know its a pretty wild edge case, but we were keying on what we knew about the resource in KnownChild and kept it consistant for the duration of the reconcile, and yet, it changed. It required us to have an intimate awareness of the entire machinery for ChildSet.
To try and add clarity:
During such time the "knownChild" and it's counterpart on cluster can diverge.
This is inherently true for all reconcilers. Either the change is detected and the reconciled resource is reenqueued normally, or the update fails and the request is retried with the updated child.
we were keying on what we knew about the resource in KnownChild and kept it consistant for the duration of the reconcile, and yet, it changed
Can you explain more about the scenario?
If another process is modifying the child resource, you need to be robust and handle it, or fail gracefully. If that process is modifying the identity that's harder to handle. Depending on the context it may result in that resource being orphaned or deleted as no longer desired.
If you need to defend against a hostile process an admission webhook for the child resource can prevent the identity from being removed.
Child.rm.Manage - lists children, filters, and it's a newer version than seen in "knownChildren" above.
ahh. Ok, this issue something that bugged me about the initial implementation as it's a bit inefficient, but didn't think it would cause issues. We can update the Manager to accept a list of known children so it doesn't re-list children. The update will fail if the underlying resource was mutated, as usual.
Yah we're already conditioned to look for RV changes, but this was a "trying to create but you gave it a name and generate-name" because it was now being treated as a create. It took a lot of tracer logs for us to work this out, because we were working on the false assumption that the "data in" didn't change for the duration of the core reconciler methods, namely "DesiredChildren" and "IdentifyChild"
Describe the bug
oc
oc
) and produce desired childrendc
(one-to-one for this example)Because the ResourceManager lists and filters the resources via the client again, at the point of
.Manage
- it loads a possibly mutated resource and uses that for that lastIdentifyChild
This leads to surprising results, because I take the guidance in the docs:
to mean deterministic for the data passed to this
reconcile
We were able to find a more "stable" form of identity, however it took a long time to discover the issue because of our assumptions about the data passed into the reconcile.