Closed VedRatan closed 1 year ago
@VedRatan We need to make sure we can watch on two labels combined with an OR.
oh sure
If not possible we will have to use a single label (containing either a duration or a full date/time).
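For illustration, a minimal sketch of how a single label value could carry either a duration or a full date/time. The function name, the duration units, and the compact timestamp layout are assumptions made for this example, not something decided in this thread. The compact layout is used because Kubernetes label values may not contain colons.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical duration syntax: an integer followed by s, m, h or d.
_DURATION = re.compile(r"^(\d+)([smhd])$")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def expiry_from_label(value: str, created: datetime) -> datetime:
    """Return the expiry time encoded in a single label value.

    A duration is interpreted relative to the resource creation time;
    anything else must be an absolute UTC timestamp in the assumed
    colon-free layout YYYY-MM-DDTHHMMSSZ.
    """
    match = _DURATION.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return created + timedelta(**{_UNITS[unit]: amount})
    # Raises ValueError when the value is neither form.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

With this shape, the controller would only need to watch one label key, and the two value forms are distinguished at parse time.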
I need some time to decide between the two implementation strategies.
Label selector requirements are always combined with AND: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
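To make the AND semantics concrete, here is a rough sketch that models an existence-based selector in plain Python. It does not call the Kubernetes API; the label keys and function names are hypothetical and only illustrate why one selector cannot express "label A or label B".

```python
from typing import Iterable, Mapping

# Hypothetical label keys, used only for this illustration.
DURATION_KEY = "example.io/ttl-duration"
DEADLINE_KEY = "example.io/ttl-deadline"

def selector_matches(labels: Mapping[str, str], required_keys: Iterable[str]) -> bool:
    """Model of a selector: every listed requirement must hold (AND)."""
    return all(key in labels for key in required_keys)

def any_selector_matches(labels: Mapping[str, str],
                         selectors: Iterable[Iterable[str]]) -> bool:
    """OR has to be emulated by evaluating several selectors separately."""
    return any(selector_matches(labels, keys) for keys in selectors)

only_duration = {DURATION_KEY: "2h"}

# A single selector naming both keys misses a resource carrying only one.
print(selector_matches(only_duration, [DURATION_KEY, DEADLINE_KEY]))
# Two separate selectors (for example, two watches) would catch it.
print(any_selector_matches(only_duration, [[DURATION_KEY], [DEADLINE_KEY]]))
```

This is why the alternative is a single label key: one selector then covers every resource that opts in.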
Oh, got it.
Any more changes required? @eddycharly
We’ll see during implementation.
What is the plan to make this feature stateless so that failure of one of the cleanup replicas does not cause disruption?
What do you mean? Can you elaborate?
Assuming this reconciliation will be handled by the cleanup controller, it allows multiple replicas. If there are two replicas and the leader fails, will the other replica be able to handle all the cleanup duties for these labeled resources without losing some?
Yes
Ok. It's probably worth having a statement in this KDP to acknowledge this as a requirement: The solution must allow for proper tracking and accurate cleanup even upon controller replica failure or failover.
The desired state can be established at any point in time from the observed state. Reconciliation should always be possible and accurate.
Yes, but this needs to be explicitly tested to ensure such a failure can be tolerated. For example, if a replica fails just before a resource is scheduled to be cleaned up, the replica that recovers or takes over should, within a minimal amount of time, see that the resource has "missed" its cleanup interval and remove it.
Such a test sounds difficult to implement. However, the design will guarantee that if a resource failed to be deleted, it will be picked up by the next leader.
Just saying this should be tested during development to ensure it actually happens as intended. We can handle automated tests at a later point.
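The failover scenario discussed above can be sketched as follows. This is a simplified model under the assumption that expiry is derived only from observed state (creation time plus TTL), so no replica holds timers that would be lost on failure. The `Resource` type and `due_for_cleanup` function are hypothetical names for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple

class Resource(NamedTuple):
    name: str
    created: datetime
    ttl: timedelta

def due_for_cleanup(resources: List[Resource], now: datetime) -> List[str]:
    """Names of resources whose expiry is at or before `now`.

    The result depends only on the observed resources and the clock,
    so any replica can recompute it after taking over leadership.
    """
    return [r.name for r in resources if r.created + r.ttl <= now]

start = datetime(2023, 7, 19, 10, 0, 0, tzinfo=timezone.utc)
observed = [
    Resource("expires-during-outage", start, timedelta(minutes=5)),
    Resource("still-valid", start, timedelta(hours=2)),
]

# The old leader fails before the 5 minute mark; the new leader's first
# reconciliation runs at 10 minutes and still selects the overdue resource.
takeover = start + timedelta(minutes=10)
print(due_for_cleanup(observed, takeover))
```

A manual failover test during development could follow the same shape: label a resource with a short TTL, stop the leader before expiry, and check that the resource is removed shortly after another replica takes over.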
Thanks @eddycharly.
@JimBugwadia
We don't emit events, since there's no policy to attach events to. Do you think we should (or can) emit events?
For metrics, we will have standard controller metrics and we will create additional metrics next week.
We also have logs when a resource is deleted, nothing more for reporting.
Anything you want to add?
Fixed the inconsistencies, and replaced the annotations with labels and similar fixes.
Closes #44