
kyverno / KDP

Kyverno Design Proposals

Apache License 2.0


[Docs]: Fix the inconsistencies in the cleanup proposal #45

Closed VedRatan closed 1 year ago

VedRatan commented 1 year ago

Fixed the inconsistencies in the cleanup proposal, replaced the annotations with labels, and made similar fixes.

Closes #44

eddycharly commented 1 year ago

@VedRatan We need to make sure we can watch on two labels combined with an OR.

VedRatan commented 1 year ago

Oh, sure.

eddycharly commented 1 year ago

If not possible we will have to use a single label (containing either a duration or a full date/time).

VedRatan commented 1 year ago

> If not possible we will have to use a single label (containing either a duration or a full date/time).

I need some time to figure out the right decision on both implementation strategies.

eddycharly commented 1 year ago

Label selector requirements are always combined with AND, never OR: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
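To illustrate why two labels cannot be watched with one selector, here is a small stand-alone simulation of the AND semantics. It does not use the Kubernetes client libraries; `matchesAll` and the two label keys are hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// matchesAll mimics how a label selector treats several requirements:
// every listed key must be present on the object (logical AND).
func matchesAll(labels map[string]string, requiredKeys ...string) bool {
	for _, key := range requiredKeys {
		if _, ok := labels[key]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// A resource that carries only one of the two hypothetical labels.
	onlyDuration := map[string]string{"example.com/cleanup-after": "2h"}

	// A selector listing both keys selects objects that have both labels,
	// so this resource is not matched.
	fmt.Println(matchesAll(onlyDuration, "example.com/cleanup-after", "example.com/cleanup-at"))

	// A selector on a single key matches the resource.
	fmt.Println(matchesAll(onlyDuration, "example.com/cleanup-after"))
}
```

Under these semantics, selecting "either label" would need two separate watches or a single label key.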

VedRatan commented 1 year ago

Oh, got it.

VedRatan commented 1 year ago

Any more changes required? @eddycharly

eddycharly commented 1 year ago

We’ll see during implementation.

chipzoller commented 1 year ago

What is the plan to make this feature stateless so that failure of one of the cleanup replicas does not cause disruption?

eddycharly commented 1 year ago

> What is the plan to make this feature stateless so that failure of one of the cleanup replicas does not cause disruption?

What do you mean? Can you elaborate?

chipzoller commented 1 year ago

Assuming this reconciliation will be handled by the cleanup controller, which allows multiple replicas: if there are two replicas and the leader fails, will the other replica be able to handle all the cleanup duties for these labeled resources without losing some?

eddycharly commented 1 year ago

Yes

chipzoller commented 1 year ago

Ok. It's probably worth having a statement in this KDP to acknowledge this as a requirement: The solution must allow for proper tracking and accurate cleanup even upon controller replica failure or failover.

eddycharly commented 1 year ago

The desired state can be established at any point in time from the observed state. Reconciliation should always be possible and accurate.

chipzoller commented 1 year ago

Yes, but this needs to be explicitly tested to ensure such a failure can be tolerated. For example, if a replica fails just before a resource is scheduled to be cleaned up, the replica that recovers or takes over should, within a minimal amount of time, see that the resource has "missed" its cleanup interval and remove it.

eddycharly commented 1 year ago

Such a test sounds difficult to implement. However, the design will guarantee that if a resource failed to be deleted, it will be picked up by the next leader.

chipzoller commented 1 year ago

Just saying this should be tested during development to ensure it actually happens as intended. We can handle automated tests at a later point.

VedRatan commented 1 year ago

Thanks, @eddycharly.

eddycharly commented 1 year ago

@JimBugwadia

We don't emit events; there's no policy to attach events to. Do you think we should/can emit events?

For metrics, we will have standard controller metrics and we will create additional metrics next week.

We also have logs when a resource is deleted, nothing more for reporting.

Anything you want to add?