operator-framework / enhancements

Apache License 2.0
9 stars 40 forks source link

Automatic resource pruning EP #109

Closed ryantking closed 2 years ago

ryantking commented 2 years ago

Signed-off-by: Ryan King ryking@redhat.com

jmrodri commented 2 years ago

@ryantking the API proposed in Appendix A was easier for me to rationalize and understand. I was able to easily picture a workflow for that API. The API proposed in Appendix B was definitely more flexible but the workflow felt more awkward to me. Like it was more complicated but for little gain in the short term.

Basically I could not think of a scenario that was not doable with the API from Appendix A.

fgiloux commented 2 years ago

@ryantking One thing that I am missing is that operators are often used in multi-tenant environments. Each tenant may want to have, if not a separate retention policy, at least the possibility to parameterize the retention policy. Let take a CI/CD pipeline operator as an example. You may have a Pipeline custom resource, which creates pods for the processing. In case of nightly runs these pods (with the logs) may be kept for 3 days. In case of the execution of a Pipeline for a final release you may want to keep them for longer. Another scenario is that teams may want to have different retention periods for different projects. With the approach in appendix A my understanding is that the IsPruneableFunc would need to be re-registered each time a new Pipeline is created possibly using a closure holding the retention period for each Pipeline. Is that the way you see it?

fgiloux commented 2 years ago

@ryantking I have had a look at the latest version of the enhancement proposal and it looks good to me. The only things that I have slight concerns with are:

ryantking commented 2 years ago

@fgiloux As far as the tenancy aspect goes, in the example of nightly pipelines vs release pipelines, I think that you could solve it with an IsPruneable func?

func pipelineIsPruneable(obj client.Object) error {
    pipeline, ok := obj.(*pipelinesv1.Pipeline)
    // check assertion
    if isRelease(pipeline) and age(pipeline) < 3 * time.Week {
        return ErrUnprunable
    }
    // additional logic
    return nil
}

Would that pseudo-code work in theory? Another option for multi-tenancy is to create multiple Pruner objects, maybe one per project? I'll integrate the registry type from appendix B into A so each pruner can have its own set of IsPruneable funcs if that is desirable. I'm imagining a pattern similar to http.ServeMux where new http.Server objects start with http.DefaultServeMux, but the user can swap that out to a custom one.

Re: logging, I'll make sure the proposal incorporates the logging changes you made to the current implementation.

fgiloux commented 2 years ago

@ryantking I think it is important to consider personas:

ryantking commented 2 years ago

@fgiloux I'm still struggling to understand which specific design choices in the proposed API limit the use cases you are laying out. If you are giving me use cases that you want to make sure are covered, then can you present them as user stories so I can add them to the EP? If there are use cases that you think are not possible with the proposed API then I think we should talk offline in greater detail to figure out what changes I need to make. The way I look at it, there are two real ways the user implements prune logic:

  1. The IsPruneable function registry. This allows the operator author to define per-GVK when a resource is eligible for pruning. This function can know about namespaces or read configuration from a different resource or do basically anything that can be determined from the object itself, the type of the object, and whatever information it can retrieve from external sources including the cluster itself.
  2. The StrategyFunc: This allows the operator author to define how many pruneable objects should be removed, and how to select the specific objects from the set.

A concrete example would be an IsPruneable function that determines if a pipeline object is a release or nightly job then a strategy that keeps the latest 32 nightly runs. You could even create two pruners, one for releases that keeps only the supported versions' pipelines and one for nightly that keeps the last 30 days worth.

Let me know your thoughts or feel free to reach out if you want to talk in more detail.

fgiloux commented 2 years ago

@ryantking I am fine with the proposal. Mind that I have little karma on this repository as I am not part of the Operator Framework organisation.