ODH-ADR-Operator-0005: Whitelist component fields for user customizations

VaishnaviHire commented 8 months ago

Description

This ADR is a subset of feature requests defined in #23 .

How Has This Been Tested?

Merge criteria:

[ ] The commits are squashed in a cohesive manner and have meaningful messages.
[ ] Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
[ ] The developer has manually tested the changes and verified that the changes work

VaishnaviHire commented 8 months ago

/cc @israel-hdez @danielezonca @etirelli @LaVLaS

VaishnaviHire commented 8 months ago

After reading this ADR, the main question I have is:

Why do we reconcile everything back to the default state?

Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs? KfDef version (v1) might have been too liberal in this regard, as it merely checked if a given resource already existed and skipped applying if that was the case. I understand that we want to have control over certain aspects, but this proposal suggests we might have gone too far with this approach.

Perhaps we should start considering what changes users can make (or define what is not changeable) to the resources we create and allow them to do so.

I do agree with letting users update the resources, I am just not sure how we can limit the scope of the updates. So if we do not reconcile to the desired state, there can be usecase when users do multiple config updates and result in an error state. Maybe we can add a way for user to return to defaults if they are in such error state?

Kfdef actually provided no way to customize, it used to reconcile everything back to the default state.

bartoszmajsak commented 8 months ago

I do agree with letting users update the resources, I am just not sure how we can limit the scope of the updates

Maybe we could find a simple way to define paths/fields of a particular resource that we will skip during reconciliation. We can for example remove such a field from the patch operation when applying the original, "desired" state. Or flipping this around - reconciliation will apply a patch with only the fields that we really want to make sure are not changed.

So if we do not reconcile to the desired state, there can be usecase when users do multiple config updates and result in an error state. Maybe we can add a way for user to return to defaults if they are in such error state?

I am not sure about this one. My line of thinking is more of what kind of changes would result in "unsupported configuration". For example, we might want to make sure our images are always reconciled. But things like resources are very dependent on the target env and usage patterns so this could be skipped from reconciliation.

Kfdef actually provided no way to customize, it used to reconcile everything back to the default state.

I have only worked on the plugin part of KfDef and in there when you were creating so-called "raw resources" it was skipping apply during reconciliation when the resource already existed. So I stand corrected :)

OTOH it lets you provide your own overlays to customize the kustomize :) I am not saying this is the way to go, but it was possible without blowing up the KfDef API.

israel-hdez commented 8 months ago

Why do we reconcile everything back to the default state?

Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs?

I know we can go this way to be quite flexible. However, I'm more on the side that it is the operator CRDs that should expose the needed settings, rather than instructing the users to do it themselves.

IMHO, the more we allow for such flexibility, the more the operator becomes less valuable -- i.e. why not just provide the manifests to the user?

Personally, I'm quite sold to the explanation in operator framework website:

With Operators you can stop treating an application as a collection of primitives like Pods, Deployments, Services or ConfigMaps, but instead as a single object that only exposes the knobs that make sense for the application.

So, the general idea that this ADR is proposing is good to me.

bartoszmajsak commented 8 months ago

Why do we reconcile everything back to the default state? Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs?

I know we can go this way to be quite flexible. However, I'm more on the side that it is the operator CRDs that should expose the needed settings, rather than instructing the users to do it themselves.

IMHO, the more we allow for such flexibility, the more the operator becomes less valuable -- i.e. why not just provide the manifests to the user? Personally, I'm quite sold to the explanation in operator framework website:

With Operators you can stop treating an application as a collection of primitives like Pods, Deployments, Services or ConfigMaps, but instead as a single object that only exposes the knobs that make sense for the application.

So, the general idea that this ADR is proposing is good to me.

Treating a component as a whole sounds like the right thing to do, I am not opposing that, nor do I suggest going back to "using primitives". I fear that CRD can become bloated with different levels of abstraction. That's why I started thinking that perhaps we can open up certain aspects of our building blocks for the users to fine-tune instead of exposing those in our CRDs.

Is the suggested approach based on the assumption that the "user" is someone not knowledgable in the k8s domain, so instead of letting them fine-tune certain "low-level" settings, we are going to expose those low-level settings in some other place? I wonder if by doing so aren't we exactly exposing knobs?

israel-hdez commented 8 months ago

perhaps we can open up certain aspects of our building blocks for the users to fine-tune

But, then, aren't these building blocks the primitives? I think this confuses me.

so instead of letting them fine-tune certain "low-level" settings, we are going to expose those low-level settings in some other place? I wonder if by doing so aren't we exactly exposing knobs?

On this point I agree with you that the different high/low level settings look weird. However, I do think resource management is still within the scope of an operator. IMO, it depends on how much an operator is planned to evolve on its capability level. A level-1 operator would require users to tune these low level settings, while a level-5 operator would relief the user from resource tuning.

AFAIK, we don't have the data to automate the resource tuning, so IMO we need the user to do it for now. I think this is quite clear, and the discussion seems to be around on how the user is supposed to do it.

Maybe, a middle ground is to expose in the CRD a high level setting like kserve.resources.expectedDeployedModels and similarly modelmesh.resources.expectedDeployedModels so that the operator does some math and adjusts the low-level settings of all impacted components (kserve, modelmesh, odh-model-controller, ¿etc?). But I don't think we have data to do this.

VaishnaviHire commented 8 months ago

Maybe, a middle ground is to expose in the CRD a high level setting like kserve.resources.expectedDeployedModels and similarly modelmesh.resources.expectedDeployedModels so that the operator does some math and adjusts the low-level settings of all impacted components (kserve, modelmesh, odh-model-controller, ¿etc?). But I don't think we have data to do this.

Will this still require us to expose component specific details? Should we consider components to have a minimal and default config(overlays). This will not be limited to only resources.

spec.components.Kserve.config: minimal
OR
spec.components.Kserve.config: default

@israel-hdez However I think this will require us to have data for minimal resources required by a component.

israel-hdez commented 8 months ago

@VaishnaviHire

Will this still require us to expose component specific details? Should we consider components to have a minimal and default config(overlays). This will not be limited to only resources.

My rationale tells me yes.

Should we consider components to have a minimal and default config(overlays). This will not be limited to only resources.
spec.components.Kserve.config: minimal
OR
spec.components.Kserve.config: default

I'm not sure overlays are suitable, because resources are something can need fine-tuning, and we already had got this feedback from the field.

This ADR is proposing to use the DataScienceCluster for that, and I think it is fine. Then, my understanding is that @bartoszmajsak is suggesting that it is better to whitelist fields of the resources so that the user can manually change those fields... this is also fine! My personal preference would be to do it in the DSC just because my understanding is that centralized configs is one of the added benefits of using operators (and this is what this ADR is proposing, so this is looking fine to me!).

In model serving we just need a mechanism to configure the resources. Whether this ADR is refined, or dropped in favor of whitelisting fields, or something else... that's more up to you :-)

VaishnaviHire commented 7 months ago

I will update the ADR to reflect the discussion to provide list of fields to whitelist across all components. See PR for Kserve specific changes.

VaishnaviHire commented 6 months ago

I have updated the adr to reflect implementation to whitelist fields across components cc: @bartoszmajsak

etirelli commented 5 months ago

@VaishnaviHire is this ready for merging?

VaishnaviHire commented 5 months ago

Hi Yes, Just need lgtm from component owners

opendatahub-io / architecture-decision-records