Closed VaishnaviHire closed 5 months ago
/cc @israel-hdez @danielezonca @etirelli @LaVLaS
After reading this ADR, the main question I have is:
Why do we reconcile everything back to the default state?
Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs? KfDef version (v1) might have been too liberal in this regard, as it merely checked if a given resource already existed and skipped applying if that was the case. I understand that we want to have control over certain aspects, but this proposal suggests we might have gone too far with this approach.
Perhaps we should start considering what changes users can make (or define what is not changeable) to the resources we create and allow them to do so.
I do agree with letting users update the resources, I am just not sure how we can limit the scope of the updates. So if we do not reconcile to the desired state, there can be usecase when users do multiple config updates and result in an error state. Maybe we can add a way for user to return to defaults if they are in such error state?
Kfdef actually provided no way to customize, it used to reconcile everything back to the default state.
I do agree with letting users update the resources, I am just not sure how we can limit the scope of the updates
Maybe we could find a simple way to define paths/fields of a particular resource that we will skip during reconciliation. We can for example remove such a field from the patch operation when applying the original, "desired" state. Or flipping this around - reconciliation will apply a patch with only the fields that we really want to make sure are not changed.
So if we do not reconcile to the desired state, there can be usecase when users do multiple config updates and result in an error state. Maybe we can add a way for user to return to defaults if they are in such error state?
I am not sure about this one. My line of thinking is more of what kind of changes would result in "unsupported configuration". For example, we might want to make sure our images are always reconciled. But things like resources are very dependent on the target env and usage patterns so this could be skipped from reconciliation.
Kfdef actually provided no way to customize, it used to reconcile everything back to the default state.
I have only worked on the plugin part of KfDef and in there when you were creating so-called "raw resources" it was skipping apply during reconciliation when the resource already existed. So I stand corrected :)
OTOH it lets you provide your own overlays to customize the kustomize :) I am not saying this is the way to go, but it was possible without blowing up the KfDef
API.
Why do we reconcile everything back to the default state?
Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs?
I know we can go this way to be quite flexible. However, I'm more on the side that it is the operator CRDs that should expose the needed settings, rather than instructing the users to do it themselves.
IMHO, the more we allow for such flexibility, the more the operator becomes less valuable -- i.e. why not just provide the manifests to the user?
Personally, I'm quite sold to the explanation in operator framework website:
With Operators you can stop treating an application as a collection of primitives like Pods, Deployments, Services or ConfigMaps, but instead as a single object that only exposes the knobs that make sense for the application.
So, the general idea that this ADR is proposing is good to me.
Why do we reconcile everything back to the default state? Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs?
I know we can go this way to be quite flexible. However, I'm more on the side that it is the operator CRDs that should expose the needed settings, rather than instructing the users to do it themselves.
IMHO, the more we allow for such flexibility, the more the operator becomes less valuable -- i.e. why not just provide the manifests to the user? Personally, I'm quite sold to the explanation in operator framework website:
With Operators you can stop treating an application as a collection of primitives like Pods, Deployments, Services or ConfigMaps, but instead as a single object that only exposes the knobs that make sense for the application.
So, the general idea that this ADR is proposing is good to me.
Treating a component as a whole sounds like the right thing to do, I am not opposing that, nor do I suggest going back to "using primitives". I fear that CRD can become bloated with different levels of abstraction. That's why I started thinking that perhaps we can open up certain aspects of our building blocks for the users to fine-tune instead of exposing those in our CRDs.
Is the suggested approach based on the assumption that the "user" is someone not knowledgable in the k8s domain, so instead of letting them fine-tune certain "low-level" settings, we are going to expose those low-level settings in some other place? I wonder if by doing so aren't we exactly exposing knobs?
perhaps we can open up certain aspects of our building blocks for the users to fine-tune
But, then, aren't these building blocks the primitives? I think this confuses me.
so instead of letting them fine-tune certain "low-level" settings, we are going to expose those low-level settings in some other place? I wonder if by doing so aren't we exactly exposing knobs?
On this point I agree with you that the different high/low level settings look weird. However, I do think resource management is still within the scope of an operator. IMO, it depends on how much an operator is planned to evolve on its capability level. A level-1 operator would require users to tune these low level settings, while a level-5 operator would relief the user from resource tuning.
AFAIK, we don't have the data to automate the resource tuning, so IMO we need the user to do it for now. I think this is quite clear, and the discussion seems to be around on how the user is supposed to do it.
Maybe, a middle ground is to expose in the CRD a high level setting like kserve.resources.expectedDeployedModels
and similarly modelmesh.resources.expectedDeployedModels
so that the operator does some math and adjusts the low-level settings of all impacted components (kserve, modelmesh, odh-model-controller, ¿etc?). But I don't think we have data to do this.
Maybe, a middle ground is to expose in the CRD a high level setting like
kserve.resources.expectedDeployedModels
and similarlymodelmesh.resources.expectedDeployedModels
so that the operator does some math and adjusts the low-level settings of all impacted components (kserve, modelmesh, odh-model-controller, ¿etc?). But I don't think we have data to do this.
Will this still require us to expose component specific details?
Should we consider components to have a minimal
and default
config(overlays). This will not be limited to only resources.
spec.components.Kserve.config: minimal
OR
spec.components.Kserve.config: default
@israel-hdez However I think this will require us to have data for minimal resources required by a component.
@VaishnaviHire
Will this still require us to expose component specific details? Should we consider components to have a
minimal
anddefault
config(overlays). This will not be limited to only resources.
My rationale tells me yes.
Should we consider components to have a minimal and default config(overlays). This will not be limited to only resources.
spec.components.Kserve.config: minimal OR spec.components.Kserve.config: default
I'm not sure overlays are suitable, because resources are something can need fine-tuning, and we already had got this feedback from the field.
This ADR is proposing to use the DataScienceCluster
for that, and I think it is fine. Then, my understanding is that @bartoszmajsak is suggesting that it is better to whitelist fields of the resources so that the user can manually change those fields... this is also fine! My personal preference would be to do it in the DSC just because my understanding is that centralized configs is one of the added benefits of using operators (and this is what this ADR is proposing, so this is looking fine to me!).
In model serving we just need a mechanism to configure the resources. Whether this ADR is refined, or dropped in favor of whitelisting fields, or something else... that's more up to you :-)
I will update the ADR to reflect the discussion to provide list of fields to whitelist across all components. See PR for Kserve specific changes.
I have updated the adr to reflect implementation to whitelist fields across components cc: @bartoszmajsak
@VaishnaviHire is this ready for merging?
Hi Yes, Just need lgtm from component owners
Description
This ADR is a subset of feature requests defined in #23 .
How Has This Been Tested?
Merge criteria: