KnativeServing operator should enforce the configuration defined in the CR

garron commented 4 years ago

This issue was split out of #138 to separate the internal operator state management (handled in that issue) from discussion on how the operator should go about ensuring that it's configuration is in effect.

Consensus at the operator working group meeting was that the KnativeServing CR should be the source of truth and this issue is intended to cover work in making that so.

trshafer commented 4 years ago

/assign garron

duglin commented 4 years ago

I was going to open an issue but I think this one might be related, if not let me know and I'll open a new one.

Overall: where should a user modify things if they want to change their Knative install?

Today, w/o the operator, if someone wants to do something like modify the template of the URL generated for their ksvc they would modify the config-network configMap's domainTemplate value. However, with the operator any updates made directly to that config-map would eventually be overwritten by the operator's reconciliation loop.

This means that we have two different sources of truth based on how Knative was installed. This also means that the user needs to know which install mechanism was used so they can modify the correct data to get the results they want. This is not a good user experience for people manually managing Knative, nor is it good for tool authors who will want their tools to work against all Knative installs easily (and with minimal conditional logic).

It would seem to me that either: 1 - the operator should become a required component of KnServing and THE ONLY WAY to install it, and thus all configurations must be done via the CR. This would then put the burden on the operator to ensure that ALL possible configuration knobs are available via the CR since the low-level kube resources would then be off-limits to manual/tooling edits. And our docs should only point to the operator for configuring things. 2 - treat the operator as an optional tool. This would then mean it would need to probably support some kind of "policy" for how to deal with each resource in KnServing it manages. For example, within IBM we've had an operator-like system for a while and we allow each resource to have a policy that indicates whether the operator should a) just make sure the resource exists, and as long as it does then the operator won't touch it or override any edits made by the user, or b) make sure the resource looks exactly like the operator expects and any edits made by the user will be overwritten.

I think either way can work, we just need to pick a direction. But the current state doesn't seem ideal.

Thoughts?

/cc @houshengbo @mattmoor

markusthoemmes commented 4 years ago

I tend to disagree with this being a black and white decision.

If Knative Serving was installed without the operator, changes should definitely be made through ConfigMap edits. If Knative Serving was installed with the operator, changes should be made through the operator's API surface which is the KnativeServing CR currently.

Edits to the configuration are in the operator's persona, not the developers. I feel like the operator should know how Knative Serving was installed. The rest, I feel, is documentation of the specific platform.

duglin commented 4 years ago

In theory I hear ya.... but in practice I keep thinking about situations where someone is trying to create a tool, or even just a simple bash script, that they want to hand to someone else and in order to make sure it "just works" means they have to code up all config changes twice and wrapper them with an if-statement each time. That's why I'm leaning towards option 2 - it allows for people to only need to know about the configMaps.

jcrossley3 commented 4 years ago

If the choice is between 1 and 2, I would choose 1. The policy that would enable 2a runs counter to the declarative (spec->status) nature of k8s.

duglin commented 4 years ago

Can you elaborate on how it doesn't adhere to kube? With 2 the operator just sets the initial values for certain resources. I don't think kube has anything to say about whether it must then revert any changes on those resources. I think it's up to the semantics of the operator to decide that.

jcrossley3 commented 4 years ago

Maybe "runs counter" and "doesn't adhere" are too strong. It's just that with 2a you still have two sources of truth, right? The spec for the initial defaults and whatever someone might manually change them to. Of course, we can make the semantics of the operator whatever we like, but I personally prefer the single source of truth in the first option you suggested.

duglin commented 4 years ago

It might depend on our point of view, but for each resource there is only one source of truth. It is true that depending on the resource (and it's policy) its source of truth might be in a kn configMap or might be in the operator's CR/manifest. But, I'm hoping that for configMaps that we expect users to be able to edit, the source of truth would be in kn's configMaps not in the operator.

But obviously 1 works too :-) I'm just not excited by today's model.

markusthoemmes commented 4 years ago

If you're using an operator, the surface area of what you can do should be encoded by the operator. If we encourage users to reach around we imo have failed to provide a good API for the human-operator and need to ask ourselves why we want to use an operator in the first place.

Allowing to reach-around also makes certain decisions around how to handle these config maps in an upgrade impossible.

knative / serving-operator

KnativeServing operator should enforce the configuration defined in the CR #163