knative / serving-operator

DEPRECATED: Development continues in https://github.com/knative/operator/
Apache License 2.0
39 stars 45 forks source link

Discussion: Operator cli installation steps #39

Open trshafer opened 5 years ago

trshafer commented 5 years ago

There are two fundamental pieces to consider:

  1. Ease of installation. From the 2019-06-24 meeting notes: "Plain install, it should just work"
  2. Ease of customization. Having the operator just work when installed, might lead to unexpected results if the proper upfront customization isn't handled.

Option 1

This is the current flow.

The user must apply the CRD and then the operator checks for the CR and applies the default one if one does not automatically exist. https://github.com/knative/serving-operator/blob/6c0272eca46470454567e78aafbfd53cf7a8f53c/pkg/controller/knativeserving/knativeserving_controller.go#L298-L300

One disadvantage of this flow is there is no way to install the operator in the cluster and not have the knative installed, essentially have the operator in a wait state. We could modify the KnativeServing CRD to have an enabled flag, which supports an deactivated operator.

kubectl apply -f config/crds/serving_v1alpha1_knativeserving_crd.yaml
Deploy the image such as ko apply -f config/

If the user wants to apply a non-default CR, the flow is:

kubectl apply -f config/crds/serving_v1alpha1_knativeserving_crd.yaml
kubectl apply -f config/crds/serving_v1alpha1_knativeserving_cr.yaml // mutate this file, or apply a separate cr.
Deploy the image such as `ko apply -f config/

Option 2

Another option is to have the operator install the CRD, but the user installs the CR. Because the operator now requires a second step to "activate" it, this makes it easier to have the operator live in a cluster without being activated. This is also a nice flow if the user wishes to customize the CR before applying it as there is an explicit step to now start installing.

Deploy the image such as `ko apply -f config/`
kubectl apply -f config/crds/serving_v1alpha1_knativeserving_cr.yaml

Option 3

The operator will install everything for the user.

Deploy the image such as `ko apply -f config/`

Option 4

The user needs to apply both the CRD and the CR.

kubectl apply -f config/crds/serving_v1alpha1_knativeserving_crd.yaml
Then deploy the image such as `ko apply -f config/`
kubectl apply -f config/crds/serving_v1alpha1_knativeserving_cr.yaml
greghaynes commented 5 years ago

Do we know of any examples of operators performing the CRD installation themselves? My cursory googling indicated most operators seem to follow option 4 which, IMO, is better than the current design if we always create a default install when no CR exists. Mainly, I think its a must that a user can prevent the operator from starting installation until they have provided their desired configuration.

jcrossley3 commented 5 years ago

FWIW, we used to follow option 4, and I don't disagree that it can lead to a less "surprising" experience, but our internal user feedback said it was too complicated to install both the operator and the custom resource. :man_shrugging:

greghaynes commented 5 years ago

ISTM if we want to simplify that install process we could require users to set a flag on the CR which enables the installation. i.e. the controller can install both the CRD and CR but then a user has to set enable: true on the CR before the controller actually does anything.

bbrowning commented 5 years ago

Another thing that influenced the choice of option 1 that we have today is the idea that users should get a working setup out of the box with zero config. And, that users should be able to change that setup as desired as a "Day 2" task. The goal being to eliminate having to make configuration choices from the outset just to get a working cluster AND ensuring that no configuration choice (whether by defaults or provided by the user) is permanent and can be changed at a later time.

bbrowning commented 5 years ago

I will also say that if it becomes possible to run multiple Knative control planes in a single cluster, then Option 4 would become my preference. However, since that is not possible today, Option 1 is my preference purely from an ease-of-install standpoint. Although I'd say we could really get some hybrid of Option 1 and Option 3 where there's a single kubectl apply command to install Knative Serving - that would apply yaml to define the CRD and deploy the operator with the appropriate RBAC/service accounts. Then the operator would auto-install its CR if one isn't already found in the cluster.

nimakaviani commented 5 years ago

Looks like Weaveworks Flux's Helm Operator settles on Option 2. Particularly here where it applies the operator and the CRD but leaves the CR (which can involve version selection and other tuning) to the user .

I have to admit Option 2 is my preferred choice too, since it sits somewhere in between.

evankanderson commented 5 years ago

I'd like to agree with bbrowning's points in https://github.com/knative/serving-operator/issues/39#issuecomment-506438597

ensuring that no configuration choice (whether by defaults or provided by the user) is permanent and can be changed at a later time.

(emphasis added)

It's highly unlikely that anyone will get the correct settings "out of the box" when they install the operator; in particular, use of the software will shift over time (either due to scaling or additional requirements), so correct settings for day 30 will not be the same as for day 365.

I think this means I'm in favor of option 1. Practically, until CRD application is transactional, we may need a 2-pass system to first create the CRD and then to create an instance of the custom resource. Operators could then delete the custom resource, if desired, or modify it (for example) to restrict the set of namespaces it operated on.

trshafer commented 5 years ago

If we support Option 1, does that also mean we support a way to install the operator without it automatically installing knative-serving? Or is the idea that if the operator is installed, then knative should be installed or the operator is in an ERROR state.

apiVersion: serving.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  enabled: false
evankanderson commented 5 years ago

IIUC (and I might be missing something), "install the operator" actually encompasses all of the following (we've just added a turtle to the middle of the stack):

  1. Defining a "knative-operator" custom resource type (installing a CRD)
  2. Installing a controller for the "knative-operator" custom resource type
  3. Creating an instance of the "knative-operator" custom resource

It seems like a sufficiently determined operator could remove the item for #3 from the yaml (possibly assisted by Kustomize and/or our use of labels), or delete the instance of the custom resource type after it was created. I would hope that deleting the custom resource would have the same effect as not having created the resource at all.

greghaynes commented 5 years ago

Technically I think having a user updating an existing knative installation would work but I think there are some practical issues that come up:

This is why I have a preference for some way a user can install this operator and say 'but don't do anything to install knative until I tell you how to do it'. I'd be perfectly happy if this is opt-in though (a flag which defaults to true on a CRD we supply by default).

nimakaviani commented 5 years ago

+1 to this:

It can be difficult for a consumer to distinguish between a working installation (default) and an installation which works how they want it (their preferred configuration). Again, in the CI context, this means I may need some special guarding to know not to run against an installation just because I am told it is healthy - I need to know its healthy with my configuration.

vincent-pli commented 5 years ago

I think a flag disable or enable the cr automatic installation is necessary. Think about case of private cluster, customer must define the registry before cr creation. can not image everything works OOB

trshafer commented 5 years ago

@vincent-pli The current flow (Option 1) supports this flow. The user creates the KnativeServing CRD, then creates the KnativeServing CR with a custom registry configuration. Then they finally apply the operator yaml. The operator will first look for a KnativeServing CR and use that instead of creating a new one. An enabled flag is useful if the user wants to be able to install the operator but have the operator NOOP.

I like @greghaynes suggestion of having the operator install by default but the user could explicitly noop the operator with a flag that is by default disabled (aka operator by default installs).

trshafer commented 5 years ago

If there's a flag called no-install or similar, should this actively delete the knative-serving resources, or should it just make the operator noop, not delete or install?

One possible issue is deleting the resources also deletes the knative-serving namespace, which I think will delete the CR because that also exists in the knative-serving namespace. I need to double check this.