"Descope" OLM delivered operators

Feature Request

Note: This issue mostly consists of select snippets from a document @ecordell drafted a while back. I've curated the important bits to frame the problem for further discussion.

Scoping, Descoping, What?

In short, when we talk about "scope" in OLM, we're talking about how OLM handles the privileges granted to an operator and its users with respect to the namespaces an admin configures it to work in; i.e. the opinionated RBAC generation driven by ClusterServiceVersions, their InstallModes, and OperatorGroups.

Note: see the OperatorGroup docs for more details.

Problem

APIs in a Kubernetes cluster are cluster-scoped. They are visible via discovery to any user that wishes to see them. Even operators that agree on a particular GVK may have differences of opinion in how those objects should be admitted to a cluster, or how conversion between API versions should happen.

With Operator Framework, we want to build an ecosystem of high-quality operators that can be re-used across different projects, whether they’re in the same cluster or not. But re-using operators compounds the scoping problems within a cluster - it increases the likelihood that more than one “opinion” about an API exists in the cluster.

History

When OLM was first written, CRDs defined only the existence of a GVK in a cluster. Operators developed for OLM could only install in a namespace, watching that namespace - this delivered on the self-service, operational-encoding story of operators. The same operator could be installed in every namespace of a cluster.

Privilege escalation became a concern - since operators are run with a service account in a namespace, anyone with the ability to create workloads in that namespace could escalate to the permissions of the operator. This made service provider/consumer relationships a difficult sell for operators in OLM.

At the same time, CRDs continued to add features. With version schemas and admission and conversion webhooks, CRDs no longer simply registered a global name for a type, and operators in separate namespaces had lots of options to interfere with one another if they shared the same CRD. OLM also expanded to support APIServices in addition to operators based on CRDs, and so required a notion of cluster-wide operators.

To address these concerns, a notion of scoping operators was introduced via the OperatorGroup object. An OperatorGroup would specify a set of namespaces within a cluster in which all operators installed would share the same scope. OLM would ensure that only one operator within a namespace owned a particular CRD to avoid collision problems, and more installation options were provided to allow separating operators from their managed workloads.

Proposal

Entirely remove the notion of scoping from OLM; i.e. "descope".

This means that:

Only one operator that provides a given API -- e.g. via CRD or APIService -- may be installed in a cluster at a time
OLM stops being opinionated about how privileges are granted to operators and users; i.e. OperatorGroups, InstallModes, and today's generated RBAC are deprecated and removed
Operator authors and admins use more traditional means -- e.g. (cluster)Roles/RoleBindings -- to declare cluster privileges for both operators and their users
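
As a rough illustration of the first point, the "one provider per API" rule could be enforced by a check like the one below. This is only a sketch: `GVK`, `Cluster`, and `install` are hypothetical names, not OLM types, and keying ownership by group and kind (so that all versions of a kind share one provider) is an assumption of the sketch rather than a decided design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GVK:
    """A group/version/kind triple identifying an API type."""
    group: str
    version: str
    kind: str


@dataclass
class Cluster:
    """Hypothetical in-memory record of which operator provides which API."""
    # Maps (group, kind) -> operator name. Assumption: all versions of a
    # kind belong to the same provider, since a CRD spans its versions.
    providers: dict = field(default_factory=dict)

    def install(self, operator: str, provided: list) -> None:
        # Validate every API first so a rejected install records nothing.
        for gvk in provided:
            owner = self.providers.get((gvk.group, gvk.kind))
            if owner is not None and owner != operator:
                raise ValueError(
                    f"{gvk.kind}.{gvk.group} is already provided by {owner}"
                )
        for gvk in provided:
            self.providers[(gvk.group, gvk.kind)] = operator


# Usage: the second operator is rejected because the API is already taken.
cluster = Cluster()
widget = GVK("example.com", "v1", "Widget")  # hypothetical API
cluster.install("widget-operator", [widget])
try:
    cluster.install("other-widget-operator", [widget])
except ValueError as err:
    print(err)
```
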

It does not mean that:

Every operator needs to have permission to do its job in every namespace
Every user in a cluster needs to have permission to use the operator's APIs
Only one controller pod needs to run for that API in a cluster
Only one controller can be installed to manage an API (i.e. ingress-style)
More sophisticated privilege generation cannot be used; e.g. FeatureBinding
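
To make the first two points above concrete: a descoped operator could still be limited to specific namespaces with ordinary RBAC objects. The sketch below builds a Role and RoleBinding (the standard `rbac.authorization.k8s.io/v1` kinds) for each namespace an admin chooses. The helper `operator_rbac`, the operator and namespace names, and the `example.com` API group are all hypothetical, and the verb list is an arbitrary choice; this is not how OLM generates RBAC today.

```python
import json


def operator_rbac(name, service_account, operator_namespace,
                  target_namespace, api_group, resources):
    """Build a Role and RoleBinding granting an operator's service account
    access to its own API in a single target namespace."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": target_namespace},
        "rules": [{
            "apiGroups": [api_group],
            "resources": resources,
            "verbs": ["get", "list", "watch", "create",
                      "update", "patch", "delete"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": target_namespace},
        # The subject lives in the operator's namespace; the grant applies
        # only in the target namespace where the RoleBinding is created.
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": operator_namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
    }
    return [role, binding]


# Usage: grant a hypothetical operator access in two namespaces only.
for ns in ["team-a", "team-b"]:
    for manifest in operator_rbac("widget-operator", "widget-operator",
                                  "operators", ns, "example.com", ["widgets"]):
        print(json.dumps(manifest, indent=2))
```

The same pattern, with a user or group as the subject, would cover granting users access to the operator's APIs in selected namespaces.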

Design!?

The specifics of how we will achieve descoping will need an enhancement proposal to be made clear. Such a proposal will, at minimum, need to cover:

Deprecation of related APIs; e.g. OperatorGroups, ClusterServiceVersions, etc.
Migration of existing operator content; i.e. how does an author define a scoped -> descoped operator upgrade? What does a cluster admin need to do?
If/how existing "scope user stories" can be achieved after "descoping"; e.g. config management
How does Operator Discovery work without a first-class concept of scoping?
