operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

PullSecret management for operators and their workload #2682

Open thetechnick opened 2 years ago

thetechnick commented 2 years ago

Type of question

Best practice, general context/help :)

Question

What did you do? We want to deploy a large operator with multiple dependencies from private repositories. The structure looks like this:

+--------------+         +--------------+
|Private Repo 1|         |Private Repo 2|
+----^---------+         +----^---------+
     |                        |
+----+----+              +----+----+<------------------------+
|Catalog 1|              |Catalog 2|                         |
+----^----+              +----^----+<--------+               |
     |                        |              |               |
+----+----+              +----+-----+   +----+-----+   +-----+----+
|Top level|  dependency  |Operator 1|   |Operator 2|   |Operator 3|
|Operator +------------->+----+-----+   +--^---+---+   +---^------+
+---------+                   | dependency |   |dependency |
                              +------------+   +-----------+

Now it's easy for us to add pullSecrets to the CatalogSources of Catalog 1 & 2, as described here: https://olm.operatorframework.io/docs/tasks/make-catalog-available-on-cluster/#using-registry-images-that-require-authentication-as-catalogbundleoperatoroperand-images
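For reference, a minimal sketch of such a CatalogSource, built as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). The catalog name, namespace, image, and secret name are placeholders for illustration, not values from our setup. The secrets are assumed to live in the same namespace as the CatalogSource.

```python
import json


def catalog_source(name, namespace, image, pull_secrets):
    """Build a CatalogSource manifest that references pull secrets by name."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourceType": "grpc",
            "image": image,
            # Names of Secrets, assumed to exist in the CatalogSource's namespace.
            "secrets": list(pull_secrets),
        },
    }


# All values below are placeholders.
manifest = catalog_source(
    "catalog-1",
    "my-stack",
    "quay.io/example/catalog-1:latest",
    ["stack-pull-secret"],
)
print(json.dumps(manifest, indent=2))
```

This only covers pulling the catalog image itself; it does nothing for the operand pods.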

But we are having a hard time figuring out how best to set up the pull secrets for the operand workload itself.

  1. We cannot use the cluster global pull secret, because it already includes differently scoped credentials for the same registry (quay.io). https://docs.openshift.com/container-platform/4.7/openshift_images/managing_images/using-image-pull-secrets.html#images-update-global-pull-secret_using-image-pull-secrets Even if we could use the global pull secret, we would much prefer to have the pull secrets scoped to a single namespace/stack: we want to deliver operator stacks like this multiple times into the same cluster, which makes managing these secrets in a central place tricky.

  2. Every Operator gets its own ServiceAccount to isolate RBAC, so patching the default ServiceAccount will not give us pull permissions for every operand.

  3. Patching every Operator in the dependency chain to specify imagePullSecrets in the deployment specs within its CSV, and ensuring that these Operators deploy all their pods with this set explicitly, is error-prone and a lot of work.

What did you expect to see? A simple UX for handling private registries via OLM, e.g. being able to specify pull secrets as part of a Subscription, similar to a CatalogSource, so that the pullSecrets are added to the ServiceAccounts created by OLM.

What did you see instead? Under which circumstances? Manual patching after installation, or a massive amount of work to handle pullSecrets across a whole product chain.

Environment

Additional context Ref, similar question: https://github.com/operator-framework/operator-lifecycle-manager/issues/2307

nb-ohad commented 2 years ago

@thetechnick in your example you suggest specifying the pull secret via the Subscription, but that is also problematic: the Subscriptions for the dependent operators (at all levels of the dependency chain) are created by OLM itself as part of dependency resolution, which means we have no way to specify a pull secret in them.

One way to solve this can be to provide a way to instruct OLM to copy the pull secret information from subscription to subscription throughout the dependency chain.
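To make the idea concrete, here is a rough sketch of what "copying through the dependency chain" would mean, using the operator graph from the diagram above. This is purely illustrative: OLM has no such feature today, and the operator names, the secret name, and the `propagate` helper are all hypothetical.

```python
from collections import deque

# Dependency edges taken from the diagram in the question (names are made up).
DEPENDENCIES = {
    "top-level-operator": ["operator-1"],
    "operator-1": ["operator-2"],
    "operator-2": ["operator-3"],
    "operator-3": [],
}


def propagate(root, pull_secrets, dependencies):
    """Walk the dependency graph breadth-first from `root` and record the
    pull secrets every reachable operator would inherit."""
    assigned = {}
    queue = deque([root])
    while queue:
        operator = queue.popleft()
        if operator in assigned:
            continue  # already visited, e.g. a shared dependency
        assigned[operator] = list(pull_secrets)
        queue.extend(dependencies.get(operator, []))
    return assigned


result = propagate("top-level-operator", ["stack-pull-secret"], DEPENDENCIES)
for operator, secrets in result.items():
    print(operator, secrets)
```

Only the top-level Subscription is created by the user, so that is the only place the secret list would have to be written by hand.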

An alternative way to handle the entire problem would be a namespace-scoped global pull secret (same as the global one, but scoped to a single namespace), which would take effect for all pull operations inside the namespace. I believe this would solve 90% of cases, including all of the common use cases.

dmesser commented 2 years ago

@thetechnick did you try adding entries to the global pull secret that are scoped to a particular namespace in the respective registry, e.g. quay.io/somenamespace/?

thetechnick commented 2 years ago

@dmesser Can you point me to any documentation for this? I could not find any hint about sub-scoping pull credentials in the global pull secret, but I didn't try it out yet. :) https://docs.openshift.com/container-platform/4.7/openshift_images/managing_images/using-image-pull-secrets.html

dmesser commented 2 years ago

@thetechnick check out the current 4.9 docs for coverage: https://docs.openshift.com/container-platform/4.9/openshift_images/managing_images/using-image-pull-secrets.html#images-allow-pods-to-reference-images-from-secure-registries_using-image-pull-secrets
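A small sketch of what such a pull secret payload could look like, with one registry-wide entry and one entry scoped to a registry path. The robot account names and passwords are placeholders; the structure is the usual `.dockerconfigjson` layout with an `auths` map.

```python
import base64
import json


def docker_config(entries):
    """Build a .dockerconfigjson payload from {registry_or_path: (user, password)}."""
    auths = {}
    for scope, (user, password) in entries.items():
        # The "auth" value is base64("user:password").
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        auths[scope] = {"auth": token}
    return {"auths": auths}


# Placeholder credentials: one for the whole registry, one for a single path.
config = docker_config({
    "quay.io": ("cluster-robot", "placeholder-1"),
    "quay.io/somenamespace": ("stack-robot", "placeholder-2"),
})
print(json.dumps(config, indent=2))
```

With both entries present, pulls from quay.io/somenamespace/... are expected to match the more specific entry, as described in the linked docs.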

thetechnick commented 2 years ago

@dmesser Thank you very much! Didn't know this is possible. That's super useful, but not quite solving our case here.

When we deploy several of these private stacks via automation, we have to add and remove several of these secrets in the global pull secret. It would be much easier for us to manage if we could use pull secrets scoped to a namespace/installation. What we don't like about patching the global pull secret in this case:

dmesser commented 2 years ago

@thetechnick Yeah, makes a lot of sense. In this case just attach pull secrets to local service accounts in the namespace.
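As a sketch of that approach, the snippet below builds the `kubectl patch` invocations for a set of ServiceAccounts in one namespace. The namespace, ServiceAccount names, and secret name are assumptions for illustration, and the commands are only printed, not executed.

```python
import json
import shlex


def pull_secret_patch(secret_names):
    """Merge-patch body that sets imagePullSecrets on a ServiceAccount.

    Note: a merge patch replaces the whole imagePullSecrets list, so any
    secrets already attached must be included in `secret_names`.
    """
    return {"imagePullSecrets": [{"name": name} for name in secret_names]}


def kubectl_patch_command(namespace, service_account, secret_names):
    """Return the argv for `kubectl patch serviceaccount` (not executed here)."""
    return [
        "kubectl", "patch", "serviceaccount", service_account,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(pull_secret_patch(secret_names)),
    ]


# Hypothetical ServiceAccounts created for the operators in one stack.
commands = [
    kubectl_patch_command("my-stack", sa, ["stack-pull-secret"])
    for sa in ["operator-1", "operator-2", "operator-3"]
]
for argv in commands:
    print(shlex.join(argv))
```

This has to be repeated for every ServiceAccount that runs pods pulling from the private registry, including any that the operators create later.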

thetechnick commented 2 years ago

@dmesser Yes, you are right, we had the same idea yesterday. @nb-ohad is looking into that now. It still requires a change to every single bundle throughout the stack, hardcoding them to use a specific secret.

As OLM is managing and abstracting the installation of operators I would expect OLM to offer something to manage these pull secrets in a standardized fashion, without package authors having to document a hardcoded secret name in their bundles.

Maybe we can just document the hardcoded pull Secret reference as best practice for OLM v1, but it would be really nice if the new OLM v2 APIs would provide a better UX.

dmesser commented 2 years ago

@thetechnick We discussed in the past what we could do given the current separation of controllers. The options weren't great (distribute the catalog Secret to all watch namespaces of the operator).

dmesser commented 2 years ago

We've looked at this more closely and there is probably a good middle ground we can land on: propagating the secrets to the operator controller pods, since their definition is something that OLM owns. See https://issues.redhat.com/browse/OLM-2457 for further details.