operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

Pull operator image from private registry with OLM #2307

Status: Open. bszeti opened this issue 2 years ago.

bszeti commented 2 years ago

Feature Request

Is your feature request related to a problem? Please describe.
When we want to deploy an operator from a private (authentication required) image registry, there is no easy way to set the image pull secret for the operator Pod. The pull secrets listed in the CatalogSource's spec.secrets are used only for pulling the catalog and bundle images, not for the operator image itself. Related issues:

Describe the solution you'd like
Two ideas:

bszeti commented 2 years ago

Using global pull secrets is an easy solution, of course, but that's not always possible.

One workaround is to reference a pull secret with a hard-coded name in the ServiceAccount and expect that secret to exist in the namespace. The ServiceAccount manifest in the bundle then contains:

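```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: controller-manager
# Hard-coded pull secret name; the Secret itself is created separately in the install namespace.
imagePullSecrets:
- name: my-pull-secret
```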

Then we create my-pull-secret in the operator namespace (openshift-operators for cluster-scoped operators), so the Pod eventually uses it. Fortunately, Kubernetes accepts the ServiceAccount even if the referenced image pull secret doesn't exist, so the same bundle can also be used with a global pull secret. The missing secret causes no errors, so we can also create the secret later if we forgot it.
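For reference, the pull secret itself could be created from a manifest along these lines. This is a sketch under stated assumptions: the namespace assumes a cluster-scoped install into openshift-operators, and the registry host and credential value are placeholders, not real data.

```yaml
# Sketch of the pull secret referenced by the ServiceAccount's imagePullSecrets.
apiVersion: v1
kind: Secret
metadata:
  name: my-pull-secret
  # Assumption: cluster-scoped operator installed into openshift-operators.
  namespace: openshift-operators
type: kubernetes.io/dockerconfigjson
stringData:
  # Placeholder registry host and credentials; "auth" is base64("user:password").
  .dockerconfigjson: |
    {"auths": {"registry.example.com": {"auth": "<base64-of-user:password>"}}}
```

The same kind of Secret can also be generated with `kubectl create secret docker-registry`, which builds the Docker config JSON from the registry server and credentials.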

anik120 commented 2 years ago

@bszeti OLM not propagating the secrets to the namespaces the operator is deployed in or watching (to make operand images pullable) was an explicit design decision. If OLM did that, cluster admins would have no way of controlling which namespaces have access to an image registry once such an operator is deployed. See the notes in https://olm.operatorframework.io/docs/tasks/make-index-available-on-cluster/#using-registry-images-that-require-authentication-as-indexbundleoperatoroperand-images, which essentially state that the "workaround" you mentioned is the way to go; otherwise admins would have no way to control which namespaces have access to which image registry.

bszeti commented 2 years ago

Hi @anik120, thanks for the response. I understand the reasoning behind not copying secrets between namespaces; that doesn't sound good from a security point of view. I also read the linked section of the docs, but I find it confusing.

We are talking about pull secrets for four different images here: the catalog (1), the bundle (2), the operator (3), and the image(s) required by a deployment created by the operator (4). The pull secrets for the catalog (1) and bundle (2) images can be added to the CatalogSource, so those images can be pulled. But there is no way to set the pull secret for the operator Pod itself (3). The docs say "By referencing one or more pull secrets in a CatalogSource, OLM can handle placing the secrets in the operator and catalog namespace to allow installation". In my experience, OLM cannot place secrets in the operator namespace, nor can I reference a pre-created pull secret in the operator namespace. This is why I opened this ticket.
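To make cases (1) and (2) concrete, the pull secrets for the catalog and bundle images are referenced by name in the CatalogSource's spec.secrets list, roughly as in the sketch below. The catalog name, image reference, and secret name are hypothetical. Per this issue, nothing in this resource reaches the operator Pod (3).

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-catalog                  # hypothetical catalog name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  # Hypothetical catalog image in a private registry.
  image: registry.example.com/my-org/my-catalog:latest
  displayName: My Private Catalog
  # Names of pull secrets in this namespace; used for the catalog and bundle images.
  secrets:
  - my-pull-secret
```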

The statement "OLM does not handle placing the secrets in target tenant namespaces for this scenario" refers to the images of operator-managed deployments (4), and the workaround "the pull secret can be added to the default service accounts of the target tenant namespaces" makes sense there. But again, what should we do about the operator image itself (3)?
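For case (4), the documented workaround amounts to the default ServiceAccount of each target tenant namespace referencing a pull secret that the admin has created in that same namespace. The sketch below shows the intended end state; the namespace and secret names are hypothetical, and it assumes the operand Pods run under the default ServiceAccount.

```yaml
# Intended end state of the default ServiceAccount in a hypothetical tenant namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-tenant-namespace
imagePullSecrets:
# Pull secret created by the cluster admin in this same namespace.
- name: my-operand-pull-secret
```

Because the default ServiceAccount already exists, the entry is typically added to the live object rather than re-creating it; take care not to drop any imagePullSecrets entries the platform has already placed there.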

For example, for a cluster-scoped operator I can create a pull secret in the "openshift-marketplace" namespace for the catalog (1) and bundle (2) images and reference it in the CatalogSource. But when I install the operator from the catalog, its Pod is created in the "openshift-operators" namespace (where the Subscription is created) and ends up with an image pull error. Even if I create the pull secret in that namespace in advance, the operator Pod's ServiceAccount doesn't exist yet: it is created from the bundle at the same time as the operator's Deployment/Pod, so I can't link the pull secret to it in advance.

I think the solution would be a "secrets" field in the Subscription resource too, similar to the one in the CatalogSource. We could then create the pull secret in advance in the Subscription's namespace, and it would be used when creating the operator Pod.
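A sketch of what the proposed field might look like. To be explicit: spec.secrets on a Subscription is hypothetical, does not exist in the current OLM API, and is shown only to illustrate the proposal. The remaining fields follow the usual Subscription shape, with placeholder package, channel, and catalog names.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: openshift-operators
spec:
  channel: stable                  # placeholder channel
  name: my-operator                # placeholder package name
  source: my-catalog               # CatalogSource holding the catalog/bundle pull secrets
  sourceNamespace: openshift-marketplace
  # HYPOTHETICAL, not part of the OLM API: pull secrets (by name, in this
  # namespace) that OLM would attach to the operator Pod's imagePullSecrets.
  secrets:
  - my-pull-secret
```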

jnpacker commented 2 years ago

When creating a ServiceAccount with an imagePullSecrets entry, I noticed that at least one operator I tried overwrote the SA resource, so the imagePullSecrets entry was lost. After I updated the new SA, I still had to delete the Pod that was stuck in ImagePullBackOff to get it to use the new secret. This doesn't work well when you're trying to automate activation.

It's also possible I just didn't wait long enough. I'll do a bit more testing.

jnpacker commented 2 years ago

It is not ideal, but @bszeti's suggestion (at least form the ServiceAccount correctly so it references the secret, and let the cluster admin worry about how the secret gets there) might help. But I'm wondering whether the ServiceAccount defined in the operator bundle is created outside the catalog flow.

Summary suggestion

  1. If secrets are defined for the catalog, those secret references (imagePullSecrets: [ ]) are added to the operator's ServiceAccount resource, if the operator bundle defines one.
  2. The cluster admin is responsible for delivering the pull secret to the operator namespace.

SOLDIERz commented 2 months ago

I'm still facing this issue today with OpenShift 4.13.39. For some time it worked with the setup described at: https://olm.operatorframework.io/docs/tasks/make-catalog-available-on-cluster/#using-registry-images-that-require-authentication-as-indexbundleoperatoroperand-images

If the imagePullSecret is referenced in the bundle, for instance when the controller-manager image is pulled from a private registry, there is no place in the API to tell OLM to attach the imagePullSecrets. As a consequence, permissions to pull the image should be added directly to the operator Deployment’s manifest by adding the required secret name to the list deployment.spec.template.spec.imagePullSecrets. For the operator-sdk abstraction, the operator Deployment’s manifest is found under config/manager/manager.yaml. Below is an example of a controller-manager Deployment’s manifest configured with an imagePullSecret to pull container images from a private registry.
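A minimal sketch of such a manifest, assuming the default operator-sdk layout (a Deployment named controller-manager whose container is named manager). The image reference and secret name are hypothetical, and the referenced Secret must still exist in the namespace the operator is installed into.

```yaml
# Trimmed sketch of config/manager/manager.yaml (operator-sdk layout assumed).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  replicas: 1
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      serviceAccountName: controller-manager
      # Pull secret for the operator image; must exist in the install namespace.
      imagePullSecrets:
      - name: my-pull-secret
      containers:
      - name: manager
        # Hypothetical operator image in a private registry.
        image: registry.example.com/my-org/my-operator:v0.0.1
        command:
        - /manager
```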

But sometimes the "imagePullSecrets" entry is not rendered at all, and the Pod falls back to the default "imagePullSecrets" from the ServiceAccount, ending in "ImagePullBackOff". There is no fully reliable solution within the operator build that guarantees the "imagePullSecret" is used. So the only workaround I see at the moment is to patch the ServiceAccount after the operator has been deployed, which works in 100% of cases.

Will there be a fix any time soon? This issue has been open for at least 2 years.