operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

CatalogSource Pod Tolerations and Selector #2452

Open perdasilva opened 2 years ago

perdasilva commented 2 years ago

Feature Request

Is your feature request related to a problem? Please describe.
Downstream in OCP-land, the marketplace operator deployment is being configured to run on the correct nodes, but the CatalogSource pods don't get the same tolerations and node selectors.

It would be nice if CatalogSource pods could be allocated to specific nodes in my cluster through node selectors and tolerations.

Describe the solution you'd like
I'd like to be able to specify the node selectors and tolerations in a configuration section of the CatalogSource spec, similar to what is done with Subscription.

For example:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators
  publisher: OperatorHub.io
  config:
    nodeSelector:
      node-role.kubernetes.io/master: ""
    tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: Exists
      effect: "NoSchedule"
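
For comparison, the Subscription config section I'm referring to looks roughly like this (a sketch; the subscription name, package, and channel are placeholders):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator                  # placeholder name
  namespace: olm
spec:
  channel: stable                    # placeholder channel
  name: my-operator                  # placeholder package name
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    nodeSelector:
      node-role.kubernetes.io/master: ""
    tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: Exists
      effect: "NoSchedule"

As I understand it, OLM overlays this config onto the operator Deployment it creates; this request asks for the same treatment for the catalog pod.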

The downside here is that it makes the CatalogSource a bit of a leaky abstraction by floating these pod concerns to the top. If we ever decide on a podless implementation of the spec, these fields cease to make any sense. Having said that, if that were the case, the fields could just be ignored while they are deprecated. Not nice, but it doesn't get in the way of users.

perdasilva commented 2 years ago

I've started to work on these changes (https://github.com/operator-framework/api/pull/173). I've set the PR to WIP while we align on this thread.

anik120 commented 2 years ago

The downside here is that it makes the CatalogSource a bit of a leaky abstraction by floating these pod concerns to the top

We do this frequently with our APIs though. For example, the spec.secrets field is essentially a leaky abstraction that floats ServiceAccount concerns to the top. So this wouldn't be anything new, at least.
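
For example, a minimal sketch of that existing field (the secret name is made up):

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  secrets:
  - my-registry-pull-secret          # made-up secret name; a ServiceAccount concern surfaced on the CatalogSource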

In fact, aren't all Kubernetes APIs doing the same thing too? Pod "concerns" are also floated up to the Deployment, i.e. you can specify node selectors and tolerations in the Deployment's Pod template for its Pods to adhere to.
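
And a plain Deployment already carries Pod scheduling settings in its Pod template (a generic sketch, nothing OLM-specific; names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controller           # placeholder
  namespace: olm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-controller
  template:
    metadata:
      labels:
        app: example-controller
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: Exists
        effect: "NoSchedule"
      containers:
      - name: controller
        image: registry.example.com/controller:latest   # placeholder image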

njhale commented 2 years ago

In fact, aren't all Kubernetes APIs doing the same thing too? Pod "concerns" are also floated up to the Deployment, i.e. you can specify node selectors and tolerations in the Deployment's Pod template for its Pods to adhere to.

Deployments always result in Pods, while CatalogSources may be implemented in other ways; e.g. what does a nodeSelector mean for a ConfigMap-backed CatalogSource?

I'm playing devil's advocate here, and I think this sort of thing is usually solved with union/sum types.
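
For illustration, a ConfigMap-backed CatalogSource references a ConfigMap rather than an image (a sketch; the names are made up), which is where top-level scheduling fields get awkward:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: internal-catalog                  # made-up name
  namespace: olm
spec:
  sourceType: configmap
  configMap: internal-catalog-manifests   # made-up ConfigMap holding the catalog content
  displayName: Internal Operators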

perdasilva commented 2 years ago

Would injecting this information into the controller environment via the Downward API be a possible solution? (This does force things one way, so we'd still need an avenue for turning it off, maybe via a flag.)
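
To illustrate the mechanism, a fragment of what that could look like on the controller's own Pod template (the env var names and annotation key are made up; the Downward API only exposes a fixed set of Pod fields, so where the scheduling information would come from is still open):

      containers:
      - name: controller                                  # placeholder container name
        image: quay.io/operator-framework/olm:latest      # placeholder image
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: SCHEDULING_HINTS                          # made-up env var, mirroring a made-up annotation
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['olm.example.io/scheduling']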

perdasilva commented 2 years ago

@njhale just for my understanding: when we say union types, we mean something like accepting different structures for a particular attribute, right? E.g. having the cross-cutting attributes at the top level, then specific configurations for the different deployment options. For instance,

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: olm
spec:
  # sourceType: grpc # deprecated
  # image: quay.io/operatorhubio/catalog:latest # deprecated
  displayName: Community Operators
  publisher: OperatorHub.io
  source:
    grpc:
      image: quay.io/operatorhubio/catalog:latest
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: Exists
        effect: "NoSchedule"

(I haven't looked into the spec enough to know if this is accurate, but I'm more wondering about the meaning behind union types in this context.)

P.S. I think this would be a really nice way to go. It might just be a little more painful.

Jamstah commented 1 year ago

@perdasilva as OpenShift moves closer to multi-arch clusters, we're now looking at using this to make sure our catalog pods get scheduled onto nodes of a supported architecture, which really means we need affinity too.
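
Concretely, that constraint would look roughly like this in the catalog Pod spec (a sketch using the standard kubernetes.io/arch node label; the architecture list is only an example):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:                      # example: architectures the catalog image supports
          - amd64
          - arm64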

Looking here, it seems as though you managed to get as far as updating the API, but not as far as updating the CatalogSource reconciler to pass the configuration through to the pods themselves (looking here). Is that right?

Jamstah commented 1 year ago

Actually, this did get done here: https://github.com/operator-framework/operator-lifecycle-manager/pull/2512
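
For anyone landing here later, as far as I can tell the merged shape puts these fields under a gRPC-specific config block on the CatalogSource, roughly like this (the linked PR and the current CatalogSource CRD are authoritative for the exact field names):

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators
  publisher: OperatorHub.io
  grpcPodConfig:
    nodeSelector:
      node-role.kubernetes.io/master: ""
    tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: Exists
      effect: "NoSchedule"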

I expect this issue can be closed.