operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0

Multi Cluster Support #1308

Open bs-matil opened 2 years ago

bs-matil commented 2 years ago

As discussed in #1307, it would be nice to have a cluster context setting for each controller.

Why? Currently an operator can only act on a single context. Traditionally that's fine, as an operator often orchestrates only resources in the very cluster it is deployed in itself. In this scenario the operator gets its identity natively from the Kubernetes context and is statically configured for its own cluster.
In contrast to that, we have a use case where we manage multiple clusters with one operator, which is deployed to a meta/central cluster. The operator still manages resources within this meta cluster, but to instantiate the desired application it will also manage child/runtime clusters.

Example:

The control cluster manages a CRD which has been instantiated twice. The operator picks up both instances and deploys resources to fulfil the desired state. To do that, the operator has to deploy the application into multiple clusters, which are e.g. physically separated and therefore can't have a single identity, or are separated due to availability needs, or to test new versions of Kubernetes, etc. Therefore a full replication of the resources is created in each cluster and the operator watches all of them.

Control Cluster
┌──────────────────────────────┐  Cluster A
│                              │ ┌──────────────────────────────────────────┐
│    Namespace-crds            │ │  Namespace-A            Namespace-B      │
│    ┌──────────────┐      ┌───┼─┤►┌─────────────┐       ┌─────────────┐    │
│    │              │      │   │ │ │             │       │             │    │
│    │ CRD-A        │      │   │ │ │ Deployment  │       │  Deployment │    │
│    │ CRD-B        │      │   │ │ │             │       │             │    │
│    │              │      │   │ │ └─────────────┘       └─────────────┘    │
│    └─────▲────────┘      │   │ │                                          │
│          │               │   │ └──────────────────────────────────────────┘
│          │ Reconcile     │   │
│          │               │   │  Cluster B
│     ┌────┴────────┬──────┘   │  ┌──────────────────────────────────────────┐
│     │             │          │  │  Namespace-A            Namespace-B      │
│     │  Operator   ├──────────┼──┤►┌─────────────┐       ┌─────────────┐    │
│     │             │  Create/ │  │ │             │       │             │    │
│     └─────────────┘  Reconcile  │ │ Deployment  │       │  Deployment │    │
│                              │  │ │             │       │             │    │
│                              │  │ └─────────────┘       └─────────────┘    │
│                              │  │                                          │
└──────────────────────────────┘  └──────────────────────────────────────────┘
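
The replication shown in the diagram could be sketched roughly as below. This is only a minimal in-memory model, not java-operator-sdk code: `ClusterClient`, `InMemoryCluster` and `MultiClusterReplicator` are hypothetical names invented for illustration, and a real operator would hold one Kubernetes client per target cluster instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-cluster API client (assumption, not an SDK type).
interface ClusterClient {
    void apply(String namespace, String deploymentName);

    List<String> deployments(String namespace);
}

// In-memory fake so the sketch runs without a real cluster.
final class InMemoryCluster implements ClusterClient {
    private final Map<String, List<String>> byNamespace = new LinkedHashMap<>();

    @Override
    public void apply(String namespace, String deploymentName) {
        List<String> names = byNamespace.computeIfAbsent(namespace, ns -> new ArrayList<>());
        if (!names.contains(deploymentName)) {
            names.add(deploymentName); // applying twice leaves a single entry
        }
    }

    @Override
    public List<String> deployments(String namespace) {
        return List.copyOf(byNamespace.getOrDefault(namespace, List.of()));
    }
}

// Holds one client per target cluster, keyed by a cluster name chosen by the operator.
final class MultiClusterReplicator {
    private final Map<String, ClusterClient> clusters;

    MultiClusterReplicator(Map<String, ClusterClient> clusters) {
        this.clusters = clusters;
    }

    // Applies the same desired Deployment to the given namespace in every registered cluster.
    void reconcile(String namespace, String deploymentName) {
        clusters.values().forEach(client -> client.apply(namespace, deploymentName));
    }
}
```

Keeping the clients in a map keyed by cluster name is just one possible design; it makes the per-controller "cluster context" explicit instead of relying on the single context the operator process was started with.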

Goals:

jmrodri commented 2 years ago

Controller runtime had an issue about multi cluster support: https://github.com/kubernetes-sigs/controller-runtime/issues/745#issuecomment-570077989 and a POC https://github.com/kubernetes-sigs/controller-runtime/pull/950

juangon commented 2 years ago

Wow, this would be a great feature. In my case we are using operators to provision applications, and having the chance to select which cluster to deploy to (maybe in other clouds) would be great.

csviri commented 2 years ago

So thinking about this issue more, the real benefits would be:

  1. Have out-of-the-box access to other clusters and be able to register informers that trigger the reconciliation of a custom resource based on events from a different cluster.
  2. Manage resources on different clusters, in the sense that the reconciler is able to make decisions based on a global state (taking into account resources on multiple clusters).

But we also have to be very careful about this, since it has a huge impact on the overall architecture and the use case might target only a small subset of users. (Just think: how would you model this in terms of a dependent resource?)

So I would recommend creating a prototype to see the implications more clearly, and then deciding whether we want to do this or not.
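
To make the two benefits listed above a bit more concrete, here is a rough, library-free sketch. Everything in it is an assumption for illustration: `GlobalStateSketch`, `onRemoteEvent` and `isHealthy` are hypothetical names and not part of java-operator-sdk, and "ready replicas" is just an example of per-cluster state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not SDK API): records what each cluster reports for a primary
// resource and derives a decision from the combined, global state.
final class GlobalStateSketch {
    // primary resource name -> (cluster name -> ready replicas observed in that cluster)
    private final Map<String, Map<String, Integer>> observed = new HashMap<>();

    // Benefit 1: an event from any cluster is mapped back to the primary resource it
    // belongs to. Returns the name of the primary whose reconciliation should be triggered.
    String onRemoteEvent(String cluster, String ownerPrimary, int readyReplicas) {
        observed.computeIfAbsent(ownerPrimary, key -> new HashMap<>()).put(cluster, readyReplicas);
        return ownerPrimary;
    }

    // Benefit 2: the decision sums the state of all clusters instead of looking at one.
    boolean isHealthy(String primary, int minTotalReady) {
        int total = observed.getOrDefault(primary, Map.of()).values().stream()
                .mapToInt(Integer::intValue)
                .sum();
        return total >= minTotalReady;
    }
}
```

In this model a primary resource with 2 ready replicas in one cluster and 1 in another counts as healthy for a minimum of 3, although neither cluster alone would satisfy the check. How such a per-cluster view would map onto dependent resources is exactly the open modelling question raised above.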

csviri commented 2 years ago

This is a very interesting argument for why it makes sense to have this in general: https://youtu.be/1p00SMLletY?t=1249

Although this is not necessarily about multiple clusters in one instance, it is a nice example of how to manage resources from another cluster and why it makes sense.

csviri commented 1 year ago

see also: https://github.com/java-operator-sdk/java-operator-sdk/issues/1817

katheris commented 3 months ago

Hey, is there any update on this feature? I work on the Strimzi Access Operator, which is responsible for delivering connection details for a Kafka cluster to Kafka applications. We are looking at a use case where you have one Kubernetes cluster hosting your Kafka cluster, and a different one hosting your Kafka applications. So the operator would only watch for CRs in a single Kubernetes cluster, but could query other Kubernetes clusters as part of the reconciliation loop.

Strimzi issue for reference.
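
Roughly, the shape of that use case could look like the sketch below. It is not Strimzi or java-operator-sdk code: `CrossClusterLookupSketch`, the `remoteLookup` function and the `-connection` secret naming are all hypothetical, and the remote cluster is replaced by a plain function so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: CRs are watched in the local cluster only, while the
// reconciliation queries another cluster for the data it needs.
final class CrossClusterLookupSketch {
    // Stand-in for a read-only query against the remote cluster:
    // Kafka cluster name -> bootstrap address, empty if the cluster is not found.
    private final Function<String, Optional<String>> remoteLookup;

    // Stand-in for Secrets written in the local cluster: secret name -> bootstrap address.
    final Map<String, String> localSecrets = new HashMap<>();

    CrossClusterLookupSketch(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Reconciles one local custom resource; returns true if the local Secret was written.
    boolean reconcile(String accessName, String kafkaClusterName) {
        Optional<String> bootstrap = remoteLookup.apply(kafkaClusterName);
        if (bootstrap.isEmpty()) {
            return false; // remote Kafka cluster not found, nothing is written
        }
        localSecrets.put(accessName + "-connection", bootstrap.get());
        return true;
    }
}
```

Because the remote side is only queried here, nothing triggers a new reconciliation when the remote Kafka cluster changes; getting that would need an informer on the remote cluster rather than a plain lookup.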

csviri commented 3 months ago

Hi @katheris, having generic multi-cluster support is quite a complex change. But having Informers watch different clusters might be doable (your use case). I will create a separate issue with that scope.

see issue with limited scope here: https://github.com/operator-framework/java-operator-sdk/issues/2493

csviri commented 3 months ago

@katheris I created the PR: https://github.com/operator-framework/java-operator-sdk/pull/2499 that includes a sample. Tbh I think this covers the original request within this issue.

csviri commented 3 months ago

Just a side note: having a controller manage primary resources (not just secondary resources as in the PR above) from multiple clusters is more complex to implement, but definitely doable. Pls let us know if that is needed.

katheris commented 3 months ago

For our use case we only need to manage secondary resources in a different Kubernetes cluster. The controller will be deployed in the same cluster as both the CR it is the controller for, and any primary resources it creates.