[proposal] Declarative multi-cluster deployments

stefanprodan commented 10 months ago

This proposal aims to improve the delivery of apps across clusters with Timoni Bundles by allowing users to define the target clusters and environments (groups of clusters) in a declarative manner. Bundles will support customising the configuration of each app based on the target cluster group and cluster name.

Current solutions

Currently, deploying apps to multiple clusters and customising the app configuration based on the target environment is possible with Timoni Bundles and Runtime attributes.

Bundle example:

bundle: {
    _cluster: string @timoni(runtime:string:TIMONI_CLUSTER_NAME)
    _env:     string @timoni(runtime:string:TIMONI_CLUSTER_GROUP)

    apiVersion: "v1alpha1"
    name:       "apps"
    instances: {
        "my-app": {
            module: url: "oci://registry.host/org/modules/my-app"
            namespace: "apps"
            values: {
                // common values to all clusters in all environments

                if _env == "staging" {
                    // common values to all staging clusters
                }

                if _env == "production" {
                    // common values to all production clusters

                    if _cluster == "production-gov" {
                        // cluster specific values
                    }
                }
            }
        }
    }
}

Target clusters based on exported values

To apply the bundle to all clusters, we need to export the runtime values and set the kubeconfig context for each cluster:

export TIMONI_CLUSTER_NAME=staging-eu
export TIMONI_CLUSTER_GROUP=staging
timoni bundle apply -f bundle.cue --runtime-from-env --kube-context stg-eu-central-1

export TIMONI_CLUSTER_NAME=production-eu
export TIMONI_CLUSTER_GROUP=production
timoni bundle apply -f bundle.cue --runtime-from-env --kube-context prod-eu-central-1

Target clusters based on runtime values

Another approach would be to create a ConfigMap in each cluster with the cluster name and group, and configure Timoni with a Runtime definition to read the values from each cluster.

Runtime example:

runtime: {
    apiVersion: "v1alpha1"
    name:       "cluster-info"
    values: [
        {
            query: "k8s:v1:ConfigMap:default:cluster-info"
            for: {
                "TIMONI_CLUSTER_NAME":  "obj.data.name"
                "TIMONI_CLUSTER_GROUP": "obj.data.group"
            }
        },
    ]
}

To apply the bundle on all clusters, we no longer need to export the runtime values, but we still need to set the kubeconfig context for each cluster:

timoni bundle apply -f bundle.cue --runtime runtime.cue --kube-context stg-eu-central-1

timoni bundle apply -f bundle.cue --runtime runtime.cue --kube-context prod-eu-central-1

Drawbacks

The major drawback of exporting the target cluster name and group is that these values come from the local environment, so they are opaque and can change between executions. Also, the env vars must match the kubeconfig context; otherwise users are at risk of deploying the staging configuration on the production clusters and vice versa.

While querying the cluster for the name and group is better than relying on env vars, these values must be set in each cluster ahead of time in a ConfigMap. Users may not be aware of the actual values set in the ConfigMaps, so they must run timoni runtime build for each cluster to check that the values are correct. Also, any changes to the ConfigMaps, like renaming a cluster or moving a cluster to a different group, must be kept in sync with the conditions in the Bundle.

The major drawback of both approaches is that we need to run the timoni bundle apply command for each cluster while passing the right --kube-context argument. When deploying to a large fleet of clusters, having to run dozens of commands requires some sort of scripting on top of the Timoni CLI, which increases complexity and is error-prone.
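To make the scripting burden concrete, here is a minimal sketch of the kind of wrapper the env var approach requires today. It only uses the commands and flags shown in the examples above; the apply_fleet function name and the TIMONI override variable are hypothetical and exist purely for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the existing CLI, not part of Timoni.
# TIMONI can be overridden (e.g. TIMONI=echo) to print the commands instead of running them.
TIMONI="${TIMONI:-timoni}"

# Applies the bundle to each cluster, one command per cluster.
# The name:group:context triplets must be kept in sync with kubeconfig by hand.
apply_fleet() {
  while IFS=: read -r name group context; do
    TIMONI_CLUSTER_NAME="$name" TIMONI_CLUSTER_GROUP="$group" \
      "$TIMONI" bundle apply -f bundle.cue --runtime-from-env \
      --kube-context "$context" </dev/null || return 1
  done <<EOF
staging-eu:staging:stg-eu-central-1
production-eu:production:prod-eu-central-1
EOF
}
```

Nothing in this script verifies that a name and group actually belong to the paired context, which is the mismatch risk described above.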

Proposed solution

To improve the multi-cluster deployment capabilities, the Runtime API could be extended to allow users to set the target clusters in a declarative manner.

Runtime clusters example:

runtime: {
    apiVersion: "v1alpha1"
    name:       "fleet"
    clusters: {
        "staging-eu": {
            group:       "staging"
            kubeContext: "stg-eu-central-1"
        }
        "staging-us": {
            group:       "staging"
            kubeContext: "stg-us-west-1"
        }
        "production-eu": {
            group:       "production"
            kubeContext: "prod-eu-central-1"
        }
        "production-us": {
            group:       "production"
            kubeContext: "prod-us-west-1"
        }
    }
    values: [...]
}

To apply the bundle on all clusters, users will run a single command:

timoni bundle apply -f bundle.cue -r runtime.cue --runtime-cluster="*"

With --runtime-cluster="*", Timoni will apply the bundle on each cluster, in the order defined in the Runtime definition. If the apply fails on a staging cluster, Timoni will stop the execution and not continue with production.

Timoni will automatically set the TIMONI_CLUSTER_NAME and TIMONI_CLUSTER_GROUP runtime attributes and will change the Kubernetes context to the value specified for each cluster.
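The ordering and stop-on-failure behaviour described above can be approximated in shell as follows. This is only a sketch of the intended semantics, not how Timoni would implement it; the apply_in_order and apply_one names, and the "name group kubeContext" line format, are assumptions made for the example.

```shell
#!/bin/sh
# apply_in_order CMD
# Reads "name group kubeContext" lines from stdin, in definition order, and runs
# CMD once per cluster with the two runtime attributes exported.
# Stops at the first failure, so later clusters are left untouched.
apply_in_order() {
  while read -r name group context; do
    (
      # The subshell keeps the exports from leaking into the caller.
      export TIMONI_CLUSTER_NAME="$name" TIMONI_CLUSTER_GROUP="$group"
      "$1" "$context"
    ) </dev/null || {
      echo "apply failed on $name, stopping" >&2
      return 1
    }
  done
}

# Hypothetical per-cluster step built from the existing CLI flags.
apply_one() {
  timoni bundle apply -f bundle.cue --runtime-from-env --kube-context "$1"
}
```

Feeding the four clusters from the Runtime example to apply_in_order apply_one, staging first, would leave both production clusters untouched if either staging apply fails.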

To apply the bundle on production clusters only:

timoni bundle apply -f bundle.cue -r runtime.cue --runtime-cluster="*" --runtime-group="production"
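The command above suggests that the cluster and group selectors are combined, so a cluster is selected only if it matches both. A minimal sketch of that selection logic follows, using the fleet from the Runtime clusters example. It assumes that "*" matches everything and that any other value is compared literally; the proposal does not define pattern matching, and the fleet and select_clusters function names are hypothetical.

```shell
#!/bin/sh
# Fleet from the Runtime clusters example, one "name group kubeContext" per line,
# in definition order.
fleet() {
  cat <<EOF
staging-eu staging stg-eu-central-1
staging-us staging stg-us-west-1
production-eu production prod-eu-central-1
production-us production prod-us-west-1
EOF
}

# select_clusters CLUSTER GROUP
# Prints the fleet entries matching both selectors, preserving definition order.
# "*" matches anything; other values are compared literally (an assumption).
select_clusters() {
  fleet | while read -r name group context; do
    if [ "$1" != "*" ] && [ "$1" != "$name" ]; then continue; fi
    if [ "$2" != "*" ] && [ "$2" != "$group" ]; then continue; fi
    echo "$name $group $context"
  done
}
```

Under these assumptions, select_clusters "*" "production" prints the production-eu and production-us entries, in that order.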

To preview changes without altering the clusters:

timoni bundle apply --dry-run --diff -f bundle.cue -r runtime.cue --runtime-cluster="*"

The bundle vet command output will change when more than one cluster is selected, so that users can review the computed bundle values for each cluster:

$ timoni bundle vet --print-value -f bundle.cue -r runtime.cue --runtime-cluster="*" 
"staging-eu": bundle: {
  // computed value of the bundle for this cluster
}
"production-eu": bundle: {
  // computed value of the bundle for this cluster
}

The bundle status and bundle delete commands will have a --runtime flag. For these two commands, we only need to read the clusters field from the runtime definition to perform the operations across clusters.

When multiple clusters are selected, the bundle build command will print the multi-doc YAML with the cluster name under the instance name comment:

---
# Instance: <instance name>
# Cluster: <cluster name>
---

All the timoni bundle commands will have the following common flags:

--runtime
--runtime-cluster (defaults to *)
--runtime-group (defaults to *)

When --runtime is not set, or when the runtime clusters field is not set, the TIMONI_CLUSTER_NAME and TIMONI_CLUSTER_GROUP runtime attributes will be set to default, and the kubeContext will default to the current context or to the value set with the --kube-context flag.

b4nst commented 10 months ago

This feature would be really useful for us, for multiple reasons. The main one being to avoid applying to the wrong context.

Nalum commented 10 months ago

This seems like a really nice direction, I like the structure and the flow makes sense to me. Will have a further think on it 👍

kol-ratner commented 10 months ago

Does this concept apply at all to Flux-managed clusters? I have more or less committed to the Flux path, and my naive interpretation is that this feature set wouldn't apply to my use case. Am I drawing the wrong conclusion? To clarify, when I say feature set I am referring to any capability of Timoni to manage Kubernetes objects, as distinct from using Timoni to build and push module artifacts.

stefanprodan commented 10 months ago

Does this concept apply at all to flux-managed clusters?

No, if you're using Flux, then Timoni's role is limited to generating YAMLs, which you would push to the registry with flux push artifact. From there, the Flux controllers running on your clusters will pull and apply the manifests.
