kcp-dev / contrib-tmc

An experimental add-on re-adding some Kubernetes compute APIs and implementing transparent multi-cluster scheduling
Apache License 2.0

Sync kcp "KUBECONFIG"s to workloads #126

Open adambkaplan opened 2 years ago

adambkaplan commented 2 years ago

Is your feature request related to a problem? Please describe.

Many "off-the-shelf" controllers and Kubernetes client consumers assume that they receive a valid KUBECONFIG from the cluster's service account. This is fine if the controller creates objects or updates status. However, if the controller or client deletes objects on the workload cluster, the state is not synced properly: rather than deleting the object on KCP, the syncer will attempt to re-reconcile the state of the "deleted" object.
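For context, in-cluster clients typically find their API server the way client-go's `rest.InClusterConfig` does: host and port from the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, and the token and CA bundle from files under `/var/run/secrets/kubernetes.io/serviceaccount`. The stdlib-only sketch below approximates that lookup; `resolveInCluster` and `inClusterTarget` are hypothetical names, not client-go API.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path"
)

// saDir is the standard mount point for service-account credentials in a pod.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// inClusterTarget is a hypothetical stand-in for the parts of an in-cluster
// client config that decide which API server a workload talks to.
type inClusterTarget struct {
	Host      string // API server URL
	TokenFile string // bearer token path
	CAFile    string // CA bundle path
}

// resolveInCluster approximates client-go's rest.InClusterConfig lookup: the
// address comes from environment variables, the credentials from files.
func resolveInCluster(getenv func(string) string, dir string) (inClusterTarget, error) {
	host, port := getenv("KUBERNETES_SERVICE_HOST"), getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return inClusterTarget{}, errors.New("not in a cluster: KUBERNETES_SERVICE_HOST/PORT unset")
	}
	return inClusterTarget{
		Host:      "https://" + net.JoinHostPort(host, port),
		TokenFile: path.Join(dir, "token"),
		CAFile:    path.Join(dir, "ca.crt"),
	}, nil
}

func main() {
	target, err := resolveInCluster(os.Getenv, saDir)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(target.Host, target.TokenFile, target.CAFile)
}
```

Because these environment variables and files decide where every request (including deletes) goes, a pod on a workload cluster acts on that cluster by default; redirecting exactly these inputs to kcp is one way to make such clients act on kcp objects instead.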

Describe the solution you'd like

Provide a mechanism by which a kubeconfig (or equivalent) can be injected into a workload that is tied to a corresponding KCP service account, and allow this mechanism to be used by standard k8s workloads (Deployment, CronJob, etc.).

Describe alternatives you've considered

Alternative - sync deletes from workload clusters to KCP. Unclear if this is feasible or compatible with KCP's architecture.

Additional context

This issue was discovered when trying to configure the Tekton Operator's pruner on workload clusters whose Tekton PipelineRun objects are synced via KCP. The pruner is not a single workload, but rather a set of CronJobs that are deployed to all namespaces on a cluster. Admins can use namespace labels to tune the deployed CronJob; the operator configuration provides the defaults.

We would like to replicate the CronJob on KCP, but for this to work, it appears that the container that does the pruning needs to prune objects on KCP, not on the workload cluster.

ncdc commented 2 years ago

Note, we already do this for deployments in the deployment mutator. We'll need something similar for the various other things that are pod-speccable.
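Extending the mutator to other "pod-speccable" kinds mostly means locating the embedded pod spec in each one. The sketch below records those paths for built-in workload kinds (following the upstream apps/v1, batch/v1 and core/v1 schemas) and walks a JSON-decoded object to the pod spec, in the spirit of apimachinery's unstructured helpers; `podSpecPaths` and `podSpecOf` are hypothetical names, and custom resources would need their own entries.

```go
package main

import "fmt"

// podSpecPaths records where built-in "pod-speccable" kinds embed their pod spec.
var podSpecPaths = map[string][]string{
	"Pod":         {"spec"},
	"ReplicaSet":  {"spec", "template", "spec"},
	"Deployment":  {"spec", "template", "spec"},
	"StatefulSet": {"spec", "template", "spec"},
	"DaemonSet":   {"spec", "template", "spec"},
	"Job":         {"spec", "template", "spec"},
	"CronJob":     {"spec", "jobTemplate", "spec", "template", "spec"},
}

// podSpecOf walks a JSON-decoded object to its embedded pod spec. It returns
// false for kinds without a known path or when a field along the path is missing.
func podSpecOf(obj map[string]interface{}) (map[string]interface{}, bool) {
	kind, _ := obj["kind"].(string)
	fields, ok := podSpecPaths[kind]
	if !ok {
		return nil, false
	}
	cur := obj
	for _, f := range fields {
		next, ok := cur[f].(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur = next
	}
	return cur, true
}

func main() {
	cronJob := map[string]interface{}{
		"kind": "CronJob",
		"spec": map[string]interface{}{
			"jobTemplate": map[string]interface{}{
				"spec": map[string]interface{}{
					"template": map[string]interface{}{
						"spec": map[string]interface{}{"serviceAccountName": "pruner"},
					},
				},
			},
		},
	}
	spec, ok := podSpecOf(cronJob)
	fmt.Println(ok, spec["serviceAccountName"])
}
```

Once the pod spec is found, the same rewrite used for Deployments could in principle be applied unchanged, which is the appeal of keying the generalization on pod templates.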

adambkaplan commented 2 years ago

Note, we already do this for deployments in the deployment mutator. We'll need something similar for the various other things that are pod-speccable.

@ncdc is this mutator behavior documented anywhere? I'm hopeful we can demonstrate a proof of concept with a single-replica Deployment.

ncdc commented 2 years ago

@adambkaplan I don't believe it's documented, but the code is https://github.com/kcp-dev/kcp/blob/716d71056c8494b5583205936c94c1ec68828570/pkg/syncer/spec/mutators/deployment.go.

If you create a deployment in kcp, when it's synced to a real cluster, the deployment in the real cluster will have:

  1. Overridden environment variables
    1. KUBERNETES_SERVICE_PORT points to kcp, not the real cluster
    2. KUBERNETES_SERVICE_PORT_HTTPS points to kcp, not the real cluster
    3. KUBERNETES_SERVICE_HOST points to kcp, not the real cluster
  2. Overridden service account volume
    1. /var/run/secrets/kubernetes.io/serviceaccount/token comes from a secret that kcp syncs down to the real cluster (from the ServiceAccount token secret in kcp)
    2. /var/run/secrets/kubernetes.io/serviceaccount/namespace comes from a secret that kcp syncs down to the real cluster (from the ServiceAccount token secret in kcp)
    3. /var/run/secrets/kubernetes.io/serviceaccount/ca.crt comes from a configmap kcp-root-ca.crt that kcp creates in the namespace in the real cluster
  3. Any downward API environment variable references to metadata.namespace are converted to use the kcp namespace, not the real cluster namespace
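The three rewrites above can be sketched as follows. This is not the kcp code linked above: it uses simplified, hypothetical stand-ins for the `k8s.io/api/core/v1` types, models the service-account volume loosely as a projected volume (Secret for `token`/`namespace`, ConfigMap `kcp-root-ca.crt` for `ca.crt`), and assumes that disabling `automountServiceAccountToken` and reusing the port for `KUBERNETES_SERVICE_PORT_HTTPS` are acceptable choices.

```go
package main

import "fmt"

// saMountPath is where in-cluster clients expect token, namespace and ca.crt.
const saMountPath = "/var/run/secrets/kubernetes.io/serviceaccount"

// Hypothetical, simplified stand-ins for the core/v1 fields involved.
type EnvVar struct {
	Name, Value string
	FieldPath   string // set when the value comes from the downward API
}

type Mount struct{ Volume, Path string }

type Container struct {
	Name   string
	Env    []EnvVar
	Mounts []Mount
}

// Volume loosely models a projected volume built from a Secret and a ConfigMap.
type Volume struct{ Name, SecretName, ConfigMapName string }

type PodSpec struct {
	AutomountServiceAccountToken *bool
	Containers                   []Container
	Volumes                      []Volume
}

// kcpTarget holds assumed inputs: kcp's address, the kcp namespace, and the
// name of the Secret synced down from the kcp ServiceAccount token secret.
type kcpTarget struct{ Host, Port, Namespace, TokenSecret string }

var serviceEnv = map[string]bool{
	"KUBERNETES_SERVICE_HOST": true, "KUBERNETES_SERVICE_PORT": true, "KUBERNETES_SERVICE_PORT_HTTPS": true,
}

// mutatePodSpec points every container at kcp instead of the physical cluster.
func mutatePodSpec(spec *PodSpec, t kcpTarget) {
	off := false
	spec.AutomountServiceAccountToken = &off // assumed: keep the physical cluster's token out
	spec.Volumes = append(spec.Volumes, Volume{Name: "kcp-api-access", SecretName: t.TokenSecret, ConfigMapName: "kcp-root-ca.crt"})
	for i := range spec.Containers {
		c := &spec.Containers[i]
		var env []EnvVar
		for _, e := range c.Env {
			switch {
			case serviceEnv[e.Name]:
				continue // dropped here, re-added below with kcp's address
			case e.FieldPath == "metadata.namespace":
				e = EnvVar{Name: e.Name, Value: t.Namespace} // kcp namespace, not the physical one
			}
			env = append(env, e)
		}
		c.Env = append(env,
			EnvVar{Name: "KUBERNETES_SERVICE_HOST", Value: t.Host},
			EnvVar{Name: "KUBERNETES_SERVICE_PORT", Value: t.Port},
			EnvVar{Name: "KUBERNETES_SERVICE_PORT_HTTPS", Value: t.Port},
		)
		c.Mounts = append(c.Mounts, Mount{Volume: "kcp-api-access", Path: saMountPath})
	}
}

func main() {
	spec := PodSpec{Containers: []Container{{Name: "pruner", Env: []EnvVar{
		{Name: "POD_NAMESPACE", FieldPath: "metadata.namespace"},
	}}}}
	mutatePodSpec(&spec, kcpTarget{Host: "kcp.example.com", Port: "6443", Namespace: "pipelines", TokenSecret: "pruner-token"})
	fmt.Printf("%+v\n", spec.Containers[0].Env)
}
```

Explicit container environment variables take precedence over the service variables the kubelet injects, which is what makes overriding them in the pod spec effective.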

Still not done yet: kcp-dev/contrib-tmc#125

sbose78 commented 2 years ago

Example: https://gist.github.com/sbose78/6881d94915cc37461f607511f59ba0c1

adambkaplan commented 2 years ago

/label appstudiokcp

openshift-ci[bot] commented 2 years ago

@adambkaplan: The label(s) /label appstudiokcp cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed, approved, backport-risk-assessed, cherry-pick-approved

adambkaplan commented 1 year ago

@ncdc is there a notion that the mutator can be generalized to things that are "pod-speccable", or any custom resource?

Tekton TaskRuns come to mind here. If a TaskRun needs to create other Kubernetes resources (say, to deploy an application), it may need a KCP ServiceAccount.

ncdc commented 1 year ago

It's not implemented at the moment, no.

If you have a specific use case you are trying to solve (e.g. with TaskRuns), please describe it in detail & we'll see what we can do. Thanks!

adambkaplan commented 1 year ago

@ncdc revisiting this item.

One thing Pipeline Service needs to do is regularly prune Tekton PipelineRun and TaskRun objects, re-implementing the Pruner feature of the Tekton Operator (see https://tekton.dev/vault/operator-main/tektonconfig/#pruner and https://docs.openshift.com/container-platform/4.11/cicd/pipelines/automatic-pruning-taskrun-pipelinerun.html). Without pruning, the service risks destabilizing physical clusters via etcd object exhaustion. Pruning also helps ensure we reap persistent volume claims. This is most effectively done by deploying a CronJob.

mjudeikis commented 9 months ago

/transfer-issue contrib-tmc