kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io

Provide a means to identify if a cluster is self hosted #6831

Closed fabriziopandini closed 2 years ago

fabriziopandini commented 2 years ago

User Story

As a developer, I would like to have a way to detect if a Cluster is self-hosted so I can use this info to improve connectivity from CAPI controllers

Detailed Description

CAPI controllers use the <cluster>-kubeconfig to connect to the workload cluster to inspect nodes, etcd, and so on.

This connection goes through the external load balancer, and this is not ideal in the case of self-hosted clusters, given that we can use "in cluster" connectivity to avoid unnecessary network hops.

However, from a CAPI code PoV there is no well-defined way to detect "this cluster is a management cluster" or "this cluster is a self-hosted cluster".

This issue is about discussing options to provide this capability.
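
For illustration, a minimal sketch (in Go, using client-go) of the two connection paths being compared. The helper names and the "value" secret key are assumptions for the sketch, not the actual CAPI code:

package selfhosted

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// restConfigForWorkloadCluster builds a REST config from the <cluster>-kubeconfig
// secret as described above; the resulting Host is the external load balancer
// endpoint written into that kubeconfig.
func restConfigForWorkloadCluster(ctx context.Context, mgmtClient kubernetes.Interface, namespace, clusterName string) (*rest.Config, error) {
	// Assumption: the kubeconfig bytes are stored under the "value" key.
	secret, err := mgmtClient.CoreV1().Secrets(namespace).Get(ctx, clusterName+"-kubeconfig", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	return clientcmd.RESTConfigFromKubeConfig(secret.Data["value"])
}

// restConfigInCluster is what a self-hosted controller could use instead: it talks
// to kubernetes.default.svc over the service network, skipping the external load
// balancer entirely.
func restConfigInCluster() (*rest.Config, error) {
	return rest.InClusterConfig()
}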

Anything else you would like to add:

This could be relevant for a couple of issues

/kind feature

sbueringer commented 2 years ago

This connection goes through the external load balancer, and this is not ideal in the case of self-hosted clusters, given that we can use "in cluster" connectivity to avoid unnecessary network hops.

Is the assumption correct (across providers) that the client credentials stored in <cluster>-kubeconfig are also accepted by the apiserver when connecting via kubernetes.default.svc?

(Just asking because I'm not sure what kind of authn solutions folks have built with Kubernetes, especially for managed control planes)

kfox1111 commented 2 years ago

+1. Some clusters can't even talk directly to the external load balancer. Azure LBs of type internal can't be reached from the same node.

sbueringer commented 2 years ago

+1. Some clusters can't even talk directly to the external load balancer. Azure LBs of type internal can't be reached from the same node.

I think that should be fine. We would use kubernetes.default.svc as the server URL. This should work everywhere, because otherwise our controllers wouldn't work (it's used by the in-cluster config we use for our management cluster client).

I was specifically asking about the client credentials that we would take from the <cluster>-kubeconfig secret.
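
For illustration, a rough sketch of the combination being discussed: keep the client credentials from the kubeconfig secret, but point them at the in-cluster endpoint. Whether the apiserver accepts those credentials on that endpoint (especially for managed control planes) is exactly the open question above:

package selfhosted

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// inClusterRESTConfig reuses the client credentials from the <cluster>-kubeconfig
// secret but rewrites the server URL to the in-cluster service endpoint.
func inClusterRESTConfig(kubeconfigBytes []byte) (*rest.Config, error) {
	cfg, err := clientcmd.RESTConfigFromKubeConfig(kubeconfigBytes)
	if err != nil {
		return nil, err
	}
	// Keep client certs/tokens as they are; only the host changes.
	cfg.Host = "https://kubernetes.default.svc"
	// Note: the serving certificate presented on this endpoint must be valid for
	// kubernetes.default.svc and signed by the CA bundle in the kubeconfig; with
	// managed control planes that may not hold.
	return cfg, nil
}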

kfox1111 commented 2 years ago

@sbueringer, yeah, I was speaking more generally to the raised issue. Identifying that the cluster is self-hosted would be important for self-managing an Azure cluster, for example, as it has that internal LB restriction.

vincepri commented 2 years ago

We should start collecting possible ways we can identify a cluster being a self-managed one (management+workload+self)

vincepri commented 2 years ago

/priority important-soon

kfox1111 commented 2 years ago

Feels like there is a chicken-and-egg problem here... but if it's just an optimization, then the controller could check for its own pod in the "remote" cluster. Then it would know it's within the same cluster. If it has to be established before you can talk to the "remote" cluster, maybe it's better left as a driver-specific implementation?
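
A minimal sketch of that check, assuming the controller's own pod coordinates are injected via the downward API as POD_NAMESPACE/POD_NAME/POD_UID (the variables mentioned later in this thread). Comparing the UID guards against a pod with the same namespace/name existing in a different cluster:

package selfhosted

import (
	"context"
	"os"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// runningOnWorkloadCluster returns true if the pod this controller runs in also
// exists in the "remote" (workload) cluster, i.e. the cluster is self-hosted.
func runningOnWorkloadCluster(ctx context.Context, workloadClient kubernetes.Interface) (bool, error) {
	podNamespace := os.Getenv("POD_NAMESPACE")
	podName := os.Getenv("POD_NAME")
	podUID := os.Getenv("POD_UID")
	if podNamespace == "" || podName == "" || podUID == "" {
		// Without the downward API info we cannot tell; fall back to the old behaviour.
		return false, nil
	}

	pod, err := workloadClient.CoreV1().Pods(podNamespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	// Same namespace/name is not enough; the UID ties it to this exact pod.
	return pod.UID == types.UID(podUID), nil
}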

sbueringer commented 2 years ago

Some context:

Just to start with some ideas on how to identify that:

  1. Try to find the controller Pod in the "workload cluster" (one variant of this is sketched above).
  2. Check for a cluster identifier in the workload cluster via the about API:
    • Deploy a cluster id (name/namespace is not good enough) in the workload cluster on creation via the about API.
    • Check whether the about API property in the workload cluster matches the one of the cluster in which the controller is running => self-hosted.
    • Note: could be built on top of sig-multicluster's Cluster ID. I think at this time it's not stable enough and it requires us to deploy an additional CRD. (A sketch of this check follows below.)
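
A rough sketch of that check for option 2. The about.k8s.io/v1alpha1 ClusterProperty kind and the cluster.clusterset.k8s.io property name follow the sig-multicluster proposal and are assumptions here; the CRD would have to be deployed in both clusters:

package selfhosted

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// clusterID reads the well-known cluster id ClusterProperty from a cluster.
func clusterID(ctx context.Context, c client.Client) (string, error) {
	prop := &unstructured.Unstructured{}
	prop.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "about.k8s.io",
		Version: "v1alpha1",
		Kind:    "ClusterProperty",
	})
	if err := c.Get(ctx, client.ObjectKey{Name: "cluster.clusterset.k8s.io"}, prop); err != nil {
		return "", err
	}
	id, _, err := unstructured.NestedString(prop.Object, "spec", "value")
	return id, err
}

// isSelfHosted compares the id seen through the management cluster client with
// the id seen through the workload cluster client.
func isSelfHosted(ctx context.Context, mgmt, workload client.Client) (bool, error) {
	mgmtID, err := clusterID(ctx, mgmt)
	if err != nil {
		return false, err
	}
	workloadID, err := clusterID(ctx, workload)
	if err != nil {
		return false, err
	}
	return mgmtID != "" && mgmtID == workloadID, nil
}
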
CecileRobertMichon commented 2 years ago

+1 this would be a good improvement

I'll take a look at your POC @sbueringer, thanks for starting it

vincepri commented 2 years ago

Another alternative idea would be potentially to check if the remote cluster has a Lease that matches the leader election information we have for the controller?

sbueringer commented 2 years ago

Another alternative idea would be potentially to check if the remote cluster has a Lease that matches the leader election information we have for the controller?

As far as I can tell, we know the namespace and name of the leader election Lease for our controller (capi-system/controller-leader-election-capi, see below).

With that information we are able to retrieve the Lease.

The Lease looks roughly like this:

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: controller-leader-election-capi
  namespace: capi-system
spec:
  acquireTime: "2022-07-08T05:29:52.759035Z"
  holderIdentity: sbuerin_a4cbb424-9d8c-469b-adc0-e017ad05047f
  leaseDurationSeconds: 15
  leaseTransitions: 0
  renewTime: "2022-07-08T05:30:32.871202Z"

The holderIdentity is calculated in NewResourceLock roughly via "hostname_" + uuid.NewUUID(). It is not accessible to us after it has been calculated, nor can it be set via NewManager today. I think this requires a change in CR, and the most realistic one is to add a new field to ctrl.Options called LeaderElectionIdentity that we can calculate in main.go and then pass into the Manager and the ClusterCacheTracker.
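
A rough sketch of what that comparison could look like, assuming the hypothetical LeaderElectionIdentity described above were threaded through from main.go (it is not exposed by controller-runtime today); namespace and name are taken from the Lease shown above:

package selfhosted

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// holdsLeaseInWorkloadCluster checks whether the leader election Lease in the
// "remote" cluster is held by this very controller instance.
func holdsLeaseInWorkloadCluster(ctx context.Context, workloadClient kubernetes.Interface, leaderElectionIdentity string) (bool, error) {
	lease, err := workloadClient.CoordinationV1().Leases("capi-system").Get(ctx, "controller-leader-election-capi", metav1.GetOptions{})
	if err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	return lease.Spec.HolderIdentity != nil && *lease.Spec.HolderIdentity == leaderElectionIdentity, nil
}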

I think this should work, but it will of course only work if leader election is enabled. We already have cases where we disable it for debugging purposes, like Tilt (otherwise the controller crashes if we wait too long at a breakpoint during debugging). So it could make it harder to live-debug this case.

Checking whether the current pod exists in the "workload cluster" seems simpler to me, mostly because it doesn't introduce a dependency on leader election.

vincepri commented 2 years ago

@sbueringer should we proceed with the solution proposed above in v1.2? We could start by having it behind a feature gate?

sbueringer commented 2 years ago

Given that it mitigates the issue(s) mentioned above it sounds fine to me to include it in >=v1.2.1 behind a feature gate.

fabriziopandini commented 2 years ago

I'm +1 to checking whether the current pod exists in the "workload cluster" because it is simple and easily back-portable. Then we can eventually consider a more complex/elegant solution in a follow-up iteration.

TBH, I don't think we need a feature flag for this behavior because, if I got the WIP right, when one of POD_NAMESPACE, POD_NAME, POD_UID is empty (or we drop those variables), it reverts to the old behaviour.

fabriziopandini commented 2 years ago

/assign @sbueringer

enxebre commented 2 years ago

CAPI controllers use the <cluster>-kubeconfig to connect to the workload cluster to inspect nodes, etcd, and so on. This connection goes through the external load balancer, and this is not ideal in the case of self-hosted clusters, given that we can use "in cluster" connectivity to avoid unnecessary network hops.

I don't think it's really implicit that it uses the external load balancer; rather, it uses whatever the well-known secret contains, which is dictated by any given control plane implementation and is then consumed by the controllers via a "weak" contract based on a naming assumption. Makes sense?

Instead of coding in terms of self-hosted vs. not, as in this PR, we could angle this in a way that provides the appropriate contractual primitives to enable consumers to expand their use cases. I can think of several personas that would be consumers of different kubeconfigs: service-network-kubeconfig (internal components), localhost-kubeconfig (kube-apiserver sidecars), admin-kubeconfig (external cluster consumers).

It would be up to the CP implementation to generate those, and that's the contract for consumers; e.g. workload cluster controllers (machine controller) could use .status.serviceKubeconfig.service-network-kubeconfig/admin-kubeconfig.
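
Purely for illustration, a hypothetical shape of such a contract (all names below are made up, not an existing CAPI API): the control plane implementation would publish one kubeconfig secret per purpose, and consumers would pick the one that fits their persona:

package kubeconfigs

// Hypothetical kubeconfig purposes; the suffixes mirror the personas listed above
// and are not an existing CAPI contract.
const (
	ServiceNetworkKubeconfig = "service-network-kubeconfig" // internal components
	LocalhostKubeconfig      = "localhost-kubeconfig"       // kube-apiserver sidecars
	AdminKubeconfig          = "admin-kubeconfig"           // external cluster consumers
)

// SecretNameFor returns the secret a consumer would read for a given purpose,
// e.g. "my-cluster-service-network-kubeconfig".
func SecretNameFor(clusterName, purpose string) string {
	return clusterName + "-" + purpose
}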

Thoughts?

CecileRobertMichon commented 2 years ago

@enxebre your suggestion seems like a good improvement, but I don't think I understand how it would solve the bug described in the original issue. The proposed fix PR only changes the KubeadmControlPlane controller, not the cluster controller, so that KCP preflight checks don't fail when the generated kubeconfig has an internal LB apiserver address and the KCP controller itself is running on one of the control plane pods. Are you saying that KCP would need to change the kubeconfig it generates for this particular case?

Is up to the CP implementation generate those and that's the contract for consumers, e.g workload cluster controllers (machine controller)

I guess I'm confused by this statement, since in the case of the problem we're trying to solve, the CP implementation both generates and uses the kubeconfig to connect to the workload cluster.

enxebre commented 2 years ago

so that KCP preflight checks don't fail when the generated kubeconfig has an internal LB apiserver address and the KCP controller itself is running on one of the control plane pods.

Ah I see, I missed that. Makes sense. Thanks @CecileRobertMichon

Are you saying that KCP would need to change the kubeconfig it generates for this particular case?

Yeah, kind of. I'm suggesting that long term we might want to think about a model/contract that lets multiple kubeconfigs, scoped to different purposes, be created and then consumed as they best fit. But you're right, for this particular case it doesn't seem to help much.

fwiw the current PR lgtm I have no objections.

sbueringer commented 2 years ago

Yeah, kind of. I'm suggesting that long term we might want to think about a model/contract that lets multiple kubeconfigs, scoped to different purposes, be created and then consumed as they best fit. But you're right, for this particular case it doesn't seem to help much.

@enxebre Makes sense. I think we can continue this part of the discussion as part of https://github.com/kubernetes-sigs/cluster-api/issues/3661?