clusterctl inside cluster in pod cannot find management cluster

steve-fraser commented 2 years ago

What steps did you take and what happened: [A clear and concise description on how to REPRODUCE the bug.]

Deploy Pod in cluster
Install vsphere provider
Generate configuration clusterctl generate cluster $(TEST_CLUSTER_NAME) \ --infrastructure vsphere \ -n $(TEST_CLUSTER_NAME) \ --control-plane-machine-count 1 \ --worker-machine-count 0 > /tmp/vsphere-test-cluster.yaml

Error: management cluster not available. Cannot auto-discover target namespace. Please specify a target namespace: invalid kubeconfig file; clusterctl requires a valid kubeconfig file to connect to the management cluster: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

What did you expect to happen:

It is supposed to find the local capi installation

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

Cluster-api version: v1.1.2
Minikube/KIND version:
Kubernetes version: (use kubectl version): v1.21.8
OS (e.g. from /etc/os-release):

runner@mvm-runner-2:~$ cat /etc/os-release NAME="Ubuntu" VERSION="20.04.3 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.3 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal

/kind bug [One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

sbueringer commented 2 years ago

just fyi @Jont828

/area clusterctl

killianmuldoon commented 2 years ago

Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

steve-fraser commented 2 years ago

Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

Yes I am running the clusterctl binary inside the mgmt cluster. Specifically I am using this to run a github runner inside the management cluster. This may be more of a feature request but I thought I would to not need the kubeconfig specifically instead it would behave like the kubectl binary would. Kubectl binary will work without dropping the config into the pod by using the kube api service account and env vars.

sbueringer commented 2 years ago

Agree. I think it would be nice if clusterctl just does in cluster discovery as controllers do too.

It's not really nice if folks have to generate a kubeconfig somehow even though a Pod has the ServiceAccount credentials injected.

fabriziopandini commented 2 years ago

/milestone v1.2

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fabriziopandini commented 2 years ago

/lifecycle frozen

Jont828 commented 2 years ago

So applications like cluster autoscaler that run in the cluster initialize their client with InClusterConfig() which gets the kubeconfig of the current cluster or returns an ErrNotInCluster. We could modify the entry code for clusterctl to detect if it's in a cluster, and if it is, go ahead and use the in cluster config. Wdyt @fabriziopandini @sbueringer?

sbueringer commented 2 years ago

I think something like the following should be fine:

if --kubeconfig is set use that one
if in cluster use in cluster config

Not sure at which point we should check for the default kubeconfig, but that might be already handled by the client-go util funcs which are usually used for this.

Jont828 commented 2 years ago

So for the in-cluster config there are two approaches we could take. We could take the approach you outlined where we check for it, and if we get an ErrNotInCluster we suppress it and move on to the default kubeconfig discovery rules. Alternatively, we could add a flag to pass in the in-cluster config and if it's set, we skip the default kubeconfig discovery rules. I think the benefit of the latter approach is that developers trying to initialize the client can handle the ErrNotInCluster cases themselves instead of having it done in the background. Wdyt?

sbueringer commented 2 years ago

I would really prefer if it's just auto-discovery and simply works out of the box without anyone having to specify a special flag for it.

Let's take a look at how kubectl does it. Afaik it automatically works in a Pod / on a local env

Jont828 commented 2 years ago

Sounds good. I'll take a look at kubectl's implementation when I get the chance and follow up here.

Jacobious52 commented 2 years ago

We're also in need of this issue. We want to use clusterctl backup in a CronJob in the management cluster. As @sbueringer mentioned I'd expect this to work like most other k8s clients using https://github.com/kubernetes/client-go/blob/master/rest/config.go#L512 that works out the box if it's running inside the cluster.

Jont828 commented 2 years ago

@sbueringer I'm happy to take a stab at this issue but I'll probably need some help since I'm not very familiar with this code.

I looked at Cluster Autoscaler and here they have some logic that uses the in cluster config. I believe their idea is to have an interface that has one implementation using a kubeconfig file and another implementation using info from the InClusterConfig().

The closest thing I can find is proxy.go where we have an interface that implements certain functions like GetConfig() and CurrentNamespace(). Do you know if we could simply make another implementation of the Proxy interface, or is there other code we would want to change as well?

Jont828 commented 2 years ago

As for kubectl, I tried running it on a pod but it seems like it doesn't work out of the box.

root@capi-test-control-plane:/# kubectl get pods -A
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" at the cluster scope

It seems like we need to set up permissions for it to work, and as a result I'm not too clear on how find the relevant code in their repo.

sbueringer commented 2 years ago

I did a bit more research and I think in general the behavior of controller-runtime matches relatively closely to what we want for clusterctl: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/config/config.go#L43-L61 (unfortunately except the --kubeconfig flag because clusterctl has its own)

I think the clusterctl library (cluster.New) should ideally take a *rest.Config as input parameter instead of the path to a kubeconfig file. This way it can be used in various scenarios and it doesn't depend on a literal file on a disk.

But I have no idea if a change like this is acceptable and how much refactoring this would require.

killianmuldoon commented 2 years ago

Kubernetes has a kubernetes.NewForConfig(rest.Config) function that does this - we could copy that and add a new function to cover over the case where we want to create a clusterctl client from the rest.config i.e. cluster.NewForConfig(rest.Config)

sbueringer commented 2 years ago

Maybe we can keep the external API the same, by:

keeping cluster.New as is
adding cluster.NewForConfig which takes rest.Config

And then refactoring internally behind the API that we don't have to write a temporary kubeconfig with credentials somewhere?

killianmuldoon commented 2 years ago

I'll take a look at this and see what's possible (looking at the code it's not as trivial as I thought :laughing:

/assign

Jont828 commented 2 years ago

@killianmuldoon Sounds good! I started hacking on some ideas on my end. In proxy.go it seems like if we refactor to initialize it with a *rest.Config we could rework the other functions. One thing I'm not sure about is if we have access to a kubecontext from the rest.Config. For some of the other Proxy interface functions we could try to do something like this (from cluster autoscaler):

// CurrentNamespace returns the namespace from the current context in the kubeconfig file.
func (k *inClusterProxy) CurrentNamespace() (string, error) {
    // This way assumes you've set the POD_NAMESPACE environment variable using the downward API.
    // This check has to be done first for backwards compatibility with the way InClusterConfig was originally set up
    if ns := os.Getenv("POD_NAMESPACE"); ns != "" {
        return ns, nil
    }

    // Fall back to the namespace associated with the service account token, if available
    if data, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace"); err == nil {
        if ns := strings.TrimSpace(string(data)); len(ns) > 0 {
            return ns, nil
        }
    }

    return "default", nil
}

fabriziopandini commented 2 years ago

/triage accepted

fabriziopandini commented 2 years ago

dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth /help

k8s-ci-robot commented 2 years ago

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/6286): >dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth >/help Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

robbie-demuth commented 1 year ago

Our organization is looking to create vclusters in our CI/CD pipeline, which runs jobs as Kubernetes pods, and clusterctl not being able to detect it's running in a pod like kubectl is somewhat blocking us from doing so (we can use vcluster directly)

fabriziopandini commented 1 year ago

@robbie-demuth it would be great if someone from your organization could help in getting this fixed, I will be happy to help in getting this over the line

k8s-triage-robot commented 10 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

mjnovice commented 9 months ago

Any updates on this ?

fabriziopandini commented 7 months ago

/priority backlog

fabriziopandini commented 7 months ago

The Cluster API project currently lacks enough contributors to adequately respond to all issues and PRs.

We keep this issue around since folks asked about it also recently, but if no-one shows up volunteering for the job most probably we will close it at the next iteration

/triage accepted /remove-lifecycle frozen

kubernetes-sigs / cluster-api

clusterctl inside cluster in pod cannot find management cluster #6286

Guidelines