karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

Failed to join cluster in Pull-Mode #4262

Open maaft opened 10 months ago

maaft commented 10 months ago

What happened:

  1. Karmada is installed in the host cluster, and its API server is publicly exposed under <domain>; it is reachable and functional (kubectl apply etc. work).
  2. In an on-premise cluster (private networking, but with internet access), I try to join the host cluster:
karmadactl register <domain> --token bsbmb3.<secret> --discovery-token-ca-cert-hash sha256:bb5593ab60547232a812109d87d48d3f16834bb68fa99035e64c8beeea442dd3
[preflight] Running pre-flight checks
[prefligt] All pre-flight checks were passed
[karmada-agent-start] Waiting to perform the TLS Bootstrap
[karmada-agent-start] Waiting to construct karmada-agent kubeconfig
Unable to connect to the server: dial tcp: lookup karmada-apiserver.karmada-system.svc.cluster.local on 127.0.0.53:53: server misbehaving

The join fails.

What you expected to happen:

I expect the member cluster to be added successfully.

RainbowMango commented 10 months ago

Unable to connect to the server: dial tcp: lookup karmada-apiserver.karmada-system.svc.cluster.local on 127.0.0.53:53: server misbehaving

Can you access karmada-apiserver from the machine where you are running karmadactl register?

maaft commented 10 months ago

yes, I can access it (via kubectl) from the machine where I'm running karmadactl register.

I'm not sure why karmadactl even tries to resolve karmada-apiserver.karmada-system.svc.cluster.local via my local DNS, since I specified that the karmada-apiserver is to be found at <domain>.

Maybe this helps:

karmadactl register karmada-apiserver.example.com --token qffp4d.<secret> --discovery-token-ca-cert-hash sha256:bb5593ab60547232a812109d87d48d3f16834bb68fa99035e64c8beeea442dd3 --v 3
I1120 09:48:30.999314   16579 register.go:299] Registering cluster. cluster name: default
I1120 09:48:30.999360   16579 register.go:300] Registering cluster. cluster namespace: karmada-cluster
[preflight] Running pre-flight checks
I1120 09:48:30.999377   16579 register.go:422] Validating the existence of file /etc/karmada/bootstrap-karmada-agent.conf
I1120 09:48:30.999391   16579 register.go:422] Validating the existence of file /etc/karmada/karmada-agent.conf
I1120 09:48:30.999402   16579 register.go:422] Validating the existence of file /etc/karmada/pki/ca.crt
[prefligt] All pre-flight checks were passed
[karmada-agent-start] Waiting to perform the TLS Bootstrap
I1120 09:48:31.007058   16579 register.go:765] [discovery] Created cluster-info discovery client, requesting info from "karmada-apiserver.example.com"
I1120 09:48:31.063367   16579 register.go:803] [discovery] Requesting info from "karmada-apiserver.example.com" again to validate TLS against the pinned public key
I1120 09:48:31.111340   16579 register.go:820] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "karmada-apiserver.example.com"
I1120 09:48:31.111377   16579 register.go:437] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I1120 09:48:31.111399   16579 register.go:448] [discovery] writing bootstrap karmada-agent config file at /etc/karmada/bootstrap-karmada-agent.conf
I1120 09:48:31.112644   16579 register.go:457] [discovery] writing CA certificate at /etc/karmada/pki/ca.crt
[karmada-agent-start] Waiting to construct karmada-agent kubeconfig
Unable to connect to the server: dial tcp: lookup karmada-apiserver.karmada-system.svc.cluster.local on 127.0.0.53:53: server misbehaving

maaft commented 10 months ago

I did some further debugging:

karmadaClusterInfo in pkg/karmadactl/register/register.go:340 contains this:

{
  "LocationOfOrigin": "",
  "server": "https://karmada-apiserver.karmada-system.svc.cluster.local:5443",
  "certificate-authority-data": "LS0tLS1C..."
}

Shouldn't there be karmada-apiserver.example.com instead?
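A quick way to see the mismatch is to compare the host advertised in the dumped cluster info with the host passed on the command line. The following is a minimal Python sketch, not part of karmadactl; the function names are hypothetical, and the check for an in-cluster name is only a heuristic based on the usual `.svc` service DNS suffix.

```python
import json
from urllib.parse import urlparse

# Copy of the dump shown above; the CA data is truncated in the thread.
CLUSTER_INFO_JSON = """
{
  "LocationOfOrigin": "",
  "server": "https://karmada-apiserver.karmada-system.svc.cluster.local:5443",
  "certificate-authority-data": "LS0tLS1C..."
}
"""


def advertised_host(cluster_info_json: str) -> str:
    """Return the host name from the 'server' URL of a cluster-info entry."""
    info = json.loads(cluster_info_json)
    return urlparse(info["server"]).hostname


def endpoint_host(endpoint: str) -> str:
    """Return the host of an endpoint given as host, host:port, or a full URL."""
    if "://" not in endpoint:
        endpoint = "https://" + endpoint
    return urlparse(endpoint).hostname


def is_in_cluster_name(host: str) -> bool:
    """Heuristic: service DNS names normally contain a '.svc' label."""
    return ".svc." in host + "."


advertised = advertised_host(CLUSTER_INFO_JSON)
requested = endpoint_host("karmada-apiserver.example.com")
if advertised != requested and is_in_cluster_name(advertised):
    print(f"cluster-info advertises in-cluster name {advertised}, not {requested}")
```

A name flagged this way can normally only be resolved by the DNS service inside the host cluster, which is consistent with the lookup failure on 127.0.0.53 in the log.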

maaft commented 10 months ago

Okay, I think I found the issue:

  1. karmada retrieves control plane info using bootstrap token
  2. this info contains karmada-apiserver.karmada-system.svc.cluster.local and not karmada-apiserver.example.com
  3. agent kubeconfig is built with that info
  4. agent tries to connect with that config and fails.

Can I configure the karmada apiserver such that it will tell bootstrapping clients to use karmada-apiserver.example.com ?
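The four steps can be modelled in a few lines of Python. This is a simplified illustration of the behaviour described in this comment, not the code in register.go; all function names are hypothetical, and `fetch_cluster_info` is a stand-in that returns the values seen in the debug output instead of contacting a server.

```python
# Hypothetical stand-in for steps 1 and 2: the control plane reports its own
# server URL, regardless of which endpoint the client used to reach it.
def fetch_cluster_info(bootstrap_endpoint: str) -> dict:
    return {
        "server": "https://karmada-apiserver.karmada-system.svc.cluster.local:5443",
        "certificate-authority-data": "LS0tLS1C...",  # truncated, as in the thread
    }


# Step 3: the agent kubeconfig copies the server URL from the cluster info.
def build_agent_kubeconfig(cluster_info: dict) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": "karmada", "cluster": dict(cluster_info)}],
    }


# Step 4: the agent dials whatever server the kubeconfig names.
def server_to_dial(kubeconfig: dict) -> str:
    return kubeconfig["clusters"][0]["cluster"]["server"]


bootstrap_endpoint = "karmada-apiserver.example.com"
kubeconfig = build_agent_kubeconfig(fetch_cluster_info(bootstrap_endpoint))
print(server_to_dial(kubeconfig))
```

In this model the endpoint given on the command line is used only to fetch the cluster info and never reaches the generated kubeconfig, which matches the observed failure.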

RainbowMango commented 10 months ago

cc @lonelyCZ @chaosi-zju for help

liangyuanpeng commented 10 months ago

@maaft

This server info comes from karmada-apiserver.config. Is karmada-apiserver.example.com the server address in your karmada-apiserver.config?

You can try the steps below:

  1. change the server address in karmada-apiserver.config to karmada-apiserver.example.com
  2. create a new token with karmadactl, using that karmada-apiserver.config
  3. run karmadactl register again
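Step 1 amounts to editing the `server:` field of the kubeconfig. As a rough sketch, assuming the file follows the standard kubeconfig layout, the edit could be scripted as below. The sample content, the cluster name, and the helper `set_kubeconfig_server` are illustrative assumptions, and the public URL (scheme and port) has to match however the API server is actually exposed.

```python
import re

# Illustrative kubeconfig fragment; real files carry more fields.
SAMPLE = """\
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: LS0tLS1C...
    server: https://karmada-apiserver.karmada-system.svc.cluster.local:5443
  name: karmada-apiserver
"""


def set_kubeconfig_server(kubeconfig_text: str, new_server: str) -> str:
    """Replace the value of every 'server:' key, keeping the indentation."""
    pattern = r"(?m)^([ \t]*server:[ \t]*).*$"
    return re.sub(pattern, lambda m: m.group(1) + new_server, kubeconfig_text)


updated = set_kubeconfig_server(SAMPLE, "https://karmada-apiserver.example.com")
print(updated)
```

Editing the file by hand works just as well; the point is only that the `server` value is the one that ends up in the information handed to registering clusters.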

everpeace commented 8 months ago

karmada-apiserver.karmada-system.svc.cluster.local comes from the default/cluster-info ConfigMap in the Karmada control plane, which is read while bootstrapping the TLS certificate (the CA certificate of the Karmada control plane API server).

I think karmadactl register should not use the endpoint from cluster-info, but should use the endpoint (bootstrap endpoint) that the user passed to the command.

The reason is that the endpoint in cluster-info is sometimes unreachable from the member cluster, as this issue reports. The bootstrap endpoint, on the other hand, is guaranteed to be reachable, because the kubeconfig is generated only after the TLS (CA) certificate has been bootstrapped through it.
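The idea can be sketched as a small selection rule. This is an illustration of the proposal in this comment, not a description of what the linked change implements; `choose_server` is a hypothetical helper, and defaulting to `https://` when no scheme is given is an assumption.

```python
def choose_server(bootstrap_endpoint: str, cluster_info_server: str) -> str:
    """Prefer the endpoint the user passed; fall back to cluster-info otherwise."""
    endpoint = bootstrap_endpoint.strip()
    if not endpoint:
        return cluster_info_server
    if "://" not in endpoint:
        # Assumption: a bare host or host:port is served over HTTPS.
        endpoint = "https://" + endpoint
    return endpoint


in_cluster = "https://karmada-apiserver.karmada-system.svc.cluster.local:5443"
print(choose_server("karmada-apiserver.example.com", in_cluster))
```

With this rule the agent kubeconfig would point at an address that has already been shown to work during discovery, while the CA data would still come from cluster-info.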

I think #4562 can be a fix.