karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

Can't get cluster members #5143

Closed oussexist closed 2 months ago

oussexist commented 3 months ago

Hello there. I have an on-prem cluster and a cloud cluster. If I run init on the on-prem cluster, it becomes the control-plane host cluster. To join members, I run the create token command, which gives me a register command to run on the cloud cluster. That registration fails because the IP address in the command belongs to my local network, although the local cluster has network access and can ping google.com just fine. What should I do? Do I need to init on the cloud cluster and join the local one instead? And if I do init on the cloud cluster and want to join the local cluster, should I manually copy the API server config (the karmada config file) to the local cluster and then, when joining, pass the path to that manually created file with the same content as the karmada config generated on the cloud cluster?

P.S.: I tried karmada init on the local cluster and joined it, but when I run the karmada get cluster command I get this: error: failed to list all member clusters in control plane, err: the server could not find the requested resource (get clusters.cluster.karmada.io)

RainbowMango commented 3 months ago

but when i try to use karmada get cluster command i get this : error: failed to list all member clusters in control plane, err: the server could not find the requested resource (get clusters.cluster.karmada.io)

Let's start with this. This error is usually caused by using the wrong kubeconfig. You might need to switch the kubeconfig or context to Karmada's kubeconfig, not the host cluster's.
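
A minimal sketch of that switch, assuming `kubectl karmada init` wrote the Karmada kubeconfig to /etc/karmada/karmada-apiserver.config (the path that appears later in this thread; adjust it if yours differs). The helper name `karmada_kubectl` and the `KARMADA_KUBECONFIG` variable are made up for this example and are not part of Karmada.

```shell
# Hypothetical helper: run kubectl against the Karmada API server instead of
# the host cluster's API server. The default path is an assumption; override
# it with KARMADA_KUBECONFIG if your kubeconfig lives elsewhere.
karmada_kubectl() {
  kubectl --kubeconfig "${KARMADA_KUBECONFIG:-/etc/karmada/karmada-apiserver.config}" "$@"
}

# Usage:
#   karmada_kubectl get clusters
```

Wrapping the flag this way makes it harder to query the host cluster by accident, which is the likely cause of the "could not find the requested resource" error here.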

oussexist commented 3 months ago

Hello there. First of all, thanks for your reply. I ran karmada init on the local cluster, and I want it to be a member and the control plane at the same time. Can't this be done? As the documentation shows, all I need to do is run kubectl karmada init on one of my clusters, isn't it? I also wanted to add something: my cluster has both an internal IP and a public IP. The kubeconfig created when initializing the cluster uses the internal one, so karmada init uses the internal IP as well. If I edit the karmada config file and replace the internal IP with the public one, it fails because the certificate is only valid for the internal IP.

RainbowMango commented 3 months ago

cc @chaosi-zju for help

oussexist commented 2 months ago

OK, so as you told me, I targeted the karmada config using this command: kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get clusters, and I got the local clusters that I joined before (the docker-desktop cluster and the host one).

Joining the cloud cluster is the only issue that remains. My local cluster, which is the host, has a public IP assigned to its master, but when I initialize karmada on it, it still uses the local IP address (the one used in the kubeconfig). I don't know how to join the cloud member, since it always tries to reach the local IP.

chaosi-zju commented 2 months ago

Hi @oussexist, I will answer your above two questions.

First

so I did the karmada init on the local one cluster so i want it at the same time a member and a control plane this cant be done ?

I spent some time today verifying this problem for you, here are two cases:

Second

when initializing karmada its using the internal ip even tho by trying to edit karmada config file by changing the internal ip with the public one this will leads to the certif is known only for the internal ip

There is a command option for kubectl karmada (or karmadactl) that may resolve your problem. Run kubectl karmada init -h and you will see:

...
  # Specify external IPs(load balancer or HA IP) which used to sign the certificate
  karmadactl init --cert-external-ip 10.235.1.2 --cert-external-dns www.karmada.io

Options:
    --cert-external-dns='':
        the external DNS of Karmada certificate (e.g localhost,localhost.com)

    --cert-external-ip='':
        the external IP of Karmada certificate (e.g 192.168.1.2,172.16.1.2)
...

So, you can add --cert-external-ip=xx.xx.xx.xx to your kubectl karmada init command, where xx.xx.xx.xx is your public IP.

Then, even though Karmada is initialized using the local IP, you can change the IP in the kubeconfig to the public IP, and with the --cert-external-ip=xx.xx.xx.xx option you won't run into certificate issues.
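
To confirm that the public IP really ended up in the certificate, you can inspect its Subject Alternative Names. The following is a rough sketch rather than an official procedure: it assumes `openssl` is installed, the helper name `cert_lists_ip` is made up for this example, and the certificate path in the usage comment is an assumption about where your installation stores it.

```shell
# Hypothetical helper: succeed if the certificate lists the given IP address
# among its Subject Alternative Names.
cert_lists_ip() {
  cert_file="$1"
  ip="$2"
  # Split the comma-separated SAN list into one entry per line, trim leading
  # whitespace, then look for an exact "IP Address:<ip>" entry.
  openssl x509 -in "$cert_file" -noout -text \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//' \
    | grep -Fxq "IP Address:${ip}"
}

# Usage (the path is an assumption; xx.xx.xx.xx is your public IP):
#   cert_lists_ip /etc/karmada/pki/apiserver.crt xx.xx.xx.xx && echo "IP is covered"
```

The exact-line match is deliberate: a plain substring search for 10.235.1.2 would also match 10.235.1.20.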

oussexist commented 2 months ago

Hello there. Thank you for your time. I did as you told me with the cert-external-ip flag, and yes, the certificate problem is gone, but it still times out because it still tries to reach the local IP address. I tried to join with both Pull and Push mode and couldn't with either, because of the network issue between the two clusters.

PS: I also tried to initialize the cluster with the public address as the endpoint. It shows up in the kubeconfig file, but karmada always uses the internal one. Example:

 kubectl karmada init
I0708 10:09:21.998604    3797 deploy.go:250] kubeconfig file: , kubernetes: https://publicIP:6443
I0708 10:09:22.204984    3797 deploy.go:270] karmada apiserver ip: [the-local-ip]
...

Edited: Well, I found an advertise flag on the init command that may be helpful in this case. I am checking it right now.

oussexist commented 2 months ago

Hello again @chaosi-zju. I tried to init on the cloud cluster. I updated the kubeconfig file to use the public IP as the endpoint, and made the karmada API server use the same public IP with this flag: kubectl karmada init --karmada-apiserver-advertise-address=my-public-ip. The result is the same. After this step it gets stuck: I0708 15:49:06.409095 11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated. And then this error appears: error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

This is the full log:

ubuntu@master:~$ kubectl karmada init --karmada-apiserver-advertise-address=my-public-ip
I0708 15:48:30.430370   11751 deploy.go:250] kubeconfig file: , kubernetes: https://my-public-ip:6443
I0708 15:48:30.500788   11751 deploy.go:270] karmada apiserver ip: [my-public-ip]
I0708 15:48:33.577406   11751 cert.go:246] Generate ca certificate success.
I0708 15:48:35.266299   11751 cert.go:246] Generate karmada certificate success.
I0708 15:48:36.167317   11751 cert.go:246] Generate apiserver certificate success.
I0708 15:48:39.011770   11751 cert.go:246] Generate front-proxy-ca certificate success.
I0708 15:48:40.025761   11751 cert.go:246] Generate front-proxy-client certificate success.
I0708 15:48:43.695485   11751 cert.go:246] Generate etcd-ca certificate success.
I0708 15:48:44.897303   11751 cert.go:246] Generate etcd-server certificate success.
I0708 15:48:49.582634   11751 cert.go:246] Generate etcd-client certificate success.
I0708 15:48:49.582991   11751 deploy.go:366] download crds file:https://github.com/karmada-io/karmada/releases/download/v1.10.2/crds.tar.gz
Downloading...[ 100.00% ]
Download complete.
I0708 15:48:50.143988   11751 deploy.go:608] Create karmada kubeconfig success.
I0708 15:48:50.183226   11751 idempotency.go:267] Namespace karmada-system has been created or updated.
I0708 15:48:50.311282   11751 idempotency.go:291] Service karmada-system/etcd has been created or updated.
I0708 15:48:50.311310   11751 deploy.go:432] Create etcd StatefulSets
I0708 15:49:06.346320   11751 deploy.go:441] Create karmada ApiServer Deployment
I0708 15:49:06.409095   11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated.
error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

chaosi-zju commented 2 months ago

I0708 15:49:06.409095 11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated. error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

Hi @oussexist, sorry to hear about this error.

  1. Can you provide me with the current status of karmada-apiserver, such as the output of kubectl describe and kubectl logs?
  2. Do you have to use karmadactl init to install? Have you tried other installation methods such as karmada-operator or Helm?
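
The status requested in the first item could be gathered roughly like this. It is only a sketch: the namespace and the `app=karmada-apiserver` label are taken from the resources named in this thread, while the helper name `collect_apiserver_diagnostics` and the 100-line tail are arbitrary choices for the example.

```shell
# Hypothetical helper: print the state of the karmada-apiserver Deployment,
# its pods, and their logs.
collect_apiserver_diagnostics() {
  ns="karmada-system"
  selector="app=karmada-apiserver"
  kubectl describe deployment karmada-apiserver -n "$ns"
  kubectl get pods -n "$ns" -l "$selector" -o wide
  # --previous shows the last terminated container, which is what matters for
  # a crash loop; fall back to the current logs if there is no previous one.
  kubectl logs -n "$ns" -l "$selector" --previous --tail=100 \
    || kubectl logs -n "$ns" -l "$selector" --tail=100
}

# Usage:
#   collect_apiserver_diagnostics > karmada-apiserver-diagnostics.txt 2>&1
```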

oussexist commented 2 months ago

By the way, this error happens only on the cloud cluster, although its NSG allows all traffic, so I don't think it's a security group issue. I installed karmada through krew on both the local and the cloud cluster in the same way; it works just fine on the local one but not on the cloud one! Here you go:

 kubectl describe deployment karmada-apiserver -n karmada-system
kubectl get pods -n karmada-system -l app=karmada-apiserver
Name:                   karmada-apiserver
Namespace:              karmada-system
CreationTimestamp:      Mon, 08 Jul 2024 21:50:09 +0000
Labels:                 karmada.io/bootstrapping=app-defaults
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=karmada-apiserver
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=karmada-apiserver
  Containers:
   karmada-apiserver:
    Image:      registry.k8s.io/kube-apiserver:v1.27.11
    Port:       5443/TCP
    Host Port:  0/TCP
    Command:
      kube-apiserver
      --allow-privileged=true
      --authorization-mode=Node,RBAC
      --client-ca-file=/etc/karmada/pki/ca.crt
      --enable-bootstrap-token-auth=true
      --etcd-cafile=/etc/karmada/pki/etcd-ca.crt
      --etcd-certfile=/etc/karmada/pki/etcd-client.crt
      --etcd-keyfile=/etc/karmada/pki/etcd-client.key
      --etcd-servers=https://etcd-0.etcd.karmada-system.svc.cluster.local:2379
      --bind-address=0.0.0.0
      --kubelet-client-certificate=/etc/karmada/pki/karmada.crt
      --kubelet-client-key=/etc/karmada/pki/karmada.key
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --disable-admission-plugins=StorageObjectInUseProtection,ServiceAccount
      --runtime-config=
      --apiserver-count=1
      --secure-port=5443
      --service-account-issuer=https://kubernetes.default.svc.cluster.local
      --service-account-key-file=/etc/karmada/pki/karmada.key
      --service-account-signing-key-file=/etc/karmada/pki/karmada.key
      --service-cluster-ip-range=10.96.0.0/12
      --proxy-client-cert-file=/etc/karmada/pki/front-proxy-client.crt
      --proxy-client-key-file=/etc/karmada/pki/front-proxy-client.key
      --requestheader-allowed-names=front-proxy-client
      --requestheader-client-ca-file=/etc/karmada/pki/front-proxy-ca.crt
      --requestheader-extra-headers-prefix=X-Remote-Extra-
      --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User
      --tls-cert-file=/etc/karmada/pki/apiserver.crt
      --tls-private-key-file=/etc/karmada/pki/apiserver.key
      --tls-min-version=VersionTLS13
    Liveness:     http-get https://:5443/livez delay=15s timeout=5s period=30s #success=1 #failure=3
    Readiness:    http-get https://:5443/readyz delay=0s timeout=5s period=30s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/karmada/pki from karmada-cert (ro)
  Volumes:
   karmada-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  karmada-cert
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   karmada-apiserver-56b85d8bd (1/1 replicas created)
Events:          <none>
NAME                                READY   STATUS             RESTARTS          AGE
karmada-apiserver-56b85d8bd-8v5l4   0/1     CrashLoopBackOff   119 (2m26s ago)   9h

chaosi-zju commented 2 months ago

karmada-apiserver-56b85d8bd-8v5l4 0/1 CrashLoopBackOff 119 (2m26s ago) 9h

The -p parameter of kubectl logs prints the logs of this pod from before it crashed, so we can dig into why it crashed.

For example: kubectl logs -p karmada-apiserver-56b85d8bd-8v5l4 -n karmada-system

oussexist commented 2 months ago

kubectl logs -p karmada-apiserver-56b85d8bd-8v5l4 -n karmada-system
Flag --apiserver-count has been deprecated, apiserver-count is deprecated and will be removed in a future version.
I0709 07:51:42.828196       1 server.go:554] external host was not specified, using 10.244.171.72
I0709 07:51:42.829556       1 server.go:166] Version: v1.27.11
I0709 07:51:42.829606       1 server.go:168] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0709 07:51:43.951884       1 shared_informer.go:311] Waiting for caches to sync for node_authorizer
I0709 07:51:44.001634       1 plugins.go:158] Loaded 9 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0709 07:51:44.001652       1 plugins.go:161] Loaded 12 validating admission controller(s) successfully in the following order: LimitRanger,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
E0709 07:52:04.012907       1 run.go:74] "command failed" err="context deadline exceeded"

chaosi-zju commented 2 months ago

This is probably the same unresolved issue as #5105.

Can you refer to https://github.com/karmada-io/karmada/issues/5105#issuecomment-2198068986 and check whether it helps?

chaosi-zju commented 2 months ago

As you said:

I installed karmada thoroughout krew in both local and cloud one with the same way , it works just fine on the local one but not on the cloud one !

I suspect this has something to do with the container network of your cloud environment, which prevents karmada-apiserver from connecting to etcd.
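
One rough way to start narrowing this down is to check whether the etcd address from the --etcd-servers flag shown in the Deployment resolves from inside the cluster. This is a sketch under assumptions: the helper name `check_etcd_dns`, the pod name, and the busybox:1.36 image are arbitrary choices for the example, and the check covers DNS only, not port 2379 connectivity or TLS.

```shell
# Hypothetical helper: start a throwaway pod in karmada-system and resolve the
# etcd address that karmada-apiserver is configured with. The pod is removed
# when the command finishes (--rm).
check_etcd_dns() {
  kubectl run etcd-dns-check -n karmada-system \
    --rm -i --restart=Never --image=busybox:1.36 -- \
    nslookup etcd-0.etcd.karmada-system.svc.cluster.local
}

# Usage:
#   check_etcd_dns || echo "etcd name did not resolve from inside the cluster"
```

If the lookup fails on the cloud cluster but succeeds on the local one, that points at cluster DNS or the pod network rather than at Karmada itself.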

chaosi-zju commented 2 months ago

Hi @oussexist, so did you come to any new conclusions later?

oussexist commented 2 months ago

Hi @chaosi-zju, sorry, I was kind of busy. As you said, I think I have some networking issues on my cluster. I'll try to work with an AKS cluster, although I still need to find a solution for the old one.

oussexist commented 2 months ago

OK, hello again, and sorry for being a bit late. I initialized karmada on my local cluster, gave it a public IP, and made sure port 32443 is accessible (this is kind of important). Then, after fixing the network issues on the cloud cluster, it connected fine. Now I'll move on to propagation, so that if one of my clusters goes down, the other holds the deployments until the first one is back up. I hope this will work fine! PS: As I told you, I am using only 2 clusters, so the local one is both the control plane and a member (I can't have another cluster as the control plane due to resource limitations).
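
A quick way to test that reachability from the cloud side before attempting a join might look like the sketch below. It assumes `bash` and the coreutils `timeout` command are available; the helper name `port_reachable` and the 5-second limit are arbitrary choices for this example.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within
# 5 seconds. Relies on bash's built-in /dev/tcp redirection.
port_reachable() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage (run on a node of the cloud cluster; replace the placeholder with the
# public IP of the cluster hosting the Karmada control plane):
#   port_reachable xx.xx.xx.xx 32443 && echo reachable || echo blocked
```

This only proves that the TCP port is open from where it is run; it says nothing about certificates or authentication.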

chaosi-zju commented 2 months ago

and then after fixing the cloud cluster from network issues it connected fine .

Hi, does the network issue refer to run.go:74] "command failed" err="context deadline exceeded"?

If yes, then I'm curious how you fixed this network issue in the end, haha

oussexist commented 2 months ago

Hello again, I was on holiday, so I kind of forgot what I did exactly, haha. As far as I remember, I made sure the cloud cluster uses the public IP in its kubeconfig via the advertise flag. I initialize the Kubernetes cluster with an Ansible playbook, so in the kubeadm init task I added the advertise flag to pass the public IP, because by default it picks the 10.xx.xx.xx address. Also, as I told you, the control plane initialized on the local cluster was not accessible from outside since it's local, so I made it accessible and opened the karmada port. That's all I remember, haha. Anyway, I tried propagation and it finally created the deployment in both clusters, but I am curious about high availability. What happens if the local cluster goes down? Presumably we lose high availability, since it's the control plane. And what if the cloud cluster goes down? I haven't mastered this at all, so should I access each deployment separately, or through a single endpoint, or something else? Could you please guide me a bit?

Edited: I also have a small problem. When I create a deployment through a propagation file and then get deployments, I find that the local deployment is not ready!

oussexist commented 2 months ago

and then after fixing the cloud cluster from network issues it connected fine .

Hi, does the network issue refers to run.go:74] "command failed" err="context deadline exceeded"?

If yes, then I'm curious how you fixed this network issue at last, haha

I think adding the --control-plane-endpoint flag to the kubeadm init command with an accessible IP fixes this, so that the cluster can be reached from outside, but I'm not sure, because I ran the init command on the local cluster and, since it worked, I didn't try again on the cloud one.

oussexist commented 2 months ago

I think we can close this issue. The propagation problem was just a lack of concentration on my part: when applying the deployment and propagation files, I should have added the kubeconfig flag for the karmada API server, since I'm using the same cluster as both host and control plane. Regards.
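
The fix described above amounts to sending the manifests to the Karmada API server rather than to the host cluster. A minimal sketch, assuming the kubeconfig path used earlier in this thread; the helper name `karmada_apply`, the `KARMADA_KUBECONFIG` variable, and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: apply manifests to the Karmada API server rather than
# to the host cluster's own API server. Stops at the first failure.
karmada_apply() {
  kubeconfig="${KARMADA_KUBECONFIG:-/etc/karmada/karmada-apiserver.config}"
  for manifest in "$@"; do
    kubectl --kubeconfig "$kubeconfig" apply -f "$manifest" || return 1
  done
}

# Usage (file names are placeholders):
#   karmada_apply deployment.yaml propagationpolicy.yaml
```

Without the --kubeconfig flag, kubectl on the host cluster talks to the host's own API server, so the deployment is created there directly instead of being handled by Karmada.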