openyurtio / openyurt

OpenYurt - Extending your native Kubernetes to edge(project under CNCF)
https://openyurt.io
Apache License 2.0

[BUG] Yurtctl is not able to convert a k8s cluster set up by kubeadm #680

Closed yixingjia closed 2 years ago

yixingjia commented 2 years ago

Note: This may not be a bug, but I would like to share what I encountered in case someone knows the root cause. I say it may not be a bug because yurtctl does not officially support v1.22.3; I manually modified yurtctl/util/kubernetes/util.go to add 1.22 support.

Steps:

  1. Get the latest code and edit yurtctl/util/kubernetes/util.go to add 1.22 support: "1.22", "1.22+"}
  2. Set up a k8s cluster with 2 nodes running v1.22.3 using kubeadm (yc01 as the cloud node, ye01 as the edge node)
  3. Run the following command:

root@yc00:/home/ecl/openyurt# _output/bin/yurtctl convert --deploy-yurttunnel --cloud-nodes yc01 --provider kubeadm
I1214 06:20:20.132582 38352 convert.go:380] mark yc01 as the cloud-node
I1214 06:20:20.144388 38352 convert.go:388] mark ye01 as the edge-node
I1214 06:20:20.151448 38352 convert.go:395] mark ye01 as autonomous
I1214 06:21:00.231709 38352 util.go:543] servant job(yurtctl-disable-node-controller-yc01) has succeeded
I1214 06:21:00.231793 38352 convert.go:418] complete disabling node-controller
I1214 06:21:00.292876 38352 convert.go:433] yurt-tunnel-server is deployed
I1214 06:21:00.322629 38352 convert.go:441] yurt-tunnel-agent is deployed
I1214 06:21:00.330859 38352 convert.go:522] kube-public/cluster-info configmap already exists, skip to prepare it
I1214 06:21:00.367253 38352 convert.go:485] deploying the yurt-hub and resetting the kubelet service on edge nodes...
E1214 06:23:00.375187 38352 util.go:540] fail to run servant job(node-servant-convert-ye01): wait for job to be complete timeout
I1214 06:23:00.375290 38352 convert.go:492] complete deploying yurt-hub on edge nodes
I1214 06:23:00.375315 38352 convert.go:496] deploying the yurt-hub and resetting the kubelet service on cloud nodes
E1214 06:25:00.390571 38352 util.go:540] fail to run servant job(node-servant-convert-yc01): wait for job to be complete timeout
I1214 06:25:00.390693 38352 convert.go:503] complete deploying yurt-hub on cloud nodes

Some weird issues:

  1. It tries to start yurt-hub on both the cloud and edge nodes. See this log line: I1214 06:23:00.375315 38352 convert.go:496] deploying the yurt-hub and resetting the kubelet service on cloud nodes.

It can be verified by:

root@yc00:/home/ecl/openyurt# kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE   IP              NODE   NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-9c86b8cb4-zgzjx    1/1     Running            0               48m   192.16.70.1     yc01
kube-system   calico-node-trjsc                          1/1     Running            0               48m   10.221.33.199   yc01
kube-system   calico-node-xp7pn                          1/1     Running            0               46m   10.221.33.198   ye01
kube-system   coredns-78fcd69978-2hnwp                   1/1     Running            0               48m   192.16.70.2     yc01
kube-system   coredns-78fcd69978-hps72                   1/1     Running            0               48m   192.16.70.3     yc01
kube-system   etcd-yc01                                  1/1     Running            1               48m   10.221.33.199   yc01
kube-system   kube-apiserver-yc01                        1/1     Running            1               48m   10.221.33.199   yc01
kube-system   kube-controller-manager-yc01               1/1     Running            0               43m   10.221.33.199   yc01
kube-system   kube-proxy-5d64d                           1/1     Running            0               46m   10.221.33.198   ye01
kube-system   kube-proxy-9g9bh                           1/1     Running            0               48m   10.221.33.199   yc01
kube-system   kube-scheduler-yc01                        1/1     Running            1               48m   10.221.33.199   yc01
kube-system   yurt-controller-manager-77b97fd47b-4pqj4   1/1     Running            0               43m   10.221.33.199   yc01
kube-system   yurt-hub-yc01                              1/1     Running            7 (8m18s ago)   34m   10.221.33.199   yc01
kube-system   yurt-hub-ye01                              0/1     CrashLoopBackOff   7 (3m27s ago)   33m   10.221.33.198   ye01
kube-system   yurt-tunnel-agent-kqkjc                    1/1     Running            0               43m   10.221.33.198   ye01
kube-system   yurt-tunnel-server-777b664598-q99xl        1/1     Running            0               43m   10.221.33.199   yc01

is this expected?

  2. The edge node cannot be converted successfully. I guess this may be caused by the node-servant-convert job not running successfully.

Logs for yurt-hub on the edge side:

2021-12-14T07:08:57.133786997Z stderr F E1214 07:08:57.133698 1 certificate_manager.go:434] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post "https://10.221.33.199:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
2021-12-14T07:08:57.133918863Z stderr F E1214 07:08:57.133819 1 certificate_manager.go:318] Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
2021-12-14T07:09:01.571236431Z stderr F I1214 07:09:01.571045 1 certificate.go:83] waiting for preparing client certificate
2021-12-14T07:09:06.57139345Z stderr F I1214 07:09:06.571135 1 certificate.go:83] waiting for preparing client certificate
......

  3. I tried to revert the conversion, but it did not succeed:

root@yc00:/home/ecl/openyurt# _output/bin/yurtctl revert
I1214 07:15:56.786158 46377 revert.go:160] yurt controller manager is removed
I1214 07:15:56.790652 46377 revert.go:169] serviceaccount for yurt controller manager is removed
I1214 07:15:56.796609 46377 revert.go:178] clusterrole for yurt controller manager is removed
I1214 07:15:56.805611 46377 revert.go:187] clusterrolebinding for yurt controller manager is removed
I1214 07:15:56.906209 46377 revert.go:347] deployment for yurt app manager is removed
I1214 07:15:56.914157 46377 revert.go:357] Role for yurt app manager is removed
I1214 07:15:56.920171 46377 revert.go:366] ClusterRole for yurt app manager is removed
I1214 07:15:56.923024 46377 revert.go:375] ClusterRoleBinding for yurt app manager is removed
I1214 07:15:56.925334 46377 revert.go:385] RoleBinding for yurt app manager is removed
I1214 07:15:56.927620 46377 revert.go:395] secret for yurt app manager is removed
I1214 07:15:57.123257 46377 revert.go:405] Service for yurt app manager is removed
I1214 07:15:57.124898 46377 revert.go:415] MutatingWebhookConfiguration for yurt app manager is removed
I1214 07:15:57.126873 46377 revert.go:425] ValidatingWebhookConfiguration for yurt app manager is removed
E1214 07:15:57.324652 46377 revert.go:70] fail to revert yurt to kubernetes: fail to remove the yurt app manager: fail to delete the NodePoolCRD/nodepoolcrd: no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"

  4. The yurt-hub pods are still there; this may be due to the failure in the revert step.

root@yc00:/home/ecl/openyurt# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS        AGE
kube-system   calico-kube-controllers-9c86b8cb4-zgzjx   1/1     Running            0               63m
kube-system   calico-node-trjsc                         1/1     Running            0               63m
kube-system   calico-node-xp7pn                         1/1     Running            0               61m
kube-system   coredns-78fcd69978-2hnwp                  1/1     Running            0               63m
kube-system   coredns-78fcd69978-hps72                  1/1     Running            0               63m
kube-system   etcd-yc01                                 1/1     Running            1               63m
kube-system   kube-apiserver-yc01                       1/1     Running            1               63m
kube-system   kube-controller-manager-yc01              1/1     Running            0               59m
kube-system   kube-proxy-5d64d                          1/1     Running            0               61m
kube-system   kube-proxy-9g9bh                          1/1     Running            0               63m
kube-system   kube-scheduler-yc01                       1/1     Running            1               63m
kube-system   yurt-hub-yc01                             1/1     Running            9 (5m10s ago)   49m
kube-system   yurt-hub-ye01                             0/1     CrashLoopBackOff   9 (19s ago)     48m

rambohe-ch commented 2 years ago

/assign @adamzhoul @Peeknut

Congrool commented 2 years ago

I also encountered a similar certificate issue when I converted a k8s v1.22 cluster to an OpenYurt cluster. We should figure out the reason.

Peeknut commented 2 years ago

(1) It's true that Yurthub will also be deployed on cloud nodes during conversion. Because Yurthub adds nodepool-related functions, cloud nodes also need it. However, the cloud Yurthub runs in cloud mode and does not enable the caching function.

(2) This may be a problem in Yurthub. According to the pod resources, the pod for node-servant-convert seems to have run successfully (although the job is not recycled because of the timeout; you can delete the job manually).

The yurt-hub-ye01 pod has been pulled up, so this should be a certificate issue at Yurthub startup. Please take a look. @qclc

(3) There is indeed a problem with yurtctl revert. yurt-app-manager is not deployed during convert unless the --enable-app-manager parameter is passed, but revert deletes it directly without checking whether it was deployed, so it fails.

(4) Yes, Yurthub did not pull up because revert did not execute successfully.
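
To illustrate point (3), here is a minimal sketch of a revert step that tolerates components that were never deployed. It is not the yurtctl implementation: `NotFoundError`, `delete_if_present`, and `revert_components` are hypothetical names, and the delete callback stands in for whatever client call actually removes the resource.

```python
class NotFoundError(Exception):
    """Hypothetical error raised when a resource does not exist."""


def delete_if_present(delete_fn, name):
    """Call delete_fn(name); treat a missing resource as already removed.

    Returns True if something was deleted, False if it was not there.
    """
    try:
        delete_fn(name)
    except NotFoundError:
        return False
    return True


def revert_components(delete_fn, names):
    """Try every component and return the ones that were actually removed."""
    return [n for n in names if delete_if_present(delete_fn, n)]


if __name__ == "__main__":
    # Assumed state: only the controller manager was deployed by convert.
    deployed = {"yurt-controller-manager"}

    def fake_delete(name):
        if name not in deployed:
            raise NotFoundError(name)
        deployed.remove(name)

    print(revert_components(fake_delete, ["yurt-controller-manager", "yurt-app-manager"]))
```

With this shape, a component that was skipped during convert no longer aborts the rest of the revert; any other error still propagates to the caller.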

yixingjia commented 2 years ago

It's true that Yurthub will also be deployed on cloud nodes during conversion.

So should we update the manual document? It only requires deploying yurt-hub on the edge node.

Besides that, when I try to manually patch the cluster, Yurthub throws the following error:

2021-12-14T08:28:47.864213804Z stderr F I1214 08:28:47.864029 1 certificate.go:83] waiting for preparing client certificate
2021-12-14T08:28:52.74110528Z stderr F I1214 08:28:52.740702 1 connrotation.go:110] forcibly close 1 connections on 10.221.33.199:6443 for hub certificate manager dialer
2021-12-14T08:28:52.742733406Z stderr F I1214 08:28:52.742513 1 connrotation.go:48] close connection from 10.221.33.198:54592 to 10.221.33.199:6443 for hub certificate manager dialer, remain 0 connections
2021-12-14T08:28:52.743642506Z stderr F I1214 08:28:52.743539 1 connrotation.go:145] create a connection from 10.221.33.198:54610 to 10.221.33.199:6443, total 1 connections in hub certificate manager dialer
2021-12-14T08:28:52.777793843Z stderr F E1214 08:28:52.777498 1 certificate_manager.go:434] Failed while requesting a signed certificate from the master: cannot create certificate signing request: the server could not find the requested resource
2021-12-14T08:28:52.864220166Z stderr F I1214 08:28:52.863930 1 certificate.go:83] waiting for preparing client certificate
2021-12-14T08:28:57.864229933Z stderr F I1214 08:28:57.864013 1 certificate.go:83] waiting for preparing client certificate

Peeknut commented 2 years ago

Yes, maybe we should update the doc.

This may be a Yurthub startup problem, since the certificate problem occurs both when manually patching the cluster and when using yurtctl convert. @qclc

Peeknut commented 2 years ago

@yixingjia For the problem with yurtctl convert/revert, a preflight check will be added to yurtctl convert/revert soon to reduce the probability of conversion failure.

qclc commented 2 years ago

I also encountered a similar certificate issue when I converted a k8s v1.22 cluster to an OpenYurt cluster. We should figure out the reason.

It seems that yurthub has successfully sent the CSR request, and then there is a problem in the server side processing. It may be due to the k8s version. I will take a look.

yixingjia commented 2 years ago

It's true that Yurthub will also be deployed on cloud nodes during conversion. Because Yurthub adds nodepool-related functions, cloud nodes also need it. However, the cloud Yurthub runs in cloud mode and does not enable the caching function.

@rambohe-ch If cloud nodes also need Yurthub now, that is a big change for the OpenYurt topology. Currently, the OpenYurt architecture does not reflect this change. Do we have a plan to update it accordingly?

qclc commented 2 years ago

I also encountered a similar certificate issue when I converted a k8s v1.22 cluster to an OpenYurt cluster. We should figure out the reason.

It seems that yurthub has successfully sent the CSR request, and then there is a problem in the server side processing. It may be due to the k8s version. I will take a look.

Starting from version 1.22.x, the apiserver no longer serves the beta version of CertificateSigningRequest (see the K8S CHANGELOG), and Yurthub currently uses the v1beta1 version of CertificateSigningRequest, so errors occur when it sends CSRs. Therefore, Yurthub may not yet support the 1.22.x apiserver.
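
As a rough illustration of the version dependency described above, the sketch below picks a CertificateSigningRequest API version from the server version string. It is not Yurthub code: `parse_minor` and `csr_api_version` are hypothetical helpers, and a real client would more likely ask the apiserver which versions it serves than compare version numbers.

```python
import re


def parse_minor(git_version):
    """Extract (major, minor) from a version string such as 'v1.22.3'."""
    m = re.match(r"v?(\d+)\.(\d+)", git_version)
    if m is None:
        raise ValueError(f"unrecognized version: {git_version!r}")
    return int(m.group(1)), int(m.group(2))


def csr_api_version(git_version):
    """Pick the CertificateSigningRequest API version for a server.

    certificates.k8s.io/v1 has been served since Kubernetes 1.19, and
    certificates.k8s.io/v1beta1 stopped being served in 1.22, so v1 is
    preferred whenever the server is new enough to have it.
    """
    if parse_minor(git_version) >= (1, 19):
        return "certificates.k8s.io/v1"
    return "certificates.k8s.io/v1beta1"


if __name__ == "__main__":
    for version in ("v1.18.0", "v1.20.11", "v1.22.3"):
        print(version, csr_api_version(version))
```

For the v1.22.3 cluster in this issue, only the v1 path exists, which matches the "could not find the requested resource" error seen when the v1beta1 endpoint is requested.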

rambohe-ch commented 2 years ago

It's true that Yurthub will also be deployed on cloud nodes during conversion. Because Yurthub adds nodepool-related functions, cloud nodes also need it. However, the cloud Yurthub runs in cloud mode and does not enable the caching function.

@rambohe-ch If cloud nodes also need Yurthub now, that is a big change for the OpenYurt topology. Currently, the OpenYurt architecture does not reflect this change. Do we have a plan to update it accordingly?

@yixingjia Thank you for your suggestion. I think we need to add Yurthub to the cloud nodes in the architecture. I will fix it before the v0.6.0 release.

hhstu commented 2 years ago

The app nodepool CRDs use apiextensions.k8s.io/v1beta1 and need to change to apiextensions.k8s.io/v1.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

adamzhoul commented 2 years ago

The app nodepool CRDs use apiextensions.k8s.io/v1beta1 and need to change to apiextensions.k8s.io/v1.

this is indeed a problem for v1.22 changeLog

projects :

should update for v1.22 I think. As we say "OpenYurt supports Kubernetes versions up to 1.22."

rambohe-ch commented 2 years ago

The app nodepool CRDs use apiextensions.k8s.io/v1beta1 and need to change to apiextensions.k8s.io/v1.

this is indeed a problem for v1.22 changeLog

  • The beta CustomResourceDefinition API (apiextensions.k8s.io/v1beta1)

projects :

should update for v1.22 I think. As we say "OpenYurt supports Kubernetes versions up to 1.22."

@adamzhoul Thank you for your reply. Would you like to update the yurt-app-manager all_in_one.yaml to adapt it to K8s v1.22? yurtcluster-operator will be deprecated in the future, so we will keep it unchanged for now.
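
When updating a manifest such as all_in_one.yaml, a quick scan for API versions that v1.22 no longer serves can help locate what needs to change. The sketch below is only an aid under stated assumptions: `find_removed_api_versions` is a hypothetical helper, it looks only at top-level `apiVersion:` lines, and the table covers just three of the beta APIs removed in 1.22. Note that moving a CRD to apiextensions.k8s.io/v1 is more than a rename, because v1 requires a schema for each version.

```python
import re

# Some API versions that Kubernetes 1.22 stopped serving, with replacements.
REMOVED_IN_1_22 = {
    "apiextensions.k8s.io/v1beta1": "apiextensions.k8s.io/v1",
    "certificates.k8s.io/v1beta1": "certificates.k8s.io/v1",
    "admissionregistration.k8s.io/v1beta1": "admissionregistration.k8s.io/v1",
}

# Matches only unindented apiVersion lines, i.e. the top of each document.
API_VERSION_LINE = re.compile(r"^apiVersion:\s*(\S+)\s*$")


def find_removed_api_versions(manifest_text):
    """Return (line_number, old, new) for each removed top-level apiVersion."""
    hits = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        m = API_VERSION_LINE.match(line)
        if m is None:
            continue
        value = m.group(1).strip("\"'")  # tolerate quoted values
        if value in REMOVED_IN_1_22:
            hits.append((lineno, value, REMOVED_IN_1_22[value]))
    return hits


if __name__ == "__main__":
    sample = (
        "apiVersion: apiextensions.k8s.io/v1beta1\n"
        "kind: CustomResourceDefinition\n"
        "---\n"
        "apiVersion: apps/v1\n"
        "kind: Deployment\n"
    )
    for lineno, old, new in find_removed_api_versions(sample):
        print(f"line {lineno}: {old} -> {new}")
```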

adamzhoul commented 2 years ago

If yurtcluster-operator will be deprecated, how can we install OpenYurt on an existing cluster?

rambohe-ch commented 2 years ago

If yurtcluster-operator will be deprecated, how can we install OpenYurt on an existing cluster?

@adamzhoul Based on the discussion in this issue: https://github.com/openyurtio/openyurt/issues/846, end users will use helm install to install OpenYurt on an existing K8s cluster.

rambohe-ch commented 2 years ago

The yurtctl convert/revert command has been removed. For how to install OpenYurt, see: https://openyurt.io/zh/docs/installation/summary

I will close this issue.