nwcdheap / kops-cn

Quickly deploy K8s clusters with kops in the AWS China Ningxia and Beijing regions
Apache License 2.0

Validation failed in the Kops cluster #128

Closed frsh-augustin closed 3 years ago

frsh-augustin commented 3 years ago

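Please note that kops-cn is an open-source project that helps users deploy kops more easily in the AWS China Beijing and Ningxia regions. kops-cn makes no invasive changes to the upstream kops source code and stays in sync with upstream kops versions, so most functional problems encountered with kops-cn also exist in the upstream kops project. Before opening an issue, please make sure you have searched upstream kops to see whether someone has already reported the same problem. Problems in kops itself cannot be resolved here; if it is an issue with kops itself, please file it in the upstream kops project's issue tracker.

If you are certain that this issue is related only to kops-cn and not to upstream kops, please fill in the following information to help us locate the problem and improve this project, and provide screenshots where possible.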

1. What kops version are you running? The command kops version will display this information.

Version 1.12.1 (git-e1c317f9c)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.8", GitCommit:"a89f8c11a5f4f132503edbc4918c98518fd504e3", GitTreeState:"clean", BuildDate:"2019-04-23T04:41:47Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

3. What AWS region are you using (Beijing or Ningxia)?

Beijing

4. What commands did you run? What is the simplest way to reproduce this issue?

make validate-cluster

5. What happened after the commands executed?

    VALIDATION ERRORS
    KIND             NAME                 MESSAGE
    ComponentStatus  etcd-0               component "etcd-0" is unhealthy
    ComponentStatus  etcd-1               component "etcd-1" is unhealthy
    Machine          i-0daa77b306798422b  machine "i-0daa77b306798422b" has not yet joined cluster

I launched the cluster with three master nodes and four worker nodes. The cluster has been running for a couple of months. I noticed that two of the etcd components are unhealthy and that one master node has left the cluster.
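
For anyone who wants to triage this kind of output automatically, the validation errors above can be grouped by kind. The following is a minimal Python sketch, not part of kops or kops-cn: it assumes the output has been captured as text with whitespace-separated KIND, NAME, and MESSAGE columns, and the names `ValidationError`, `parse_validation_errors`, and `group_names_by_kind` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ValidationError(NamedTuple):
    kind: str
    name: str
    message: str


def parse_validation_errors(output: str) -> List[ValidationError]:
    """Extract rows that follow the 'KIND NAME MESSAGE' header line.

    Assumption: KIND and NAME contain no spaces, and a blank line
    ends the table.
    """
    errors: List[ValidationError] = []
    in_table = False
    for line in output.splitlines():
        fields = line.split(None, 2)  # split into at most three columns
        if fields == ["KIND", "NAME", "MESSAGE"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:  # blank line: end of the table
            break
        if len(fields) == 3:
            errors.append(ValidationError(*fields))
    return errors


def group_names_by_kind(errors: List[ValidationError]) -> Dict[str, List[str]]:
    """Map each KIND to the list of affected resource names."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for err in errors:
        grouped[err.kind].append(err.name)
    return dict(grouped)


# Sample text copied from the report above.
SAMPLE = """\
VALIDATION ERRORS
KIND             NAME                 MESSAGE
ComponentStatus  etcd-0               component "etcd-0" is unhealthy
ComponentStatus  etcd-1               component "etcd-1" is unhealthy
Machine          i-0daa77b306798422b  machine "i-0daa77b306798422b" has not yet joined cluster
"""

if __name__ == "__main__":
    for kind, names in group_names_by_kind(parse_validation_errors(SAMPLE)).items():
        print(kind, names)
```

Grouping this way makes it easier to see that the two ComponentStatus errors both concern etcd, while the Machine error concerns a single instance.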

6. What did you expect to happen?

I'd like to know how to address the validation errors that are generated by the command "make validate-cluster".

7. Please provide the content of your Makefile and describe how you ran the make command. You may want to remove your cluster name and other sensitive information.

Please refer to the attachment: Makefile.txt

8. Is there anything else we need to know?

No

xfangfang commented 3 years ago

I think you've hit the problem described at this link: https://github.com/kubernetes/kops#2020-05-06-etcd-manager-certificate-expiration-advisory

I may have encountered this problem the other day, so I updated the Kops version to 1.15.0 and the k8s version to 1.15.10.
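
If certificate expiration is indeed the cause, one way to confirm it is to look at the expiry date of the etcd certificates on a master. Below is a minimal Python sketch under stated assumptions: it expects the `notAfter=...` line printed by `openssl x509 -noout -enddate -in <certificate>`, where the certificate path is not given in this thread and is left for you to fill in. The function names and the sample date are hypothetical and used for illustration only.

```python
import ssl
from datetime import datetime, timezone

PREFIX = "notAfter="


def parse_not_after(enddate_line: str) -> datetime:
    """Parse a line such as 'notAfter=May  6 12:00:00 2021 GMT' into a UTC datetime."""
    line = enddate_line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("expected a line starting with 'notAfter='")
    # ssl.cert_time_to_seconds understands the 'Mon DD HH:MM:SS YYYY GMT' format.
    seconds = ssl.cert_time_to_seconds(line[len(PREFIX):])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def days_remaining(enddate_line: str, now: datetime) -> float:
    """Days until expiry; a negative value means the certificate has already expired."""
    return (parse_not_after(enddate_line) - now).total_seconds() / 86400


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the cluster in this issue.
    sample = "notAfter=May  6 12:00:00 2021 GMT"
    left = days_remaining(sample, datetime.now(timezone.utc))
    print("expired" if left < 0 else f"{left:.1f} days left")
```

A negative result for any etcd certificate would be consistent with the advisory; a positive result suggests looking for a different cause.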

frsh-augustin commented 3 years ago

@xfangfang Thank you for your advice.

I followed the hack/workaround steps described at the following URL, and the issue is gone. https://github.com/kubernetes/kops/blob/master/docs/advisories/etcd-manager-certificate-expiration.md