wskulley opened this issue 3 years ago
We recently did the same upgrade from kops 1.18 to 1.19 and encountered the same error message. In our case, however, it happened on our second etcd-manager node, and setting GODEBUG=x509ignoreCN=0 on the replacement node did not let etcd start, which in turn blocked kube-apiserver from starting, and so on.
To get etcd to start, we had to perform a rolling update on the third node (the one with the IP address from the error message) with the --cloudonly option specified. Once the third node was replaced (even before it had necessarily rejoined the cluster), the second node started etcd and joined both the etcd and Kubernetes clusters. The third node then joined both clusters without issue.
Since etcd-manager was upgraded to Go 1.15 (CommonName deprecation), all upgrades to kOps 1.19 break (the first master never joins the etcd clusters). The problem is that the generated certs contain the Common Name field, which has already been deprecated for 20 years. Go enforces this since 1.15 and refuses to connect even if the cert has an AltNames field (#362 added that field, but it should have removed the CN too).
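To make the Go behaviour concrete, here is a minimal sketch using only the standard library's crypto/x509 package. It issues two throwaway self-signed certificates for a hypothetical host name (etcd-a.internal.example, not taken from the thread): one that carries the name only in the Subject CommonName, and one that also lists it as a DNS SAN. On Go 1.15 and later without the GODEBUG=x509ignoreCN=0 override (which was removed entirely in Go 1.17), VerifyHostname rejects the CN-only certificate and accepts the one with the SAN.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned issues a throwaway self-signed certificate. When dnsNames is
// empty, no SAN extension is written, so the only name in the certificate
// is the legacy Subject CommonName.
func selfSigned(cn string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const host = "etcd-a.internal.example" // hypothetical name

	cnOnly, err := selfSigned(host, nil)
	if err != nil {
		panic(err)
	}
	withSAN, err := selfSigned(host, []string{host})
	if err != nil {
		panic(err)
	}

	// Go 1.15+ ignores the CN for name matching, so only the SAN cert verifies.
	fmt.Println("CN only: ", cnOnly.VerifyHostname(host))
	fmt.Println("CN + SAN:", withSAN.VerifyHostname(host))
}
```

With a current Go toolchain the first line prints a hostname error and the second prints <nil>.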
Until a proper fix is implemented, you need to use the workaround that rolls Go back to its old behaviour. I think the proper solution is to stop generating certificates with a CN in etcd-manager and rotate the certs on all masters later on (1.20, 1.21?). But I'm not sure whether removing it has any second-order effects.
Update: In our case, the issue was not related to this. We had a config mistake that bound both etcd and etcd-events to the same metrics port. The Go deprecation log still appears during startup, but it was just noise masking this underlying issue.
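The thread does not show which kops setting was duplicated, so the following is only a generic illustration of the failure mode, not kops or etcd-manager configuration: two listeners cannot normally bind the same TCP address, so whichever process starts second cannot open its listener. listenTwice is a hypothetical helper that binds an OS-assigned loopback port and then tries to bind exactly the same address again, standing in for two processes configured with one metrics port.

```go
package main

import (
	"fmt"
	"net"
)

// listenTwice (hypothetical helper) binds an OS-assigned loopback port, then
// tries to bind the same address a second time. It returns the address, the
// error from the second bind, and a non-nil final error only if the first
// bind itself failed.
func listenTwice() (string, error, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	defer first.Close()

	addr := first.Addr().String()
	second, secondErr := net.Listen("tcp", addr)
	if secondErr == nil {
		// Not expected on common platforms; clean up if it happens.
		second.Close()
	}
	return addr, secondErr, nil
}

func main() {
	addr, secondErr, err := listenTwice()
	if err != nil {
		panic(err)
	}
	fmt.Printf("first listener on %s; second bind error: %v\n", addr, secondErr)
}
```

On Linux the second bind typically fails with "address already in use", which is the kind of startup failure the Go deprecation warning was drowning out.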
During an upgrade from 'default' kops 1.18 to 'default' kops 1.19, we encountered the following error on the first etcd-manager node to roll:
unable to grpc-ping discovered peer 10.28.114.172:3996: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
The replacement etcd node would not join the existing cluster. We were able to bypass the issue by adding