rancher / turtles

Rancher CAPI extension
https://turtles.docs.rancher.com
Apache License 2.0

[SURE-9138] rke2-control-plane-system: unable to lookup or create cluster certificates, external certificate not found: secrets "cluster-etcd" not found #774

Open kkaempf opened 3 weeks ago

kkaempf commented 3 weeks ago

SURE-9138

Issue description:

The customer is seeing the following error:

E0916 12:43:13.582966 1 workload_cluster.go:118] "Collecting etcd key pair from remote" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="rke2test/rke2test-master" namespace="rke2test" name="rke2test-master" reconcileID="b94f1a99-df21-4d73-aaa5-cb7e1a69a1a3"
E0916 12:43:13.603661 1 management_cluster.go:171] "unable to lookup or create cluster certificates" err="external certificate not found: secrets \"cluster-etcd\" not found" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="rke2test/rke2test-master" namespace="rke2test" name="rke2test-master" reconcileID="b94f1a99-df21-4d73-aaa5-cb7e1a69a1a3" 

We told them that this should be a transient error during the provisioning phase [1], but they report seeing it not only during provisioning but also in stable clusters, even weeks after the cluster was deployed.
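For triage, it can help to know which secret name the controller expects. The sketch below assumes that CAPRKE2 follows the Cluster API convention of naming cluster certificate secrets `<cluster-name>-<purpose>`, with `etcd` as the purpose for the etcd CA. It is not taken from the CAPRKE2 source. Under that assumption, the `cluster-etcd` secret in the log would belong to a Cluster object named `cluster`. The helper names and the hard-coded secret list are hypothetical stand-ins for the secrets present in the cluster's namespace (`rke2test` in the log above).

```go
package main

import "fmt"

// etcdPurpose is the suffix assumed for the etcd CA secret
// (Cluster API convention, assumed to apply to CAPRKE2 clusters too).
const etcdPurpose = "etcd"

// etcdCASecretName returns the expected etcd CA secret name for a cluster,
// following the assumed "<cluster-name>-<purpose>" convention.
func etcdCASecretName(clusterName string) string {
	return fmt.Sprintf("%s-%s", clusterName, etcdPurpose)
}

// hasEtcdCASecret reports whether the expected etcd CA secret appears in
// secretNames, a hypothetical list of secret names from the cluster's namespace.
func hasEtcdCASecret(clusterName string, secretNames []string) bool {
	want := etcdCASecretName(clusterName)
	for _, name := range secretNames {
		if name == want {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical namespace contents: the etcd CA secret is absent.
	secrets := []string{"cluster-ca", "cluster-kubeconfig"}
	fmt.Println(etcdCASecretName("cluster"))         // cluster-etcd
	fmt.Println(hasEtcdCASecret("cluster", secrets)) // false
}
```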

kkaempf commented 3 weeks ago

/cc @Danil-Grigorev, since you did the initial assessment when this bug was first reported via Slack 😉

Danil-Grigorev commented 1 week ago

@kkaempf As discussed on Slack, this is fixed in the CAPRKE2 0.7.1 release by the combination of https://github.com/rancher/cluster-api-provider-rke2/pull/451 and https://github.com/rancher/cluster-api-provider-rke2/pull/453. Users will need to upgrade to this version (or later), and we might also need to pin this version in the Turtles release.

furkatgofurov7 commented 20 hours ago

Turtles v0.13.0 has been released. Under the hood it uses CAPRKE2 v0.8.0, which includes the bug fixes from 0.7.1. This issue could be closed now, or kept open until the release has been tested and verified to fix the customer's issue.
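To check whether a deployed CAPRKE2 version should already contain the fix, compare it against 0.7.1, the first release named in this thread as including it. The following is a minimal sketch. The function names are hypothetical, and it only handles plain `MAJOR.MINOR.PATCH` versions with an optional `v` prefix. Pre-release suffixes such as `-rc1` are rejected rather than interpreted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minFixedVersion is the first CAPRKE2 release reported in this thread to include the fix.
var minFixedVersion = [3]int{0, 7, 1}

// parseVersion parses "vMAJOR.MINOR.PATCH" or "MAJOR.MINOR.PATCH".
// Pre-release or build suffixes are not supported and return an error.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// includesFix reports whether version is at least minFixedVersion.
func includesFix(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	// Compare component by component; the first differing component decides.
	for i := 0; i < 3; i++ {
		if v[i] != minFixedVersion[i] {
			return v[i] > minFixedVersion[i], nil
		}
	}
	return true, nil // exactly the minimum fixed version
}

func main() {
	for _, v := range []string{"v0.7.0", "v0.7.1", "v0.8.0"} {
		ok, err := includesFix(v)
		fmt.Println(v, ok, err)
	}
}
```

Run as-is, this prints `false` for `v0.7.0` and `true` for `v0.7.1` and `v0.8.0` (the CAPRKE2 version bundled with Turtles v0.13.0).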