rudoi closed this 4 years ago
cc @amy
I'm fairly certain this is because the cert Secrets have the first controlplane's KubeadmConfig as an owner reference. Deleting the first controlplane Machine deleted the KubeadmConfig, which in turn deleted the Secrets, which in turn made it so the 2nd new controlplane Machine had nothing to look up. This is a guess though.
Yes, this would likely be the issue... The Secrets should likely be owned by the Cluster rather than the KubeadmConfig.
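The cascade described above can be sketched with a toy in-memory model of owner references. This is a minimal illustration, not the Kubernetes garbage collector or any client-go API: the `Store` type, its `Delete` method, and the object names are hypothetical. It assumes, as the thread describes, that deleting a control plane Machine removes its KubeadmConfig, and that an object is collected once all of its owners are gone. It compares a Secret owned by the KubeadmConfig with one owned by the Cluster.

```go
package main

import (
	"fmt"
	"sort"
)

// Store is a hypothetical, heavily simplified stand-in for the API server:
// it maps an object name to the names listed in its ownerReferences.
type Store map[string][]string

// Delete removes name, then repeatedly collects any object whose owners have
// all been deleted, loosely mimicking background cascading deletion.
// Objects without owner references are never collected.
func (s Store) Delete(name string) {
	delete(s, name)
	for {
		collected := false
		for obj, owners := range s {
			if len(owners) == 0 {
				continue
			}
			orphaned := true
			for _, owner := range owners {
				if _, ok := s[owner]; ok {
					orphaned = false
					break
				}
			}
			if orphaned {
				delete(s, obj) // deleting during range is safe in Go
				collected = true
			}
		}
		if !collected {
			return
		}
	}
}

// Names returns the remaining object names in sorted order.
func (s Store) Names() []string {
	names := make([]string, 0, len(s))
	for n := range s {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// newStore builds a hypothetical cluster in which the Machine owns its
// KubeadmConfig and the cert Secret is owned by secretOwner.
func newStore(secretOwner string) Store {
	return Store{
		"Cluster/my-cluster":           nil,
		"Machine/controlplane-0":       nil,
		"KubeadmConfig/controlplane-0": {"Machine/controlplane-0"},
		"Secret/my-cluster-ca":         {secretOwner},
	}
}

func main() {
	for _, owner := range []string{"KubeadmConfig/controlplane-0", "Cluster/my-cluster"} {
		s := newStore(owner)
		s.Delete("Machine/controlplane-0") // e.g. the first control plane Machine during an upgrade
		fmt.Printf("Secret owned by %s -> remaining: %v\n", owner, s.Names())
	}
}
```

With the KubeadmConfig as owner, deleting the Machine takes the Secret with it, which matches the guess in the report; with the Cluster as owner, the Secret survives. The trade-off raised later in the thread is that a Secret which survives can also outlive the data it was generated from.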
/priority critical-urgent /milestone v0.1.x
@detiber excellent, that's what I was thinking. I'll PR it.
@rudoi @amy and I worked on this issue on Friday. Please use her new branch; with it, this will not happen. The owner references are updated so that the Secrets are owned by the new kubeconfig.
The ownership gets complicated if CABPK creates resources for a Machine in a Cluster and those resources are then owned by the Cluster. The certs are generated from data in the config, not from the cluster. This means that if you want to keep a cluster around but scale down to 0 control planes without deleting it, you can end up with stale certificates that require manual intervention. If the Secrets are owned by the KubeadmConfig, they go away when you scale to 0.
edit: that should read "the certs are generated from the config and the cluster, not just the cluster".
What part can become stale? Aren't we generating certs with 10-year lifespans?
the serving cert SANs
Given that the linked PR has been closed, is there anyone working on a fix for this issue?
I think we should close this. If needed, we can open a separate issue to debate whether the KubeadmConfig or the Cluster should own the Secrets. However, I think I agree with Chuck that the current implementation, with the KubeadmConfig as the owner, is correct.
/close
@chuckha: Closing this issue.
/kind bug
What steps did you take and what happened:
What did you expect to happen:
A flawless upgrade, obviously!

Anything else you would like to add:
I'm fairly certain this is because the cert Secrets have the first controlplane's KubeadmConfig as an owner reference. Deleting the first controlplane Machine deleted the KubeadmConfig, which in turn deleted the Secrets, which in turn made it so the 2nd new controlplane Machine had nothing to look up. This is a guess though.
/assign /lifecycle active