Open jzink-tss opened 3 years ago
Thank you for reporting this to us.
Like past-me, I just stumbled upon this while fixing a faulty control plane using the manual cluster repair guide.
I got almost the same error (it's just not containerd this time, but k8s itself), but cannot seem to find out why the fix that was applied the first time I hit this doesn't work anymore.
I'm currently on the `release/v1.7` branch (commit 3edc498), because I had this issue before and needed both fixes to be included in the KubeOne version I used.
I also tried using `--force-upgrade`, because I had a look at the fix and found that this condition might cause the problem now.
OS is Ubuntu 22.04, the control plane in question is freshly created (because the old one was faulty and the guide tells you to delete and re-create it). The remaining 2 control planes are on the same OS and kubernetes version, but have of course been provisioned by an old version of KubeOne, if that helps.
Log: log.txt
PS: Unfortunately, I'll be away for a month, so I cannot test proposed solutions in the meantime.
Thanks for reporting! I'll reopen the issue so we can verify it on our side /reopen
@xmudrii: Reopened this issue.
I got the workaround working again (unholding the pkgs manually, on all 3 control planes this time). But as the underlying problem seems to persist, I'll leave this issue open.
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale
.
If this issue is safe to close now please do so with /close
.
/lifecycle stale
/remove-lifecycle stale
What happened: After one of our control planes in the staging cluster failed, I followed the cluster repair guide in order to set up a new node:

1. `kubectl exec`'d into a working node
2. removed the `NotReady` node from the etcd ring
3. `terraform apply` so that a new server was created
4. `kubeone apply` in order to install Kubernetes on it and let it join the node into the cluster

But then, on the existing control planes, the following error occurred (error msg unescaped and shortened for better readability):
This seems to happen because there are several packages held back by default:
When I "unheld" them by running `apt-mark unhold containerd.io kubeadm kubectl kubelet kubernetes-cni`, everything worked as expected (and described in the guide).

You should either add this step to the guide or implement toleration of held packages (e.g. use `--allow-change-held-packages`).

Anything else we need to know?
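For reference, the unhold workaround can be sketched as below. It is written as a dry run that only prints the commands (drop the `echo`s to actually run them on each control-plane node); it assumes an Ubuntu/apt node and uses the package list from above:

```shell
# Dry-run sketch of the manual workaround (Ubuntu/apt assumed).
# Each echo prints a command to run on every affected control-plane node.
PKGS="containerd.io kubeadm kubectl kubelet kubernetes-cni"

echo "apt-mark showhold            # list currently held packages"
echo "apt-mark unhold $PKGS        # lift the holds before the upgrade"
echo "kubeone apply                # then re-run this from the workstation"
echo "apt-mark showhold            # verify KubeOne re-held the packages"
```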
`containerd.io` was `1.4.9` before, which is matched by `'containerd.io=1.4.*'`. Maybe that caused the problem in the first place?

Information about the environment:
- KubeOne version (`kubeone version`):
- Operating system: Alpine (Docker image: `hashicorp/terraform:1.0.8` + kubeone installed with `wget`)
- Provider you're deploying cluster on: Hetzner
- Operating system you're deploying on: Ubuntu 20.04