Open · containerguy opened this issue 3 years ago
This is an issue with K3s and not K3os; please see k3s-io/k3s#2996. You must make sure the server is the same version or newer than the agent. When upgrading, servers must ALL be upgraded before upgrading agents.
Thanks for the info @brandond! I ran into the exact same error, and after forcing the master to upgrade, everything came back online!
The root cause of my issue was that I had set my entire cluster to auto-upgrade via the system-upgrade-controller, both masters and agents. Since there is a single plan, and an agent happened to be the first node attempted, the whole process stopped when that node failed to come back online. I'm still thinking about potential solutions, but it may be easier to just mention this in the README for upgrades from 1.19 to 1.20 (this seems related to the network-policy implementation added in 1.20, if I'm following the related issues correctly).
Prior upgrades may have been more forgiving about this, but it is a standard part of the Kubernetes version skew policy that servers always need to be upgraded before agents. If you're using the upgrade controller, you should definitely use separate plans for servers and agents, and upgrade the servers first - ESPECIALLY when upgrading between minor versions.
https://kubernetes.io/docs/setup/release/version-skew-policy/#supported-component-upgrade-order
Thanks for the info! Based on this, I wonder if the default k3os-latest plan for the system-upgrade-controller bundled with k3OS should be adjusted, since as currently configured it does nothing to ensure that servers are upgraded first. The k3os.io/upgrade label could be adjusted to manually control the upgrade process, but that kind of defeats the point of an automatic upgrade controller. At a minimum this should probably be mentioned in the Upgrade and Maintenance section of the README.
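For reference, here is roughly the part of the bundled k3os-latest plan that the label interacts with; the exact selector below is my assumption of what ships with k3OS, so check the copy in your cluster before relying on it:

```yaml
# Assumed excerpt from the bundled k3os-latest Plan: it targets every k3OS
# node whose k3os.io/upgrade label is not set to "disabled".
nodeSelector:
  matchExpressions:
    - key: k3os.io/mode
      operator: Exists
    - key: k3os.io/upgrade
      operator: NotIn
      values: ["disabled"]
# Agents can therefore be held back until the servers are done, e.g.:
#   kubectl label node <agent-node> k3os.io/upgrade=disabled
# and released again afterwards by removing the label:
#   kubectl label node <agent-node> k3os.io/upgrade-
```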
@dweomer the ask above regarding the default plan seems reasonable, is this something that could be accommodated without too much work?
You are totally right: after manually upgrading the server and afterwards the workers, it is working fine. I suggest that at least the documentation be updated to more clearly recommend dedicated upgrade plans for servers and agents.
I never got around to it, but the k3os-latest plan needs to be deprecated in favor of two plans: one for servers and one for all other agents. The thing about the k3os-latest plan, though, is that it was meant to be an example, yet it has become the de facto upgrade descriptor.
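As a sketch, the split could look something like the following, modeled on the server/agent plan examples from the k3s automated-upgrade docs plus the upgrade container spec from the bundled k3os-latest plan. The plan names, and the use of the rancher/k3s-upgrade image's prepare subcommand to gate the agent plan on the server plan, are assumptions rather than anything that ships today:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3os-servers             # hypothetical name
  namespace: k3os-system
spec:
  concurrency: 1                 # one server at a time
  channel: https://github.com/rancher/k3os/releases/latest
  serviceAccountName: k3os-upgrade
  nodeSelector:
    matchExpressions:
      - key: k3os.io/mode
        operator: Exists
      - key: node-role.kubernetes.io/master   # only control-plane nodes
        operator: In
        values: ["true"]
  upgrade:
    # Upgrade container spec as I read it from the bundled k3os-latest plan.
    image: rancher/k3os
    command: [k3os, --debug]
    args: [upgrade, --kernel, --rootfs, --remount, --sync, --reboot,
           --lock-file=/host/run/k3os/upgrade.lock,
           --source=/k3os/system, --destination=/host/k3os/system]
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3os-agents              # hypothetical name
  namespace: k3os-system
spec:
  concurrency: 1
  channel: https://github.com/rancher/k3os/releases/latest
  serviceAccountName: k3os-upgrade
  # Block agent upgrades until every node selected by k3os-servers is done;
  # this "prepare" gate is borrowed from the k3s-upgrade plan examples.
  prepare:
    image: rancher/k3s-upgrade
    args: [prepare, k3os-servers]
  nodeSelector:
    matchExpressions:
      - key: k3os.io/mode
        operator: Exists
      - key: node-role.kubernetes.io/master   # everything that is not a server
        operator: DoesNotExist
  upgrade:
    image: rancher/k3os
    command: [k3os, --debug]
    args: [upgrade, --kernel, --rootfs, --remount, --sync, --reboot,
           --lock-file=/host/run/k3os/upgrade.lock,
           --source=/k3os/system, --destination=/host/k3os/system]
```

The important piece is the prepare step: the agent plan doesn't start on a node until the server plan has completed everywhere it applies, so an agent can never be upgraded ahead of the control plane.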
Version (k3OS / kernel)
k3os version v0.20.4-k3s1r0
kernel 5.4.0-70-generic #78 SMP Fri Mar 26 17:09:23 UTC 2021
Architecture x86_64
Describe the bug
kube-proxy is not reachable due to a broken iptables rule; see /var/log/k3s-service.log.
The same error occurs if I upgrade from v0.19.5 to v0.20.4.

To Reproduce
Upgrade one K3OS Node to v0.20.4-k3s1r0
Expected behavior
iptables rules worked with previous k3os versions and should also work with a newer version, as this is managed by k3s.
Actual behavior
iptables rules are broken; the cluster tries to start kube-proxy several times (as seen in the Kubernetes events). The node is unusable until downgraded to the previous version.
Additional context
Multi-master HA cluster with 3x master and 3x worker nodes, virtualized on VMware.
k3s-service.log