Closed ieugen closed 5 years ago
/assign @liztio @timothysc
@ieugen I'd recommend using the configuration migrate utility prior to attempting to upgrade. The configuration file format has significantly changed from v1.10 -> v1.11 but folks have done a good job in testing that migration.
@timothysc I've installed 1.11 and I am upgrading to 1.11.1, so there should not be much to upgrade. I did use the utility and I got these results:
kubeadm config view > kubeadm-old.yaml
kubeadm config migrate --old-config kubeadm-old.yaml > kubeadm-new.yaml
diff kubeadm-old.yaml kubeadm-new.yaml
10d9
< oidc-issuer-url: https://REDACTED
12a12
> oidc-issuer-url: https://REDACTED
17a18,25
> bootstrapTokens:
> - groups:
> - system:bootstrappers:kubeadm:default-node-token
> token: REDACTED
> ttl: 24h0m0s
> usages:
> - signing
> - authentication
137c145,150
< nodeRegistration: {}
---
> nodeRegistration:
> criSocket: /var/run/dockershim.sock
> name: m01
> taints:
> - effect: NoSchedule
> key: node-role.kubernetes.io/master
Confirming. In my case (1.11.0 -> 1.11.1) it loses apiServerExtraArgs like etcd-cafile, feature-gates, etc., and replaces them with some defaults.
I can find the expected values inside the configmap (key: MasterConfiguration) like this: kubectl get configmap -n kube-system kubeadm-config -oyaml
I've made the upgrade and it went smoothly, so I am a bit confused about this. I also rebooted the cluster (one node at a time, starting with the master) to see if there were any issues, and I did not see any.
I don't remember having to change anything after the upgrade and I did not document it :(.
Regards,
We lost networking to the pods after the 1.11.1 upgrade from 1.10.6. It looks like --cluster-cidr is no longer working, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is configured for flannel. How can we resolve this situation?
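A quick way to confirm the symptom described above is to check which range each pod IP falls in. This is a minimal sketch (the `in_pod_cidr` helper is mine, not part of kubeadm) assuming the flannel pod CIDR is 10.244.0.0/16; IPs from 172.17.0.0/16 come from the Docker default bridge, which means the kubelet is no longer running with --network-plugin=cni.

```shell
# Helper (assumed, for illustration): does a pod IP sit inside the
# flannel pod CIDR 10.244.0.0/16? A /16 check only needs the first
# two octets, so a simple prefix match is enough here.
in_pod_cidr() {
  case "$1" in
    10.244.*) return 0 ;;   # inside the configured flannel range
    *)        return 1 ;;   # outside - e.g. the docker bridge 172.17.x.x
  esac
}

in_pod_cidr 10.244.1.5 && echo "10.244.1.5: flannel CIDR, ok"
in_pod_cidr 172.17.0.3 || echo "172.17.0.3: docker bridge, CNI config lost"

# On a live cluster, pod IPs can be listed for inspection with:
#   kubectl get pods --all-namespaces -o wide
```

If most pods show 172.17.x.x addresses, the CNI flags were lost on those nodes (see the kubeadm-flags.env discussion further down).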
Even better: at the moment I'm at v1.11.0, and kubeadm upgrade diff v1.11.0 gives me the same broken result.
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -14,17 +14,16 @@
- command:
- kube-apiserver
- --authorization-mode=Node,RBAC
- - --etcd-cafile=/opt/etcd/ca.pem
- - --etcd-certfile=/opt/etcd/staging-cluster2node.pem
- - --etcd-keyfile=/opt/etcd/staging-cluster2node-key.pem
- - --feature-gates=PodPriority=false
- --advertise-address=192.168.6.161
- --allow-privileged=true
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --disable-admission-plugins=PersistentVolumeLabel
- --enable-admission-plugins=NodeRestriction
- --enable-bootstrap-token-auth=true
- - --etcd-servers=https://192.168.6.161:2379,https://192.168.6.162:2379,https://192.168.6.163:2379
+ - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
+ - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
+ - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
+ - --etcd-servers=https://127.0.0.1:2379
- --insecure-port=0
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
....
@wizard580 In my case the upgrade went ok. No issues with the cluster (and I'm also running on top of wireguard VPN)
I'll try tomorrow after a backup. But anyway, the broken diff is a bug. From my perspective, a major one.
For us this is not only a broken diff, as the node CIDR really seems to get lost.
Upgrade with kubeadm upgrade apply v1.11.1
worked fine, configs are not broken as far as I can see.
It generated unneeded etcd certs, but they are ignored by our configs/setup
For us the upgrade also did not seem broken at first, but after uncordoning the upgraded nodes and draining the old ones, our application went down right away because the pods all used the wrong IPs.
Confirming. Found similar issues... in our case IPVS was stuck at old service:pods mappings. You can check for kube-proxy logs and probably you'll find a lot of errors about ipset. Reboot (of nodes) helped us. Observing...
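The ipset errors mentioned above can be confirmed by scanning the kube-proxy logs. A rough sketch follows; the log lines in the heredoc are fabricated stand-ins for illustration (on a real cluster you would feed in `kubectl -n kube-system logs -l k8s-app=kube-proxy` instead).

```shell
# Illustrative sample log - NOT real kube-proxy output, just stand-in
# lines shaped like the errors described above.
cat > kube-proxy.log <<'EOF'
I0801 10:00:00 server.go:100 Using ipvs Proxier.
E0801 10:00:05 proxier.go:200 Failed to make sure ip set: error creating ipset KUBE-CLUSTER-IP
E0801 10:00:10 proxier.go:200 Failed to make sure ip set: error creating ipset KUBE-LOOP-BACK
EOF

# Count ipset failures; a non-zero count suggests stale ipvs state.
grep -c 'error creating ipset' kube-proxy.log
```

If errors are confirmed, a node reboot (as reported above) forces the service-to-pod mappings to be rebuilt; flushing the ipvs table with ipvsadm -C / --clear and restarting kube-proxy is a more surgical alternative.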
Can still reproduce this in the latest v1.12.0 alpha. Gonna see if I can sort this out before the code freeze.
ETOOCOMPLICATED, punting to 1.13
Some updates,
I've made the upgrade to 1.11.2 and 1.11.3 without any issue. Every time I performed the upgrade, the diff showed it was dropping the information; however, that does not actually seem to happen. At this point I believe it is just bad reporting.
@ieugen Minor upgrades were also not affected here, but every major upgrade (1.10.x -> 1.11.x) was!
We lost networking to the pods after the 1.11.1 upgrade from 1.10.6. It looks like --cluster-cidr is no longer working, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is configured for flannel. How can we resolve this situation?
@mkretzer In my case worker nodes kubelet loses its network parameters during upgrades, my personal fix is
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env
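A slightly fuller sketch of that workaround, written to a local copy of the file so it can be dry-run; on a real node the target path is /var/lib/kubelet/kubeadm-flags.env, and the --cgroup-driver value should match what `docker info | grep -i cgroup` reports on your nodes.

```shell
# Demo target: a local copy. On a node, use /var/lib/kubelet/kubeadm-flags.env.
FLAGS_FILE=./kubeadm-flags.env

# Restore the kubelet args that were lost during the upgrade.
echo 'KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni' > "$FLAGS_FILE"

# Sanity-check the file before restarting anything.
grep -- '--network-plugin=cni' "$FLAGS_FILE"

# Then, on the node, restart the kubelet so it re-reads the file:
#   systemctl daemon-reload
#   systemctl restart kubelet
```

New pods scheduled after the kubelet restart should receive IPs from the CNI-managed range again.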
since 1.11, /var/lib/kubelet/kubeadm-flags.env is a file that kubeadm init and join generate automatically at runtime each time:
https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
if you write it:
* before `init/join`, kubeadm will overwrite it and discard its contents.
* after `init/join`, kubeadm or the kubelet will not use it.
It's great, but kubeadm init/join wasn't run during cluster upgrade and cgroup/cni args were lost on worker nodes, that's why pods had 172.0.0.x IPs
It's great, but kubeadm init/join wasn't run during cluster upgrade and cgroup/cni args were lost on worker nodes, that's why pods had 172.0.0.x IPs
that makes the issue valid.
On it.
@mkretzer In my case worker nodes kubelet loses its network parameters during upgrades, my personal fix is
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env
That helped, thank you very much! For all our clusters: it's upgrade time! :-)
@neolit123
that makes the issue valid.
I've added my notes about this issue (upgrading the cluster) over here: https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287
@adoerler it seems like the unit file issue you outlined here is a separate one: https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287
but you are right, we do recommend to use package managers in recent versions and by using a package manager a unit file will be updated as well. i guess that was a problem in the ->1.12 upgrade doc.
as far as this issue goes we are pushing a fix for a certain bug in our library for DIFF: https://github.com/kubernetes/kubernetes/pull/73941
but this will only land in 1.14 and cannot be backported to older releases.
i'm going to have to close this issue, but if anyone finds a problem related to DIFF in 1.13 -> 1.14 upgrades please feel free to open a new ticket.
What keywords did you search in kubeadm issues before filing this one?
diff, upgrade
Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT
Versions
kubeadm version (use kubeadm version):
Environment:
- Kubernetes version (use kubectl version):
- Kernel (e.g. uname -a): Linux s03 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
What happened?
I'm planning the upgrade 1.11.0 -> 1.11.1. I upgraded the deb packages for all nodes in the cluster and then ran kubeadm upgrade diff to see the differences. I've noticed some configuration options changing in a way that will break the cluster, and some I don't know about:
What you expected to happen?
Upgrade to be performed with minimal/no configuration changes.
How to reproduce it (as minimally and precisely as possible)?
Make a 1.11 cluster with oidc values and a custom advertise IP, and then try to upgrade.
Anything else we need to know?
You are awesome ! :)