kontena / pharos-host-upgrades

Kube DaemonSet for host OS upgrades
Apache License 2.0
41 stars, 1 fork

Overlapping upgrades break the kube locks #3

Closed: SpComb closed this issue 6 years ago

SpComb commented 6 years ago

If the upgrade --schedule causes a new cron task to start while the previous upgrade is still running, the follow-up upgrade should fail, and the initial upgrade should be allowed to run to completion.
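A minimal Go sketch of that policy, assuming a simple in-process guard around each scheduled upgrade. The upgradeGuard type and errUpgradeRunning are hypothetical names, not the project's code. The overlapping tick fails immediately instead of waiting or re-entering, and the initial upgrade keeps running to completion.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errUpgradeRunning is returned to an overlapping tick (hypothetical name).
var errUpgradeRunning = errors.New("previous upgrade still running")

// upgradeGuard allows at most one upgrade at a time within this process.
type upgradeGuard struct {
	running int32 // 0 = idle, 1 = upgrade in progress
}

// Run executes fn only if no other upgrade is in progress; an overlapping
// caller fails immediately rather than waiting or re-entering.
func (g *upgradeGuard) Run(fn func() error) error {
	if !atomic.CompareAndSwapInt32(&g.running, 0, 1) {
		return errUpgradeRunning
	}
	defer atomic.StoreInt32(&g.running, 0)
	return fn()
}

func main() {
	var g upgradeGuard
	started := make(chan struct{})
	finish := make(chan struct{})
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		// Initial upgrade: blocks until told to finish, simulating a long run.
		err := g.Run(func() error {
			close(started)
			<-finish
			return nil
		})
		fmt.Println("initial upgrade:", err)
	}()

	<-started
	// Overlapping tick while the initial upgrade is still running.
	fmt.Println("overlapping upgrade:", g.Run(func() error { return nil }))

	close(finish)
	wg.Wait()
}
```

A compare-and-swap is used rather than a blocking mutex so that the follow-up tick fails fast, matching the behaviour described above, instead of queueing a second upgrade behind the first.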

This should already work for the systemd exec mechanism, since starting the transient host-upgrades.service unit fails while the previous one still exists. The kube locking, however, is presumably broken: the overlapping cron task will acquire the lock because the lock annotation already holds this node's name, fail on the systemd exec, and then release the lock, leaving the initial upgrade running with the lock cleared.

2018/05/22 15:30:59 hosts/ubuntu probe success: systemd.HostInfo{KernelName:"Linux", Hostname:"ubuntu-xenial", OperatingSystemPrettyName:"Ubuntu 16.04.4 LTS", KernelVersion:"#153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018", KernelRelease:"4.4.0-127-generic"}
2018/05/22 15:30:59 Probed host: hosts.HostInfo{OperatingSystem:"Ubuntu", OperatingSystemRelease:"16.04.4", Kernel:"Linux", KernelRelease:"4.4.0-127-generic"}
2018/05/22 15:30:59 Using --kube-namespace=kube-system --kube-daemonset=host-upgrades --kube-node=ubuntu-xenial
2018/05/22 15:30:59 kube/lock kube-system/daemonsets/host-upgrades: get
2018/05/22 15:30:59 kube/lock kube-system/daemonsets/host-upgrades: test pharos-host-upgrades.kontena.io/lock=: free
2018/05/22 15:30:59 Using kube lock kube-system/daemonsets/host-upgrades (acquired=false, value=)
2018/05/22 15:30:59 Using --schedule="@every 1s", first upgrade at: 2018-05-22 15:31:00 +0000 UTC m=+0.800165803 (in 758.923058ms)
2018/05/22 15:31:00 Acquiring kube lock...
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: wait
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: get
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: test pharos-host-upgrades.kontena.io/lock=: free
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: acquire
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: set pharos-host-upgrades.kontena.io/lock=ubuntu-xenial
2018/05/22 15:31:00 kube/lock kube-system/daemonsets/host-upgrades: update
2018/05/22 15:31:00 Running host upgrades...
2018/05/22 15:31:00 hosts/ubuntu upgrade: [/usr/bin/unattended-upgrade -v]
2018/05/22 15:31:00 systemd/exec host-upgrades.service: cmd=[/usr/bin/unattended-upgrade -v]
2018/05/22 15:31:00 systemd/exec host-upgrades.service: reset
2018/05/22 15:31:00 systemd/exec host-upgrades.service: start []dbus.Property{dbus.Property{Name:"ExecStart", Value:dbus.Variant{sig:dbus.Signature{str:"a(sasb)"}, value:[]dbus.execStart{dbus.execStart{Path:"/usr/bin/unattended-upgrade", Args:[]string{"/usr/bin/unattended-upgrade", "-v"}, UncleanIsFailure:false}}}}, dbus.Property{Name:"Type", Value:dbus.Variant{sig:dbus.Signature{str:"s"}, value:"oneshot"}}}
2018/05/22 15:31:00 systemd/exec host-upgrades.service: wait
2018/05/22 15:31:01 Acquiring kube lock...
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: wait
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: get
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: test pharos-host-upgrades.kontena.io/lock=ubuntu-xenial: acquired
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: acquire
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: set pharos-host-upgrades.kontena.io/lock=ubuntu-xenial
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: update
2018/05/22 15:31:01 Running host upgrades...
2018/05/22 15:31:01 hosts/ubuntu upgrade: [/usr/bin/unattended-upgrade -v]
2018/05/22 15:31:01 systemd/exec host-upgrades.service: cmd=[/usr/bin/unattended-upgrade -v]
2018/05/22 15:31:01 systemd/exec host-upgrades.service: reset
2018/05/22 15:31:01 systemd/exec host-upgrades.service: start []dbus.Property{dbus.Property{Name:"ExecStart", Value:dbus.Variant{sig:dbus.Signature{str:"a(sasb)"}, value:[]dbus.execStart{dbus.execStart{Path:"/usr/bin/unattended-upgrade", Args:[]string{"/usr/bin/unattended-upgrade", "-v"}, UncleanIsFailure:false}}}}, dbus.Property{Name:"Type", Value:dbus.Variant{sig:dbus.Signature{str:"s"}, value:"oneshot"}}}
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: get
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: release
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: clear pharos-host-upgrades.kontena.io/lock=ubuntu-xenial
2018/05/22 15:31:01 kube/lock kube-system/daemonsets/host-upgrades: update
2018/05/22 15:31:01 exec [/usr/bin/unattended-upgrade -v]: dbus.StartTransientUnit host-upgrades.service: Unit host-upgrades.service already exists.

The log is missing the initial cron task's failing lock release, because the pod crashes on the log.Fatalf call for the overlapping cron task.