vmware-archive / kops

Kubernetes Operations (kops) - Production Grade K8s Installation, Upgrades, and Management
Apache License 2.0

Nodeup package installation fails often for Ubuntu 16.04 image. #24

Closed: prashima closed this issue 7 years ago

prashima commented 7 years ago

Selected output from 'journalctl -u kops-configuration.service':

Mar 29 23:16:42 master-us-west-2a nodeup[1785]: W0329 23:16:42.426974 1785 executor.go:109] error running task "package/socat" (8759h59m58s remaining to succeed): error installing package "socat": exit status 100: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
Mar 29 23:16:42 master-us-west-2a nodeup[1785]: E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
Mar 29 23:16:42 master-us-west-2a nodeup[1785]: W0329 23:16:42.427391 1785 executor.go:109] error running task "package/nfs-common" (8759h59m58s remaining to succeed): error installing package "nfs-common": exit status 100: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
Mar 29 23:16:42 master-us-west-2a nodeup[1785]: E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
Mar 29 23:16:42 master-us-west-2a nodeup[1785]: W0329 23:16:42.427734 1785 executor.go:109] error running task "Package/bridge-utils" (8759h59m58s remaining to succeed): error installing package "bridge-utils": exit status 100: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
Mar 29 23:16:42 master-us-west-2a nodeup[1785]: E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
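All three failures are the same apt error: exit status 100 because another process holds /var/lib/dpkg/lock. As a minimal sketch for spotting this pattern in a longer journal, the shell function below extracts the affected package names. It assumes the nodeup message format in the log above, and the function name dpkg_lock_failures is hypothetical (it is not part of kops).

```shell
# Hypothetical helper: read journal text on stdin and print, once each,
# the packages whose nodeup install task failed because another process
# held the dpkg lock (apt exit status 100, "Could not get lock").
dpkg_lock_failures() {
  grep -o 'error installing package "[^"]*": exit status 100: E: Could not get lock' |
    sed 's/^error installing package "\([^"]*\)".*$/\1/' |
    sort -u
}
```

For example, 'journalctl -u kops-configuration.service | dpkg_lock_failures' would list bridge-utils, nfs-common and socat for the entries shown above.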

prashima commented 7 years ago

I did some googling on this issue and found similar problems reported elsewhere:
http://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image
https://github.com/boxcutter/ubuntu/issues/73

I am working on a cloud-init based solution to kill apt-daily.timer and apt-daily.service. The problem is that if cloud-init executes before apt-daily.service has started, the script that kills the service has already missed its chance.

If this does not work out, I will try triggering the kill from the nodeup code instead.
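The kill step described above could look roughly like the following sketch, assuming systemd and the stock Ubuntu 16.04 unit names apt-daily.timer and apt-daily.service; the function name stop_apt_daily is hypothetical. Stopping and disabling the timer, not only the service, is what targets the ordering problem: if the service has not started yet, a stopped timer can no longer trigger it during this boot. This narrows the window but may not close it entirely during early boot, and stopping the service while apt is mid-run can interrupt dpkg.

```shell
# Hypothetical helper (name is illustrative): stop Ubuntu's periodic apt
# job so it cannot hold /var/lib/dpkg/lock while nodeup installs packages.
# Assumes systemd and the stock apt-daily.timer / apt-daily.service units.
# Must run as root, e.g. early in a cloud-init script.
stop_apt_daily() {
  # Stop the timer so it cannot fire apt-daily.service later in this boot,
  # and disable it so it is not started again on the next boot.
  systemctl stop apt-daily.timer
  systemctl disable apt-daily.timer
  # Stop the service in case the timer already started it. Note that this
  # can interrupt an apt/dpkg run that is already in progress.
  systemctl stop apt-daily.service
}
```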

luomiao commented 7 years ago

Is this issue specific to us, or is it also a problem for other kops providers?

prashima commented 7 years ago

This is an Ubuntu issue, but I checked on the kops channel and people haven't seen it with the existing images yet.

I am also looking at the option of disabling apt-daily.service in the image itself.

prashima commented 7 years ago

Based on some more reading and analysis, I am going ahead with the option of updating the image so that apt does not run unattended upgrades.
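A minimal sketch of such an image change, assuming it is done through APT's periodic settings as described in the Ubuntu server guide's automatic-updates page: APT::Periodic::Update-Package-Lists and APT::Periodic::Unattended-Upgrade set to "0" in /etc/apt/apt.conf.d/20auto-upgrades. The function name and its optional root-directory argument (for editing a mounted image) are hypothetical, and the exact change made to the image is not shown in this thread.

```shell
# Hypothetical image-build step (name is illustrative): disable APT's
# periodic package-list updates and unattended upgrades by writing
# etc/apt/apt.conf.d/20auto-upgrades. Optional argument: the root of a
# mounted image; defaults to the running system.
disable_unattended_upgrades() {
  root=${1:-}
  root=${root%/}          # avoid "//etc" when called with "/"
  conf="$root/etc/apt/apt.conf.d/20auto-upgrades"
  mkdir -p "$(dirname "$conf")"
  # "0" turns each periodic action off. Files in apt.conf.d are read in
  # lexical order, so these values override earlier files such as 10periodic.
  cat > "$conf" <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
}
```

On a running machine, 'apt-config dump | grep Periodic' shows the effective values after the change.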

The following options were considered, along with the reasons they were dropped:

I tested the new image by deploying 10 clusters on vSphere. I no longer see any package-installation issues, and 'kubectl get nodes' shows the master and all nodes in the Ready state.

I also ran 'ps aux | grep apt' multiple times on the master while configuration was in progress. No process corresponding to '/usr/lib/apt/apt.systemd.daily' was running, and there were no log updates in /var/log/unattended-upgrades/*log.
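The manual check above could be scripted roughly as follows, assuming a marker file is touched just before nodeup starts; apt_quiet_since and the marker file are hypothetical. It fails if the periodic apt script is running or if any unattended-upgrades log was modified after the marker. Writing the pattern with "[.]" keeps grep from matching its own command line, the same idea as the common "[a]pt" trick.

```shell
# Hypothetical check (name and marker file are illustrative): succeed only
# if no periodic apt script is running and no unattended-upgrades log in
# LOGDIR was modified after MARKER. Arguments: MARKER [LOGDIR].
apt_quiet_since() {
  marker=$1
  logdir=${2:-/var/log/unattended-upgrades}
  # "[.]" matches a literal dot; grep's own command line (which contains
  # "[.]") therefore cannot match this pattern.
  if ps aux 2>/dev/null | grep -q 'apt[.]systemd[.]daily'; then
    echo "periodic apt script is running" >&2
    return 1
  fi
  # find prints any *log file whose mtime is newer than the marker's.
  if [ -d "$logdir" ] && [ -n "$(find "$logdir" -type f -name '*log' -newer "$marker")" ]; then
    echo "unattended-upgrades logs changed after $marker" >&2
    return 1
  fi
  return 0
}
```

For example, run 'touch /tmp/nodeup-start' before configuration begins, then call 'apt_quiet_since /tmp/nodeup-start' a few times while kops-configuration.service is running.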

References:
https://help.ubuntu.com/lts/serverguide/automatic-updates.html
http://askubuntu.com/questions/172524/how-can-i-check-if-automatic-updates-are-enabled
http://ask.xmodulo.com/disable-automatic-updates-ubuntu.html