opencb / opencga

An Open Computational Genomics Analysis platform for big data genomics analysis. OpenCGA is maintained and developed by its parent company, Zetta Genomics. Please contact support@zettagenomics.com for bug reports and feature requests.
Apache License 2.0

Cloud-init scripts using apt fail while Linux VM extensions are running. #1059

Closed marrobi closed 5 years ago

marrobi commented 5 years ago

Cloud-init script fails due to:

E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

This happens because the Linux diagnostics extension holds the lock.

Full logs:

Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'modules:config' at Mon, 21 Jan 2019 15:14:23 +0000. Up 43.50 seconds.
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Hit:2 http://azure.archive.ubuntu.com/ubuntu bionic InRelease
Get:3 http://azure.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://azure.archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:5 http://security.ubuntu.com/ubuntu bionic-security/universe Sources [29.4 kB]
Get:6 http://security.ubuntu.com/ubuntu bionic-security/multiverse Sources [1336 B]
Get:7 http://security.ubuntu.com/ubuntu bionic-security/main Sources [70.2 kB]
Get:8 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [242 kB]
Get:9 http://security.ubuntu.com/ubuntu bionic-security/main Translation-en [91.6 kB]
Get:10 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [113 kB]
Get:11 http://security.ubuntu.com/ubuntu bionic-security/universe Translation-en [64.2 kB]
Get:12 http://azure.archive.ubuntu.com/ubuntu bionic/restricted Sources [5324 B]
Get:13 http://azure.archive.ubuntu.com/ubuntu bionic/multiverse Sources [181 kB]
Get:14 http://azure.archive.ubuntu.com/ubuntu bionic/universe Sources [9051 kB]
Get:15 http://azure.archive.ubuntu.com/ubuntu bionic/main Sources [829 kB]
Get:16 http://azure.archive.ubuntu.com/ubuntu bionic-updates/restricted Sources [2064 B]
Get:17 http://azure.archive.ubuntu.com/ubuntu bionic-updates/multiverse Sources [3820 B]
Get:18 http://azure.archive.ubuntu.com/ubuntu bionic-updates/main Sources [231 kB]
Get:19 http://azure.archive.ubuntu.com/ubuntu bionic-updates/universe Sources [123 kB]
Get:20 http://azure.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [489 kB]
Get:21 http://azure.archive.ubuntu.com/ubuntu bionic-updates/main Translation-en [182 kB]
Get:22 http://azure.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [713 kB]
Get:23 http://azure.archive.ubuntu.com/ubuntu bionic-updates/universe Translation-en [176 kB]
Get:24 http://azure.archive.ubuntu.com/ubuntu bionic-backports/universe Sources [2068 B]
Get:25 http://azure.archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [3472 B]
Fetched 12.9 MB in 4s (3226 kB/s)
Reading package lists...
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'modules:final' at Mon, 21 Jan 2019 15:14:26 +0000. Up 46.73 seconds.
2019-01-21 15:14:41,921 - util.py[WARNING]: Package upgrade failed
2019-01-21 15:14:47,455 - cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
2019-01-21 15:14:47,456 - util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_package_update_upgrade_install.py'>) failed
Hit:1 http://azure.archive.ubuntu.com/ubuntu bionic InRelease
Get:2 http://azure.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease
Fetched 163 kB in 0s (363 kB/s)
Reading package lists...
# Executing docker install script, commit: 4957679
+ sh -c apt-get update -qq >/dev/null
+ sh -c apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
Warning: apt-key output should not be parsed (stdout is not a terminal)
OK

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
cifs-utils is already the newest version (2:6.8-1).
0 upgraded, 0 newly installed, 0 to remove and 20 not upgraded.
Mounting type: azurefiles
Mounting data: azurefileszy5svtumoaclm,opencgashare,HwfQT00ErQZp8173BVpuwRuFhuIOgHCiXDf8+dQUokUfkGUZ3tnHcHeP2yEjzdK9+Q1Lt+ZYnmU4KZOewJyjuA==
Attempting, with retries, to: install cifs-utils
Attempt #1
Attempt to install cifs-utils
Failed install cifs-utils error: Command '['apt', 'install', 'cifs-utils', '-y']' returned non-zero exit status 100.
Failed:Command '['apt', 'install', 'cifs-utils', '-y']' returned non-zero exit status 100.
retrying in 3secs
Attempt #2
Attempt to install cifs-utils
Install completed successfully
Succeeded to: install cifs-utils after 2 retries
Mounting primary
Done editing fstab ... attempting mount
Attempting, with retries, to: mount shares
Attempt #1
Succeeded to: mount shares after 1 retries
/var/lib/cloud/instance/scripts/runcmd: 6: /var/lib/cloud/instance/scripts/runcmd: docker: not found
/var/lib/cloud/instance/scripts/runcmd: 7: /var/lib/cloud/instance/scripts/runcmd: docker: not found
2019-01-21 15:14:58,359 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [127]
2019-01-21 15:14:58,360 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2019-01-21 15:14:58,360 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
Cloud-init v. 18.4-0ubuntu1~18.04.1 finished at Mon, 21 Jan 2019 15:14:59 +0000. Datasource DataSourceAzure [seed=/dev/sr0].  Up 80.17 seconds
marrobi commented 5 years ago

https://msftstack.wordpress.com/2016/05/12/extension-sequencing-in-azure-vm-scale-sets/

marrobi commented 5 years ago

@lawrencegripper suggests wrapping the apt installs in Python, using functions similar to those in mount.py, so that they are retried on failure. This will be an issue for all VMs that run apt commands in cloud-init.
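A minimal sketch of what such a retry wrapper could look like. This is not the code from mount.py; the names `retry` and `apt_install`, the attempt count and the 3-second delay are assumptions chosen to mirror the "Attempt #1 ... retrying in 3secs" lines in the log above.

```python
import subprocess
import time


def retry(action, description, attempts=5, delay_secs=3, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` tries have been used.

    `retry` is a hypothetical helper, not the function used in mount.py.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        print(f"Attempt #{attempt} to: {description}")
        try:
            result = action()
            print(f"Succeeded to: {description} after {attempt} attempt(s)")
            return result
        except (subprocess.CalledProcessError, OSError) as error:
            # apt exits with status 100 when it cannot take the dpkg lock.
            last_error = error
            print(f"Failed: {error}")
            if attempt < attempts:
                print(f"Retrying in {delay_secs} secs")
                sleep(delay_secs)
    raise RuntimeError(f"Gave up on: {description}") from last_error


def apt_install(package):
    """Install one package; raises CalledProcessError on a non-zero exit."""
    subprocess.run(["apt-get", "install", "-y", package], check=True)
```

It could be called as `retry(lambda: apt_install("cifs-utils"), "install cifs-utils")`. The sketch uses `apt-get` rather than `apt`, since the log itself warns that `apt` does not have a stable CLI interface for scripts.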

marrobi commented 5 years ago

Added the following line to cloud-init: "- while ( fuser /var/lib/dpkg/lock >/dev/null 2>&1 ); do sleep 5; done;"

This has enabled deployment, but it probably isn't a great long-term solution.
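One weakness of the shell loop is that it can wait forever. A rough Python sketch of the same idea with a timeout might look like the following. The function names, the 300-second timeout and the decision to also check `/var/lib/dpkg/lock-frontend` (the file named in the error messages above) are assumptions, and `fuser` must be installed on the VM.

```python
import subprocess
import time

# Assumption: check both the classic lock and the frontend lock from the logs.
LOCK_FILES = ("/var/lib/dpkg/lock", "/var/lib/dpkg/lock-frontend")


def lock_is_held(path):
    """Return True if `fuser` reports a process that has `path` open."""
    result = subprocess.run(
        ["fuser", path], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def wait_for_locks(paths=LOCK_FILES, timeout_secs=300, poll_secs=5,
                   is_held=lock_is_held, sleep=time.sleep,
                   clock=time.monotonic):
    """Poll until none of `paths` is held; return False if the timeout hits."""
    deadline = clock() + timeout_secs
    while any(is_held(path) for path in paths):
        if clock() >= deadline:
            return False
        sleep(poll_secs)
    return True
```

Another process can still take the lock between this check and the next apt command, so waiting only narrows the race; it does not remove it. Retrying the apt command itself remains the safer fallback.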

marrobi commented 5 years ago

Docker must be running before you install the Log Analytics agent for Linux on your container hosts. If you've already installed the agent before installing Docker, you need to reinstall the Log Analytics agent for Linux. For more information about Docker, see the Docker website.

lawrencegripper commented 5 years ago

So are we thinking about adding the extension as a final step of the deployment via an ACI container?

marrobi commented 5 years ago

I guess. We could set all the monitoring up later, but that sort of defeats the point: you want the logs when the cloud-init commands run! Even with custom script extensions, you can't express dependencies within a VMSS... Messy. Cough... Kubernetes... Cough...

lawrencegripper commented 5 years ago

That would still leave Solr, Zookeeper, the Daemon VM and Mongo to monitor though. As Solr runs in Docker, wouldn't it have the same problem?

marrobi commented 5 years ago

Daemon would be easy to put on a cluster, Mongo - Atlas, or https://github.com/helm/charts/tree/master/stable/mongodb-replicaset, Zookeeper - https://github.com/helm/charts/tree/master/incubator/zookeeper, and I can see people have done Solr on K8s. The current approach just feels very painful, with lots of custom scripts and hacks that are going to be hard to maintain or understand for anyone who hasn't done the work with us. I didn't think (and don't really feel) Kubernetes is right, but it solves many of the problems - I'm sure it would have its own - and it also presents potential marketplace challenges.

marrobi commented 5 years ago

Feels dirty, but to get moving: wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w <YOUR OMS WORKSPACE ID> -s <YOUR OMS WORKSPACE PRIMARY KEY>

lawrencegripper commented 5 years ago

I like it. We then have a deterministic approach with cloud-init in terms of timing. We already have a lot of other stuff pulling down and executing scripts.

One change would be to pick a commit SHA or tag for the URL, to isolate us from changes on the master branch.
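As a rough illustration of pinning, the sketch below builds the raw download URL from a fixed ref instead of `master`, and adds an optional checksum comparison. The helper names are hypothetical, the checksum step is an extra suggestion rather than something proposed in this thread, and any tag or SHA passed in would have to be one that actually exists in the OMS-Agent-for-Linux repository.

```python
import hashlib


def pinned_raw_url(ref, owner="Microsoft", repo="OMS-Agent-for-Linux",
                   path="installer/scripts/onboard_agent.sh"):
    """Build a raw.githubusercontent.com URL for `path` at a fixed ref."""
    if ref in ("master", "main"):
        raise ValueError("pin to a commit SHA or tag, not a moving branch")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


def matches_sha256(content, expected_hex):
    """Return True if the downloaded bytes hash to the expected digest."""
    return hashlib.sha256(content).hexdigest() == expected_hex.lower()
```

A tag can be moved by the repository owner, whereas a full commit SHA cannot, so the SHA gives the stronger guarantee of the two.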

marrobi commented 5 years ago

Closing this, as we will work around it by not using the VM extension.