Closed hloeung closed 2 years ago
How do you know it's hung? How long are you waiting? Amulet tries to wait until the deployment has settled (all agents idle) before sentry.wait() returns.
So it seems to complete after about 2 hours. I've added some debugging output to the test as follows:
print("{} running cls.d.setup()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
try:
cls.d.setup(timeout=900)
print("{} done running cls.d.setup(), running cls.d.sentry.wait()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
cls.d.sentry.wait()
print("{} done cls.d.sentry.wait()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
except amulet.helpers.TimeoutError:
# Setup didn't complete before timeout
pass
print("{} cls.elasticsearch = cls.d.sentry...".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
From that, the bulk of the time seems to be within amulet.Deployment.setup():
10 Aug 2017 10:14:18 running cls.d.setup() 10 Aug 2017 12:25:14 done running cls.d.setup(), running cls.d.sentry.wait() 10 Aug 2017 12:25:16 done cls.d.sentry.wait() 10 Aug 2017 12:25:16 cls.elasticsearch = cls.d.sentry...
Full output in pastebin - http://paste.ubuntu.com/25280876/ Juju status output in a loop in pastebin - http://paste.ubuntu.com/25280878/
I'm using snaps for juju:
[hloeung@dharkan ~]$ snap list Name Version Rev Developer Notes charm 2.2 15 charms classic core 16-2.26.14 2462 canonical - juju 2.2.2 2142 canonical classic
This is with the latest amulet from GitHub (so commit f1a5bad262668cdb5c6793584198b7c4bf568120).
I added a wrapper to juju in a path before the snap path and it seems to be hanging on this:
2017-08-10 15:11:02 ssh -m canonistack-haw-lcy02:admin/default elasticsearch/35 -v mkdir -p -m a=rwx /tmp/amulet
From that, I found it's hanging on subprocess.communicate and was able to reproduce it:
[hloeung@dharkan graylog]$ python3
import subprocess cmd = ['juju', 'ssh', '-m', 'canonistack-haw-lcy02:admin/default', 'mongodb/33', '-v', 'mkdir', '-p', '-m', 'a=rwx', '/tmp/haw-test'] p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE,) stdout, stderr = p.communicate()
I think we could add a timeout to subprocess.communicate() calls, but that breaks with earlier versions of python (timeout was introduced in 3.3)
This isn't a fix for why the juju commands are hanging, so will still need to look into that.
Hi,
Trying to run the amulet tests against changes I made to the graylog charm[1], amulet hangs inside deployer.setup() and I'm not sure why.
| [hloeung@dharkan graylog]$ python3 tests/10-basic.py | 2017-08-09 17:50:10 Starting deployment of canonistack-haw-lcy02:admin/default | 2017-08-09 17:50:15 Deploying applications... | 2017-08-09 17:50:15 Deploying application apache2 using cs:xenial/apache2-21 | 2017-08-09 17:50:22 Deploying application elasticsearch using cs:~elasticsearch-charmers/xenial/elasticsearch-0 | 2017-08-09 17:50:29 Deploying application graylog using /tmp/charmexvatjhf/xenial/graylog | 2017-08-09 17:50:51 Deploying application mongodb using cs:xenial/mongodb-43 | 2017-08-09 17:56:40 Adding relations... | 2017-08-09 17:56:40 Adding relation graylog:mongodb <-> mongodb:database | 2017-08-09 17:56:40 Adding relation graylog:elasticsearch <-> elasticsearch:client | 2017-08-09 17:56:41 Adding relation graylog:website <-> apache2:reverseproxy | 2017-08-09 17:57:46 Deployment complete in 455.78 seconds (hangs here)
Process listing shows a hung juju process:
| 21901 pts/6 S+ 0:00 | _ python3 tests/10-basic.py | 23540 pts/6 Z+ 0:00 | _ [juju]
I've had someone else in my team also able to reproduce this so it's not my local set up.
[1]https://code.launchpad.net/~hloeung/graylog-charm/+git/graylog-charm