amulet run hangs - Githubissues

hloeung commented 7 years ago

Hi,

Trying to run the amulet tests against changes I made to the graylog charm[1], amulet hangs inside deployer.setup() and I'm not sure why.

| [hloeung@dharkan graylog]$ python3 tests/10-basic.py | 2017-08-09 17:50:10 Starting deployment of canonistack-haw-lcy02:admin/default | 2017-08-09 17:50:15 Deploying applications... | 2017-08-09 17:50:15 Deploying application apache2 using cs:xenial/apache2-21 | 2017-08-09 17:50:22 Deploying application elasticsearch using cs:~elasticsearch-charmers/xenial/elasticsearch-0 | 2017-08-09 17:50:29 Deploying application graylog using /tmp/charmexvatjhf/xenial/graylog | 2017-08-09 17:50:51 Deploying application mongodb using cs:xenial/mongodb-43 | 2017-08-09 17:56:40 Adding relations... | 2017-08-09 17:56:40 Adding relation graylog:mongodb <-> mongodb:database | 2017-08-09 17:56:40 Adding relation graylog:elasticsearch <-> elasticsearch:client | 2017-08-09 17:56:41 Adding relation graylog:website <-> apache2:reverseproxy | 2017-08-09 17:57:46 Deployment complete in 455.78 seconds (hangs here)

Process listing shows a hung juju process:

| 21901 pts/6 S+ 0:00 | _ python3 tests/10-basic.py | 23540 pts/6 Z+ 0:00 | _ [juju]

I've had someone else in my team also able to reproduce this so it's not my local set up.

[1]https://code.launchpad.net/~hloeung/graylog-charm/+git/graylog-charm

tvansteenburgh commented 7 years ago

How do you know it's hung? How long are you waiting? Amulet tries to wait until the deployment has settled (all agents idle) before sentry.wait() returns.

hloeung commented 7 years ago

So it seems to complete after about 2 hours. I've added some debugging output to the test as follows:

print("{} running cls.d.setup()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
try:
    cls.d.setup(timeout=900)
    print("{} done running cls.d.setup(), running cls.d.sentry.wait()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
    cls.d.sentry.wait()
    print("{} done cls.d.sentry.wait()".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))
except amulet.helpers.TimeoutError:
    # Setup didn't complete before timeout
    pass
print("{} cls.elasticsearch = cls.d.sentry...".format(datetime.datetime.now().strftime("%d %b %Y %H:%M:%S")))

From that, the bulk of the time seems to be within amulet.Deployment.setup():

10 Aug 2017 10:14:18 running cls.d.setup() 10 Aug 2017 12:25:14 done running cls.d.setup(), running cls.d.sentry.wait() 10 Aug 2017 12:25:16 done cls.d.sentry.wait() 10 Aug 2017 12:25:16 cls.elasticsearch = cls.d.sentry...

Full output in pastebin - http://paste.ubuntu.com/25280876/ Juju status output in a loop in pastebin - http://paste.ubuntu.com/25280878/

I'm using snaps for juju:

[hloeung@dharkan ~]$ snap list Name Version Rev Developer Notes charm 2.2 15 charms classic core 16-2.26.14 2462 canonical - juju 2.2.2 2142 canonical classic

This is with the latest amulet from GitHub (so commit f1a5bad262668cdb5c6793584198b7c4bf568120).

hloeung commented 7 years ago

I added a wrapper to juju in a path before the snap path and it seems to be hanging on this:

2017-08-10 15:11:02 ssh -m canonistack-haw-lcy02:admin/default elasticsearch/35 -v mkdir -p -m a=rwx /tmp/amulet

From that, I found it's hanging on subprocess.communicate and was able to reproduce it:

[hloeung@dharkan graylog]$ python3

import subprocess cmd = ['juju', 'ssh', '-m', 'canonistack-haw-lcy02:admin/default', 'mongodb/33', '-v', 'mkdir', '-p', '-m', 'a=rwx', '/tmp/haw-test'] p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE,) stdout, stderr = p.communicate()

I think we could add a timeout to subprocess.communicate() calls, but that breaks with earlier versions of python (timeout was introduced in 3.3)

hloeung commented 7 years ago

This isn't a fix for why the juju commands are hanging, so will still need to look into that.

juju / amulet

amulet run hangs #184