nickstenning / honcho

Honcho: a python clone of Foreman. For managing Procfile-based applications.
http://pypi.python.org/pypi/honcho
MIT License
1.6k stars 146 forks source link

Intermittent test failures on OS X #101

Closed msabramo closed 9 years ago

msabramo commented 9 years ago

I don't know yet if it has anything to do with OS X or my local system or if it's a bug in honcho, but I'm periodically getting test failures on the master branch:

❯ tox -e py27
GLOB sdist-make: /Users/marca/dev/git-repos/honcho/setup.py
py27 inst-nodeps: /Users/marca/dev/git-repos/honcho/.tox/dist/honcho-0.5.0.zip
py27 runtests: PYTHONHASHSEED='3869475230'
py27 runtests: commands[0] | pip install -q -e .[export]
py27 runtests: commands[1] | nosetests
......FFF.............................................................................
======================================================================
FAIL: test_get_port_from_dot_env (honcho.test.integration.test_ports.TestPorts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py", line 40, in test_get_port_from_dot_env
    self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=8000')
AssertionError: Regexp didn't match: 'web\\.1    \\| (....)?PORT=8000' not found in '14:53:43 system   | es.1 started (pid=87007)\n14:53:43 system   | worker.1 started (pid=87009)\n14:53:43 system   | redis.1 started (pid=87010)\n14:53:43 system   | web.1 started (pid=87011)\n\x1b[35m14:53:43 es.1     | \x1b[0mPORT=8300\n\x1b[33m14:53:43 worker.1 | \x1b[0mPORT=8100\n14:53:43 system   | es.1 stopped (rc=0)\n14:53:43 system   | sending SIGTERM to worker.1 (pid 87009)\n14:53:43 system   | sending SIGTERM to redis.1 (pid 87010)\n14:53:43 system   | sending SIGTERM to web.1 (pid 87011)\n14:53:43 system   | worker.1 stopped (rc=-15)\n14:53:43 system   | web.1 stopped (rc=-15)\n14:53:43 system   | redis.1 stopped (rc=-15)\n'

======================================================================
FAIL: test_get_port_from_env (honcho.test.integration.test_ports.TestPorts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py", line 50, in test_get_port_from_env
    self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=3000')
AssertionError: Regexp didn't match: 'web\\.1    \\| (....)?PORT=3000' not found in '14:53:44 system   | es.1 started (pid=87017)\n14:53:44 system   | redis.1 started (pid=87020)\n14:53:44 system   | worker.1 started (pid=87019)\n\x1b[35m14:53:44 es.1     | \x1b[0mPORT=3300\n14:53:44 system   | web.1 started (pid=87021)\n14:53:44 system   | es.1 stopped (rc=0)\n14:53:44 system   | sending SIGTERM to worker.1 (pid 87019)\n14:53:44 system   | sending SIGTERM to redis.1 (pid 87020)\n14:53:44 system   | sending SIGTERM to web.1 (pid 87021)\n14:53:44 system   | web.1 stopped (rc=-15)\n14:53:44 system   | worker.1 stopped (rc=-15)\n14:53:44 system   | redis.1 stopped (rc=-15)\n'

======================================================================
FAIL: test_proctype_increment (honcho.test.integration.test_ports.TestPorts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py", line 15, in test_proctype_increment
    self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=5000')
AssertionError: Regexp didn't match: 'web\\.1    \\| (....)?PORT=5000' not found in '14:53:44 system   | es.1 started (pid=87027)\n14:53:44 system   | worker.1 started (pid=87028)\n14:53:44 system   | redis.1 started (pid=87029)\n14:53:44 system   | web.1 started (pid=87030)\n\x1b[35m14:53:44 es.1     | \x1b[0mPORT=5300\n\x1b[33m14:53:44 worker.1 | \x1b[0mPORT=5100\n14:53:44 system   | es.1 stopped (rc=0)\n14:53:44 system   | sending SIGTERM to worker.1 (pid 87028)\n14:53:44 system   | sending SIGTERM to redis.1 (pid 87029)\n14:53:44 system   | sending SIGTERM to web.1 (pid 87030)\n14:53:44 system   | worker.1 stopped (rc=-15)\n14:53:44 system   | redis.1 stopped (rc=-15)\n14:53:44 system   | web.1 stopped (rc=-15)\n'

----------------------------------------------------------------------
Ran 86 tests in 6.038s

FAILED (failures=3)
ERROR: InvocationError: '/Users/marca/dev/git-repos/honcho/.tox/py27/bin/nosetests'
___________________________________________________________________________________ summary ____________________________________________________________________________________
ERROR:   py27: commands failed

It's intermittent though. Sometimes all of the tests pass and sometimes 1-3 of them fail.

msabramo commented 9 years ago

Here's what output looks like during one such failure:

❯ tox -e py27 -- -s --pdb honcho.test.integration.test_ports:TestPorts.test_proctype_increment
GLOB sdist-make: /Users/marca/dev/git-repos/honcho/setup.py
py27 inst-nodeps: /Users/marca/dev/git-repos/honcho/.tox/dist/honcho-0.5.0.zip
py27 runtests: PYTHONHASHSEED='1856189262'
py27 runtests: commands[0] | pip install -q -e .[export]
py27 runtests: commands[1] | nosetests -s --pdb honcho.test.integration.test_ports:TestPorts.test_proctype_increment
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py(1002)assertRegexpMatches()
-> raise self.failureException(msg)
(Pdb) u
> /Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py(19)test_proctype_increment()
-> self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=5000')
(Pdb) print out
15:18:13 system   | es.1 started (pid=92392)
15:18:13 system   | redis.1 started (pid=92393)
15:18:13 system   | web.1 started (pid=92394)
15:18:13 system   | worker.1 started (pid=92395)
15:18:13 es.1     | PORT=5300
15:18:13 system   | es.1 stopped (rc=0)
15:18:13 system   | sending SIGTERM to redis.1 (pid 92393)
15:18:13 system   | sending SIGTERM to web.1 (pid 92394)
15:18:13 system   | sending SIGTERM to worker.1 (pid 92395)
15:18:13 system   | web.1 stopped (rc=-15)
15:18:13 system   | redis.1 stopped (rc=-15)
15:18:13 system   | worker.1 stopped (rc=-15)
msabramo commented 9 years ago

I suspect a race condition where it takes some time for processes to start up and have the output match the expected state. Sometimes it happens before the assertion and sometimes it doesn't.

msabramo commented 9 years ago

Success looks like this:

❯ tox -e py27 -- -s honcho.test.integration.test_ports:TestPorts.test_get_port_from_env
GLOB sdist-make: /Users/marca/dev/git-repos/honcho/setup.py
py27 inst-nodeps: /Users/marca/dev/git-repos/honcho/.tox/dist/honcho-0.5.0.zip
py27 runtests: PYTHONHASHSEED='2418797409'
py27 runtests: commands[0] | pip install -q -e .[export]
py27 runtests: commands[1] | nosetests -s honcho.test.integration.test_ports:TestPorts.test_get_port_from_env
> /Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py(51)test_get_port_from_env()
-> self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=3000')
(Pdb) print out
15:32:47 system   | es.1 started (pid=95009)
15:32:47 system   | web.1 started (pid=95010)
15:32:47 system   | redis.1 started (pid=95011)
15:32:47 system   | worker.1 started (pid=95012)
15:32:47 es.1     | PORT=3300
15:32:47 web.1    | PORT=3000
15:32:47 system   | es.1 stopped (rc=0)
15:32:47 system   | sending SIGTERM to web.1 (pid 95010)
15:32:47 system   | sending SIGTERM to redis.1 (pid 95011)
15:32:47 system   | sending SIGTERM to worker.1 (pid 95012)
15:32:47 system   | web.1 stopped (rc=-15)
15:32:47 system   | redis.1 stopped (rc=-15)
15:32:47 system   | worker.1 stopped (rc=-15)

(Pdb) c
.
----------------------------------------------------------------------
Ran 1 test in 8.299s

OK
___________________________________________________________________________________ summary ____________________________________________________________________________________
  py27: commands succeeded
  congratulations :)

Failure looks like this:

❯ tox -e py27 -- -s honcho.test.integration.test_ports:TestPorts.test_get_port_from_env
GLOB sdist-make: /Users/marca/dev/git-repos/honcho/setup.py
py27 inst-nodeps: /Users/marca/dev/git-repos/honcho/.tox/dist/honcho-0.5.0.zip
py27 runtests: PYTHONHASHSEED='2812475943'
py27 runtests: commands[0] | pip install -q -e .[export]
py27 runtests: commands[1] | nosetests -s honcho.test.integration.test_ports:TestPorts.test_get_port_from_env
> /Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py(51)test_get_port_from_env()
-> self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=3000')
(Pdb) print out
15:35:39 system   | es.1 started (pid=96232)
15:35:39 system   | web.1 started (pid=96233)
15:35:39 system   | redis.1 started (pid=96234)
15:35:39 system   | worker.1 started (pid=96235)
15:35:39 es.1     | PORT=3300
15:35:39 system   | es.1 stopped (rc=0)
15:35:39 system   | sending SIGTERM to web.1 (pid 96233)
15:35:39 system   | sending SIGTERM to redis.1 (pid 96234)
15:35:39 system   | sending SIGTERM to worker.1 (pid 96235)
15:35:39 system   | redis.1 stopped (rc=-15)
15:35:39 system   | web.1 stopped (rc=-15)
15:35:39 system   | worker.1 stopped (rc=-15)

(Pdb) c
F
======================================================================
FAIL: test_get_port_from_env (honcho.test.integration.test_ports.TestPorts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/marca/dev/git-repos/honcho/honcho/test/integration/test_ports.py", line 51, in test_get_port_from_env
    self.assertRegexpMatches(out, r'web\.1    \| (....)?PORT=3000')
AssertionError: Regexp didn't match: 'web\\.1    \\| (....)?PORT=3000' not found in '15:35:39 system   | es.1 started (pid=96232)\n15:35:39 system   | web.1 started (pid=96233)\n15:35:39 system   | redis.1 started (pid=96234)\n15:35:39 system   | worker.1 started (pid=96235)\n\x1b[35m15:35:39 es.1     | \x1b[0mPORT=3300\n15:35:39 system   | es.1 stopped (rc=0)\n15:35:39 system   | sending SIGTERM to web.1 (pid 96233)\n15:35:39 system   | sending SIGTERM to redis.1 (pid 96234)\n15:35:39 system   | sending SIGTERM to worker.1 (pid 96235)\n15:35:39 system   | redis.1 stopped (rc=-15)\n15:35:39 system   | web.1 stopped (rc=-15)\n15:35:39 system   | worker.1 stopped (rc=-15)\n'

----------------------------------------------------------------------
Ran 1 test in 3.745s

FAILED (failures=1)
ERROR: InvocationError: '/Users/marca/dev/git-repos/honcho/.tox/py27/bin/nosetests -s honcho.test.integration.test_ports:TestPorts.test_get_port_from_env'
___________________________________________________________________________________ summary ____________________________________________________________________________________
ERROR:   py27: commands failed
msabramo commented 9 years ago

Hmmm, there's some non-determinism going on here.

Most of the time I see (SIGTERM message appears after PORT=5000 message):

❯ honcho start -f Procfile.ports | grep 'web\.1.*'
15:43:51 system   | web.1 started (pid=390)
15:43:51 web.1    | PORT=5000
15:43:51 system   | sending SIGTERM to web.1 (pid 390)
15:43:51 system   | web.1 stopped (rc=-15)

but occasionally I get (SIGTERM message appears before PORT=5000 message):

❯ honcho start -f Procfile.ports | grep 'web\.1.*'
15:49:36 system   | web.1 started (pid=3787)
15:49:36 system   | sending SIGTERM to web.1 (pid 3787)
15:49:36 web.1    | PORT=5000
15:49:36 system   | web.1 stopped (rc=-15)

or (PORT=5000 never appears in output`):

❯ honcho start -f Procfile.ports | grep 'web\.1.*'
15:44:09 system   | web.1 started (pid=849)
15:44:09 system   | sending SIGTERM to web.1 (pid 849)
15:44:09 system   | web.1 stopped (rc=-15)
msabramo commented 9 years ago

I thought that bumping up the sleep interval in Procfile.ports might help:

web: echo PORT=$PORT && sleep 10
worker: echo PORT=$PORT && sleep 10
redis: echo PORT=$PORT && sleep 10
es: echo PORT=$PORT

but it doesn't:

❯ honcho start -f Procfile.ports | grep 'web\.1.*'
15:45:55 system   | web.1 started (pid=2217)
15:45:55 system   | sending SIGTERM to web.1 (pid 2217)
15:45:55 system   | web.1 stopped (rc=-15)
msabramo commented 9 years ago

I have a theory -- that it's happening because the last process in Procfile.ports (the es process) has no sleep.

According to https://github.com/nickstenning/honcho/blob/master/honcho/manager.py#L83:

If one process terminates, all the others will be terminated by Honcho, and #loop() will return.

Thus, if es has no sleep and exits right away, everything else will be terminated too, pretty quickly, and possibly sometimes terminated before the echo gets written out.

I'll test with:

# Procfile.ports
web: echo PORT=$PORT && sleep 1
worker: echo PORT=$PORT && sleep 1
redis: echo PORT=$PORT && sleep 1
es: echo PORT=$PORT && sleep 1

Note the sleep 1 that I added to the es job. Hopefully, this makes things happier...

msabramo commented 9 years ago

With https://github.com/nickstenning/honcho/pull/102, I haven't seen a test failure yet. Crossing fingers... :smile: