juju / amulet

Testing harness and tools for Juju Charms

add_unit times out with error #105

Open mbruzek opened 9 years ago

mbruzek commented 9 years ago

The etcd charm fails in automated testing on an add_unit() amulet call on Azure.

A timeout is being reached, and there appears to be no way to pass a timeout to add_unit().
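Since add_unit() doesn't seem to expose a timeout, one workaround is to poll an explicit condition yourself after calling add_unit(). This is a generic sketch in plain Python, not an amulet API; the predicate would be something the test supplies, such as a check against parsed `juju status` output:

```python
import time

def wait_until(predicate, timeout=300, interval=5):
    """Poll predicate() until it returns True or `timeout` seconds
    elapse. Returns True on success, False on timeout, so the caller
    can decide whether to fail or skip the test."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

A test could then call add_unit() and follow it with `wait_until(new_unit_started, timeout=1800)`, where `new_unit_started` is a hypothetical predicate the test defines, instead of relying on amulet's internal fixed timeout.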

Here is the log file with the error:

2015-10-02 09:30:30 Starting deployment of charm-testing-azure
2015-10-02 09:30:31 Deploying services...
2015-10-02 09:30:33  Deploying service etcd using /tmp/charmwx4Gpn/trusty/etcd
2015-10-02 09:37:13 Adding relations...
2015-10-02 09:37:14 Deployment complete in 404.21 seconds
.Timeout occurred, printing juju status...environment: charm-testing-azure
machines:
  "0":
    agent-state: started
    agent-version: 1.24.6
    dns-name: juju-charm-testing-azure-qjw8y4c6mz.cloudapp.net
    instance-id: juju-charm-testing-azure-qjw8y4c6mz-jujuivwksn2z80zortgzx3mvskweoiym3e2634lad5h94msyby
    instance-state: ReadyRole
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=7168M root-disk=130048M
    state-server-member-status: has-vote
  "5":
    agent-state: started
    agent-version: 1.24.6
    dns-name: juju-charm-testing-azure-i4k45usv5k.cloudapp.net
    instance-id: juju-charm-testing-azure-i4k45usv5k-juju5vb1pch8yv2gkocvfycpqeh1nsobdp4hyqs1b832lpzkbx
    instance-state: ReadyRole
    series: trusty
    hardware: arch=amd64 cpu-cores=1 mem=3584M root-disk=130048M
  "6":
    agent-state: down
    agent-state-info: (started)
    agent-version: 1.24.6
    dns-name: juju-charm-testing-azure-em27xw1ha1.cloudapp.net
    instance-id: juju-charm-testing-azure-em27xw1ha1-jujuyk5kdzkfeauo2e0u0tgrpmu480jg0eza3qi3it2ttytw12
    instance-state: ReadyRole
    series: trusty
    hardware: arch=amd64 cpu-cores=1 mem=3584M root-disk=130048M
services:
  etcd:
    charm: local:trusty/etcd-2
    exposed: false
    service-status:
      current: maintenance
      message: installing charm software
      since: 02 Oct 2015 09:48:09Z
    relations:
      cluster:
      - etcd
    units:
      etcd/0:
        workload-status:
          current: unknown
          since: 02 Oct 2015 09:39:46Z
        agent-status:
          current: idle
          since: 02 Oct 2015 09:43:01Z
          version: 1.24.6
        agent-state: started
        agent-version: 1.24.6
        machine: "5"
        open-ports:
        - 4001/tcp
        public-address: juju-charm-testing-azure-i4k45usv5k.cloudapp.net
      etcd/1:
        workload-status:
          current: maintenance
          message: installing charm software
          since: 02 Oct 2015 09:48:09Z
        agent-status:
          current: executing
          message: running install hook
          since: 02 Oct 2015 09:48:09Z
          version: 1.24.6
        agent-state: pending
        agent-version: 1.24.6
        machine: "6"
        public-address: juju-charm-testing-azure-em27xw1ha1.cloudapp.net
E
======================================================================
ERROR: test_two_node_scale (__main__.TestDeployment)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmpnaYXLh/etcd/tests/10-deploy", line 26, in test_two_node_scale
    self.deployment.add_unit('etcd')
  File "/usr/lib/python2.7/dist-packages/amulet/deployer.py", line 182, in add_unit
    self.sentry = Talisman(self.services)
  File "/usr/lib/python2.7/dist-packages/amulet/sentry.py", line 202, in __init__
    status = self.wait_for_status(self.juju_env, services, timeout)
  File "/usr/lib/python2.7/dist-packages/amulet/sentry.py", line 340, in wait_for_status
    for i in helpers.timeout_gen(timeout):
  File "/usr/lib/python2.7/dist-packages/amulet/helpers.py", line 108, in timeout_gen
    raise TimeoutError()
TimeoutError

----------------------------------------------------------------------
Ran 2 tests in 904.803s

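For context, the TimeoutError at the bottom of the trace is raised from amulet.helpers.timeout_gen. A minimal sketch of what such a generator likely does, inferred from the traceback rather than from amulet's actual source:

```python
import time

class TimeoutError(Exception):
    """Stand-in for amulet.helpers.TimeoutError."""
    pass

def timeout_gen(timeout):
    """Yield loop indices until `timeout` seconds have elapsed,
    then raise TimeoutError on the next iteration."""
    deadline = time.time() + timeout
    i = 0
    while True:
        if time.time() > deadline:
            raise TimeoutError()
        yield i
        i += 1
```

The key point for this issue is that the deadline is fixed when wait_for_status() starts iterating, so a slow provider like Azure can exhaust it with no way for the add_unit() caller to extend it.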

Here is the code that runs that test.

#!/usr/bin/python

import amulet
import unittest

class TestDeployment(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.deployment = amulet.Deployment(series='trusty')
        cls.deployment.add('etcd')
        try:
            cls.deployment.setup(timeout=900)
            cls.deployment.sentry.wait()
        except amulet.helpers.TimeoutError:
            msg = "Environment wasn't stood up in time"
            amulet.raise_status(amulet.SKIP, msg=msg)
        except:
            raise

    def test_single_service(self):
        status = self.deployment.sentry['etcd/0'].run('service etcd status')
        self.assertTrue("running" in status[0])

    def test_two_node_scale(self):
        self.deployment.add_unit('etcd')
        self.deployment.sentry.wait()

        status1 = self.deployment.sentry['etcd/0'].run('service etcd status')
        status2 = self.deployment.sentry['etcd/1'].run('service etcd status')
        self.assertTrue("running" in status1[0])
        self.assertTrue("running" in status2[0])

if __name__ == '__main__':
    unittest.main()
marcoceppi commented 9 years ago

As an aside, a better way to access running units is to use the following notation:

status1 = self.deployment.sentry['etcd'][0].run('service etcd status')
status2 = self.deployment.sentry['etcd'][1].run('service etcd status')

This will always get the first and second units, even when unit numbers have drifted or are not consecutive.
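With that indexing scheme, a test can also iterate over however many units exist rather than hard-coding two. A sketch, assuming sentry['etcd'] yields the unit sentries in order; the FakeUnit class here is an illustrative stand-in so the pattern is self-contained, not part of amulet:

```python
class FakeUnit(object):
    """Illustrative stand-in: in a real test each item would be a
    unit sentry whose run() executes the command on that unit."""
    def run(self, cmd):
        return ('etcd start/running', 0)  # (stdout, return code)

sentry = {'etcd': [FakeUnit(), FakeUnit()]}

# Positional access works regardless of how unit numbers have drifted:
for unit in sentry['etcd']:
    stdout, code = unit.run('service etcd status')
    assert 'running' in stdout
```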

lazypower commented 9 years ago

@marcoceppi wasn't this fixed in the latest amulet release? Also, thanks for the pointer on the unit accessors. I'll update the tests to reflect this pattern.