juju-solutions / matrix

Automatic testing of big software deployments under various failure conditions

Glitch can get us into a state where we cannot reset #28

Closed (pengale closed this issue 7 years ago)

pengale commented 7 years ago

Take the following glitch plan, and run it on wiki-simple to reproduce:

actions:
- action: destroy_machine
  selectors:
  - {selector: machines}
  - {selector: one}
- action: remove_unit
  selectors:
  - {application: mysql, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: kill_juju_agent
  selectors:
  - {application: wiki, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: kill_juju_agent
  selectors:
  - {application: wiki, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: destroy_machine
  selectors:
  - {selector: machines}
  - {selector: one}

You might get this exception:

matrix:331:exception_handler: Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/home/petevg/Code/matrix/.tox/py35/lib/python3.5/site-packages/juju/application.py", line 164, in destroy
    return await app_facade.Destroy(self.name)
  File "/home/petevg/Code/matrix/.tox/py35/lib/python3.5/site-packages/juju/client/facade.py", line 317, in wrapper
    reply = await f(*args, **kwargs)
  File "/home/petevg/Code/matrix/.tox/py35/lib/python3.5/site-packages/juju/client/_client.py", line 7633, in Destroy
    reply = await self.rpc(msg)
  File "/home/petevg/Code/matrix/.tox/py35/lib/python3.5/site-packages/juju/client/facade.py", line 436, in rpc
    result = await self.connection.rpc(msg, encoder=TypeEncoder)
  File "/home/petevg/Code/matrix/.tox/py35/lib/python3.5/site-packages/juju/client/connection.py", line 93, in rpc
    raise JujuAPIError(result)
juju.errors.JujuAPIError: cannot destroy application "wiki": state changing too quickly; try again soon

The simplest thing to do might be to just catch that error, and retry the reset.
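
A rough sketch of what "catch and retry" could look like, not the fix that was eventually merged. To keep it runnable without python-libjuju, JujuAPIError here is a local stand-in for juju.errors.JujuAPIError. The helper name retry_on_busy_state, the attempt count, and the backoff delay are all made up for illustration, and matching on the message text is an assumption based on the traceback above.

```python
import asyncio


class JujuAPIError(Exception):
    """Stand-in for juju.errors.JujuAPIError (assumption: lets the sketch run standalone)."""


# Assumption: the transient failure is identified by the message text in the traceback.
RETRYABLE = "state changing too quickly"


async def retry_on_busy_state(operation, attempts=5, delay=1.0):
    """Await operation(), retrying while Juju reports the transient error.

    operation is a zero-argument callable that returns an awaitable.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except JujuAPIError as err:
            # Give up on unrelated API errors, or once the attempts are used up.
            if RETRYABLE not in str(err) or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt.
            await asyncio.sleep(delay * attempt)
```

In the reset path this would wrap the call that failed in the traceback, something like `await retry_on_busy_state(app.destroy)`, where app is the application object being torn down. Errors that do not carry the "state changing too quickly" message are re-raised immediately, so real failures are not hidden by the retry loop.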

pengale commented 7 years ago

Bleh. It looks like that glitch plan won't consistently create the issue. I have seen it a couple of times, though.

pengale commented 7 years ago

Addressed in https://github.com/juju-solutions/matrix/pull/43