openstack-charmers / zaza

A Python3-only functional test framework for Charms
Apache License 2.0

Juju controller connection resilience: [ERROR] Task exception was never retrieved #299

Closed: fnordahl closed this issue 5 years ago

fnordahl commented 5 years ago
2019-10-02 06:30:19 [WARNING] RPC: Connection closed, reconnecting
2019-10-02 06:30:19 [WARNING] Receiver: Connection closed, reconnecting
2019-10-02 06:30:29 [ERROR] Task exception was never retrieved
future: <Task finished coro=<Connection.reconnect() done, defined at /tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py:563> exception=OSError(113, "Connect call failed ('172.17.112.25', 17070)")>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py", line 571, in reconnect
    await self._connect_with_login([(self.endpoint, self.cacert)])
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py", line 631, in _connect_with_login
    await self._connect(endpoints)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py", line 591, in _connect
    result = await task
  File "/usr/lib/python3.5/asyncio/tasks.py", line 492, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py", line 580, in _try_endpoint
    return await self._open(endpoint, cacert)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/client/connection.py", line 334, in _open
    max_size=self.max_frame_size,
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/websockets/py35/client.py", line 12, in __await_impl__
    transport, protocol = await self._creating_connection
  File "/usr/lib/python3.5/asyncio/base_events.py", line 695, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.5/asyncio/base_events.py", line 682, in create_connection
    yield from self.sock_connect(sock, address)
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 402, in sock_connect
    return (yield from fut)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 432, in _sock_connect_cb
    raise OSError(err, 'Connect call failed %s' % (address,))
OSError: [Errno 113] Connect call failed ('172.17.112.25', 17070)
2019-10-02 06:30:29 [ERROR] RPC: Automatic reconnect failed
Traceback (most recent call last):
  File "/tmp/tmp.Afq1xXiq7A/func/bin/functest-run-suite", line 10, in <module>
    sys.exit(main())
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 162, in main
    bundle=args.bundle)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 107, in func_test_runner
    run_env_deployment(env_deployment, keep_model=preserve_model)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 53, in run_env_deployment
    model_ctxt=model_aliases)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/charm_lifecycle/deploy.py", line 294, in deploy
    test_config.get('target_deploy_status', {}))
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/__init__.py", line 48, in _wrapper
    return run(_run_it())
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/__init__.py", line 36, in run
    return task.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/__init__.py", line 47, in _run_it
    return await f(*args, **kwargs)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/zaza/model.py", line 855, in async_wait_for_application_states
    timeout=timeout)
  File "/tmp/tmp.Afq1xXiq7A/func/lib/python3.5/site-packages/juju/model.py", line 713, in block_until
    raise websockets.ConnectionClosed(1006, 'no reason')
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), reason = no reason

Full CI artifact set: https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-neutron-api/684348/5/3934/index.html
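The first traceback is asyncio's "Task exception was never retrieved" report. The background `Connection.reconnect()` task raised `OSError` (errno 113, host unreachable), and nothing ever awaited the task or asked for its result. The failure therefore surfaced only as a log line when the task was discarded, and the caller later saw the unrelated-looking `ConnectionClosed` from `block_until`.

Below is a minimal sketch of that mechanism and one way to consume such an exception. It assumes Python 3.7+ (for `asyncio.run` and `asyncio.create_task`), not the Python 3.5 in the log. `reconnect()` and `_retrieve_exception()` are hypothetical stand-ins, not python-libjuju or zaza code.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


async def reconnect():
    """Hypothetical stand-in for a background reconnect coroutine.

    It always fails the way the traceback above does: an OSError
    raised while opening the TCP connection to the controller.
    """
    await asyncio.sleep(0)
    raise OSError(113, "Connect call failed ('172.17.112.25', 17070)")


def _retrieve_exception(task):
    """Done-callback that consumes the task's outcome.

    Calling task.exception() marks the exception as retrieved, so asyncio
    does not log 'Task exception was never retrieved' when the task is
    garbage-collected. The failure is logged explicitly instead.
    """
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        log.warning("background reconnect failed: %r", exc)


async def main():
    # Fire-and-forget: nobody awaits this task. Without the callback, its
    # OSError would surface only as asyncio's "never retrieved" error log.
    task = asyncio.create_task(reconnect())
    task.add_done_callback(_retrieve_exception)
    # Yield to the loop long enough for the task to fail and the callback to run.
    await asyncio.sleep(0.1)
    return task


if __name__ == "__main__":
    finished = asyncio.run(main())
    print("reconnect task failed with:", repr(finished.exception()))
```

The callback only makes the failure visible. In a real client, the same point is where a caller would also be told that the connection is gone, instead of discovering it later through a closed websocket.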

fnordahl commented 5 years ago

Three other test runs also died with variations of the Juju controller not responding, so this issue may be a red herring. The underlying cause may have been an actual operational problem with the controller rather than a lack of connection resilience in the client.