juju / python-libjuju

Python library for the Juju API
Apache License 2.0

Look into cancellation safety for (e.g.) reconnection #1112

Open james-garner-canonical opened 3 weeks ago

james-garner-canonical commented 3 weeks ago

Description

#1103 fixes the inability to reconnect on Python 3.12 caused by a change in asyncio, but it also revealed that reconnect was originally intended to be shielded from cancellation. It's not clear (a) whether this shielding is still needed, or (b) whether the current version of asyncio.wait provides any such protection at all. Let's look into this.
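To make the question concrete, here is a minimal sketch (not python-libjuju's code) that compares what happens to an inner coroutine when the task awaiting it is cancelled in three ways: awaited directly, awaited through asyncio.wait, and awaited through asyncio.shield. The names `reconnect`, `direct`, `via_wait`, `via_shield` and `run_and_cancel` are hypothetical stand-ins, and the behaviour is worth re-checking on each Python version the library supports.

```python
import asyncio


async def reconnect(log):
    # Hypothetical stand-in for a reconnect routine; not the library's implementation.
    log.append("reconnect started")
    await asyncio.sleep(0.05)
    log.append("reconnect finished")


async def direct(log):
    # Cancelling the task running this coroutine also cancels reconnect().
    await reconnect(log)


async def via_wait(log):
    # asyncio.wait() requires tasks/futures (bare coroutines are rejected on 3.11+).
    task = asyncio.ensure_future(reconnect(log))
    await asyncio.wait([task])


async def via_shield(log):
    # asyncio.shield() wraps the coroutine in its own task and absorbs cancellation.
    await asyncio.shield(reconnect(log))


async def run_and_cancel(wrapper):
    log = []
    outer = asyncio.ensure_future(wrapper(log))
    await asyncio.sleep(0.01)  # let reconnect() start
    outer.cancel()
    try:
        await outer
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.1)  # give any surviving inner task time to finish
    return log


async def main():
    return {
        "direct": await run_and_cancel(direct),
        "wait": await run_and_cancel(via_wait),
        "shield": await run_and_cancel(via_shield),
    }


results = asyncio.run(main())
for name, log in results.items():
    print(name, log)
```

In this sketch the inner task survives cancellation of the waiter in both the asyncio.wait and asyncio.shield cases, but only shield documents that as its purpose, so relying on asyncio.wait for this looks fragile.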

Urgency

Casually opening an issue

james-garner-canonical commented 1 week ago

Curious traceback in a passing(!) integration test. Possibly related to #1078?

https://github.com/juju/python-libjuju/actions/runs/11267536635/job/31332843965

tests/integration/test_unit.py::test_destroy_unit 
Task was destroyed but it is pending!
task: <Task pending name='Task_Pinger' coro=<Connection._pinger() running at /home/runner/work/python-libjuju/python-libjuju/juju/client/connection.py:617> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task_Receiver' coro=<Connection._receiver() running at /home/runner/work/python-libjuju/python-libjuju/juju/client/connection.py:567> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7f6a01c23220>
transport: <_SelectorSocketTransport closing fd=15>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/selector_events.py", line 930, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
    self._force_close(exc)
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task was destroyed but it is pending!
task: <Task pending name='Task-9208' coro=<WebSocketCommonProtocol.transfer_data() running at /home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py:953> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/tasks.py:475]>
Task was destroyed but it is pending!
task: <Task pending name='Task-9210' coro=<WebSocketCommonProtocol.close_connection() running at /home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py:1287> wait_for=<Task pending name='Task-9208' coro=<WebSocketCommonProtocol.transfer_data() running at /home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py:953> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/tasks.py:475]>>
Exception ignored in: <coroutine object WebSocketCommonProtocol.close_connection at 0x7f6a01b95620>
Traceback (most recent call last):
  File "/home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1323, in close_connection
    await self.close_transport()
  File "/home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1341, in close_transport
    if await self.wait_for_connection_lost():
  File "/home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1364, in wait_for_connection_lost
    async with asyncio_timeout(self.close_timeout):
  File "/home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/asyncio/async_timeout.py", line 85, in timeout
    loop = asyncio.get_running_loop()
RuntimeError: no running event loop
Task was destroyed but it is pending!
task: <Task pending name='Task-9209' coro=<WebSocketCommonProtocol.keepalive_ping() running at /home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py:1242> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-11872' coro=<WebSocketCommonProtocol.recv() running at /home/runner/work/python-libjuju/python-libjuju/.tox/py3/lib/python3.10/site-packages/websockets/legacy/protocol.py:546> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[create_task_with_handler.<locals>._task_result_exp_handler(task_name='tmp', logger=<Logger juju....ion (WARNING)>)() at /home/runner/work/python-libjuju/python-libjuju/juju/jasyncio.py:39]>
Task was destroyed but it is pending!
task: <Task pending name='Task-11873' coro=<Event.wait() running at /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/asyncio/locks.py:214> wait_for=<Future pending cb=[Task.task_wakeup()]>>
[gw0] [100%] PASSED tests/integration/test_unit.py::test_destroy_unit
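
For reference, "Task was destroyed but it is pending!" is what asyncio logs when a task is garbage-collected while still pending, typically because its event loop was closed before the task was cancelled and awaited. Whether that is the cause here has not been established. The sketch below shows only the general cleanup pattern; `pinger` is a hypothetical background task, not the library's `Connection._pinger`.

```python
import asyncio


async def pinger():
    # Hypothetical long-running background task, loosely analogous to a keepalive loop.
    while True:
        await asyncio.sleep(3600)


async def main():
    tasks = [asyncio.ensure_future(pinger()) for _ in range(2)]
    await asyncio.sleep(0)  # let the background tasks start running

    # Cancel each task, then await them all so none is left pending
    # when the event loop is closed.
    for task in tasks:
        task.cancel()
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    return tasks, outcomes


tasks, outcomes = asyncio.run(main())
print([task.cancelled() for task in tasks])
```

asyncio.run() cancels leftover tasks on its own, so the warning is more likely when a loop is created and closed manually, for example by a test fixture.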