juju / python-libjuju

Python library for the Juju API
Apache License 2.0
59 stars 100 forks source link

[Bug]: websockets / ssl error #802

Closed jnsgruk closed 1 year ago

jnsgruk commented 1 year ago

Description

Over the last few days, I've seen a bunch of errors like this:

Task was destroyed but it is pending!
task: <Task pending name='Task-50' coro=<WebSocketCommonProtocol.keepalive_ping() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py:1259> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fc1941d1ae0>
transport: <_SelectorSocketTransport closing fd=10>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 916, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line [92](https://github.com/jnsgruk/parca-operator/actions/runs/4192573114/jobs/7268330702#step:4:93)2, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 717, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 729, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 750, in call_soon
    self._check_closed()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task was destroyed but it is pending!
task: <Task pending name='Task-49' coro=<WebSocketCommonProtocol.transfer_data() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py:968> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:475]>
Task was destroyed but it is pending!
task: <Task pending name='Task-51' coro=<WebSocketCommonProtocol.close_connection() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py:1314> wait_for=<Task pending name='Task-49' coro=<WebSocketCommonProtocol.transfer_data() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py:968> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:475]>>
Exception ignored in: <coroutine object WebSocketCommonProtocol.close_connection at 0x7fc1[94](https://github.com/jnsgruk/parca-operator/actions/runs/4192573114/jobs/7268330702#step:4:95)163610>
Traceback (most recent call last):
  File "/home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1354, in close_connection
    await self.close_transport()
  File "/home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1372, in close_transport
    if await self.wait_for_connection_lost():
  File "/home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 13[97](https://github.com/jnsgruk/parca-operator/actions/runs/4192573114/jobs/7268330702#step:4:98), in wait_for_connection_lost
    await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 405, in wait_for
    loop = events.get_running_loop()
RuntimeError: no running event loop
Task was destroyed but it is pending!
task: <Task pending name='Task-57' coro=<Connection._pinger() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/juju/client/connection.py:584> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-52' coro=<Connection._receiver() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/juju/client/connection.py:536> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-81' coro=<WebSocketCommonProtocol.recv() running at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py:551> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[create_task_with_handler.<locals>._task_result_exp_handler(task_name='tmp', logger=<Logger juju....ion (WARNING)>)() at /home/runner/work/parca-operator/parca-operator/.tox/integration/lib/python3.10/site-packages/juju/jasyncio.py:50]>
Task was destroyed but it is pending!
task: <Task pending name='Task-82' coro=<Event.wait() running at /usr/lib/python3.10/asyncio/locks.py:214> wait_for=<Future pending cb=[Task.task_wakeup()]>>
integration: exit 1 (18.69 seconds) /home/runner/work/parca-operator/parca-operator> pytest -v -s --tb native --ignore=/home/runner/work/parca-operator/parca-operator/tests/unit --ignore=/home/runner/work/parca-operator/parca-operator/tests/functional --log-cli-level=INFO --model controller --keep-models pid=8832
  integration: FAIL code 1 (39.03=setup[20.33]+cmd[18.69] seconds)
  evaluation failed :( (39.10 seconds)

The error is intermittent, and retrying the job normally fixes it.

Urgency

Annoying bug in our test suite

Python-libjuju version

3.1.0.1

Juju version

3.1/stable

Reproduce / Test

Github Actions on jnsgruk/parca-operator or jnsgruk/zinc-k8s-operator
cderici commented 1 year ago

I don't know the exact cause, but I suspect that the logic in https://github.com/juju/python-libjuju/blob/d998298beee4c9332a8e0425ec05d9952d82c509/juju/utils.py#L161 needs to be revised, will take a closer look

github-actions[bot] commented 1 year ago

This issue is marked as incomplete because it has been open 30 days with no activity. Please remove incomplete label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.