jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Object has no attribute 'cancel' - a secondary error when handling a timeout error #796

Closed consideRatio closed 11 months ago

consideRatio commented 11 months ago

From this forum post by @yambottle, I saw that we have a bug that only surfaces while another error (a spawn timeout) is being handled.

[E 2023-10-18 19:00:12.626 JupyterHub user:884] Unhandled error starting test's server: 'coroutine' object has no attribute 'cancel'
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/kubespawner/spawner.py", line 2669, in _start
        future.cancel()
        ^^^^^^^^^^^^^
    AttributeError: 'coroutine' object has no attribute 'cancel'

The related code is here:

https://github.com/jupyterhub/kubespawner/blob/6d9d9a36df3077b6ba22e4a94107ffe96fc2b18c/kubespawner/spawner.py#L2660-L2670

consideRatio commented 11 months ago

@minrk I think you have the asyncio experience to more clearly understand what is going on and how to handle it properly, can you help?

I figure there is a difference between a future and a coroutine, and we are using future logic on coroutines here. Do we need to turn the coroutines into futures? Do we need to call .cancel() on the others if one of them has errored?

consideRatio commented 11 months ago

Based on https://superfastpython.com/asyncio-gather/#Example_of_Canceling_All_Tasks_in_gather, is the action to first stash the future returned by asyncio.gather in a variable, and then await it, allowing us to do .cancel() on the future variable?

Hmm... I can't think clearly about this. What is the key motivation for doing the .cancel() logic in the first place before re-raising? If these are futures, won't we then also see additional "CancellationErrors" raised from them when doing .cancel() on them?
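One likely motivation for cancelling, sketched below under assumptions: with the default return_exceptions=False, asyncio.gather propagates the first exception but leaves the other awaitables running. The names slow, fails and gather_then_cancel are hypothetical and are not kubespawner code.

```python
import asyncio


async def slow(log):
    # Hypothetical long-running sibling task.
    try:
        await asyncio.sleep(10)
        log.append("slow finished")
    except asyncio.CancelledError:
        log.append("slow cancelled")
        raise


async def fails():
    # Hypothetical task that errors quickly.
    await asyncio.sleep(0.01)
    raise RuntimeError("boom")


async def gather_then_cancel():
    log = []
    tasks = [asyncio.ensure_future(slow(log)), asyncio.ensure_future(fails())]
    try:
        await asyncio.gather(*tasks)
    except RuntimeError:
        # gather has raised, but the sibling task is still running
        still_running = not tasks[0].done()
        for task in tasks:
            task.cancel()  # has no effect on tasks that already finished
        # give the cancellation a chance to be delivered, without raising
        await asyncio.gather(*tasks, return_exceptions=True)
    return still_running, tasks[0].cancelled(), log


print(asyncio.run(gather_then_cancel()))
# prints (True, True, ['slow cancelled'])
```

Without the explicit cancel, the slow task would keep running in the background after the error had already been reported.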

minrk commented 11 months ago

Yup, I got it. The short answer is that we are trying to cancel coroutines, which have not been scheduled as tasks yet, when only Futures (and Tasks, which are Futures) can be cancelled. The quick fix is to call task = asyncio.ensure_future(task) to ensure the coroutines are actually scheduled tasks which can be cancelled (this would be a no-op if they were already Futures).
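A minimal sketch of that difference, assuming nothing about kubespawner's internals (work and coroutine_vs_task are hypothetical names): a bare coroutine object has no cancel method, while the Task returned by asyncio.ensure_future does, and passing an existing Task back in returns the same object.

```python
import asyncio


async def work():
    # Hypothetical stand-in for one of the spawner's start-up steps.
    await asyncio.sleep(10)


async def coroutine_vs_task():
    coro = work()
    # a bare coroutine object has no .cancel(), hence the AttributeError
    coro_has_cancel = hasattr(coro, "cancel")

    # wrapping it schedules it on the running loop and returns a Task
    task = asyncio.ensure_future(coro)
    # ensure_future on something that is already a Future returns it unchanged
    same_object = asyncio.ensure_future(task) is task

    task.cancel()
    # asyncio.wait does not raise for cancelled tasks; it only waits for them
    await asyncio.wait([task])
    return coro_has_cancel, same_object, task.cancelled()


print(asyncio.run(coroutine_vs_task()))
# prints (False, True, True)
```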

The bigger question is whether we should be cancelling them in the first place. It might actually make more sense to wait for them all to complete before raising. But that could end up delaying reporting of an error.
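The wait-for-everything alternative could look roughly like the sketch below. It is an illustration only, not what kubespawner does; ok, fails and wait_for_all_then_raise are hypothetical names. With return_exceptions=True, asyncio.gather collects exceptions as results instead of raising on the first one, so the error is reported only after every awaitable has finished.

```python
import asyncio


async def ok(delay, log):
    await asyncio.sleep(delay)
    log.append("ok finished")


async def fails(log):
    await asyncio.sleep(0.01)
    log.append("fails raised")
    raise RuntimeError("boom")


async def wait_for_all_then_raise(aws):
    # Let everything run to completion, then re-raise the first error found.
    results = await asyncio.gather(*aws, return_exceptions=True)
    for result in results:
        if isinstance(result, BaseException):
            raise result
    return results


async def demo_wait_for_all():
    log = []
    try:
        await wait_for_all_then_raise([fails(log), ok(0.05, log)])
    except RuntimeError as e:
        log.append(f"caught {e}")
    return log


print(asyncio.run(demo_wait_for_all()))
# prints ['fails raised', 'ok finished', 'caught boom']
```

The trade-off is visible in the log order: the error is raised early, but the caller only sees it once the slower task is done.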

minrk commented 11 months ago

> If these are futures, won't we then also see additional "CancellationErrors" raised from them when doing .cancel() on them?

You don't generally see CancelledError messages unless you await a future that has been cancelled (usually a bug, since once you've cancelled it, you shouldn't be awaiting it). If you're inside a task being cancelled, there will be no traceback.

For example:

import asyncio


async def printer(name):
    for i in range(10):
        await asyncio.sleep(1)
        print(name)

async def main():
    # schedule each coroutine as a Task so that it can be cancelled
    tasks = [
        asyncio.ensure_future(printer(name))
        for name in ("1", "2", "3")
    ]
    # create single task wrapping subtasks
    gather_task = asyncio.gather(*tasks)
    print("waiting")
    try:
        await asyncio.wait_for(gather_task, timeout=1.5)
    except asyncio.TimeoutError:
        print("timeout waiting (expected)")
    # ensure all the sub-tasks are cancelled
    # when wait_for hits timeout, it cancels gather_task
    # when gather_task is cancelled, it cancels all of its sub-tasks
    assert gather_task.done()
    for task in tasks:
        assert task.done()
        assert task.cancelled()

    # awaiting any of these tasks here would raise CancelledError,
    # but nothing else is raised
    print("all done and cancelled")

if __name__ == "__main__":
    asyncio.run(main())

will produce the output:

waiting
1
2
3
timeout waiting (expected)
all done and cancelled

No tracebacks or anything. If any code were to await one of the tasks after it was cancelled, that await would raise CancelledError.
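That last point can be checked with a small sketch (await_cancelled_task is a hypothetical name): the CancelledError only becomes visible to code that awaits the task after cancelling it.

```python
import asyncio


async def await_cancelled_task():
    task = asyncio.ensure_future(asyncio.sleep(10))
    task.cancel()
    try:
        # awaiting a cancelled task is what surfaces the error
        await task
    except asyncio.CancelledError:
        return "CancelledError"
    return "no error"


print(asyncio.run(await_cancelled_task()))
# prints CancelledError
```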

consideRatio commented 11 months ago

Ah, thank you again @minrk - now I feel quite confident about the cancellation errors!!