Open · touilleMan opened this issue 5 years ago
Huh, so just from glancing at the traceback there, it looks like this is a bug in asyncpg. There's a general fact about all async libraries: it's not safe to call into them from `__del__` methods, because those can run at any random time in any random thread. But asyncpg is just ... ignoring that and making the unsafe call.
I guess they might happen to get away with it in regular asyncio because, by chance, the call stack ultimately hits `call_soon`, and IIRC, on the default asyncio loops, `call_soon` and `call_soon_threadsafe` are the same.
So:

- You might be able to work around the issue in your program by making sure that you close connections explicitly (e.g. using `with`) rather than letting the GC do it.
- The real fix is probably for asyncpg to make their `__del__` method do `call_soon_threadsafe(self.terminate)`, I guess? Or something like that. (Actually, technically even this isn't guaranteed to work, because "threadsafe" is a weaker condition than "`__del__`-safe", but it's the closest thing asyncio has.)
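A minimal sketch of what such a `__del__` might look like — note the `Connection` and `terminate` names here are just illustrative stand-ins, not asyncpg's actual implementation:

```python
import asyncio

class Connection:
    """Toy stand-in for a connection object (hypothetical, not asyncpg's)."""

    def __init__(self, loop):
        self._loop = loop
        self._closed = False

    def terminate(self):
        # Runs on the event loop's own thread, so it may safely touch
        # loop-owned state (sockets, transports, ...).
        self._closed = True

    def __del__(self):
        # __del__ can fire at any time, on any thread, so the only
        # (mostly) safe move is to hand the cleanup to the loop thread.
        if not self._closed:
            try:
                self._loop.call_soon_threadsafe(self.terminate)
            except RuntimeError:
                pass  # loop already closed; nothing left to clean up
```

As noted above, even `call_soon_threadsafe` is not formally `__del__`-safe, but it avoids the cross-thread hazards that plain `call_soon` has.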
And, even though trio-asyncio is technically in the right here, it's possible we might be forced to figure out a more graceful way to tolerate this kind of situation, if it turns out to be needed for real-world compatibility.
`call_soon_threadsafe` is indeed the preferred solution.
asyncio's threadsafe version writes a wake-up byte to asyncio's internal self-pipe, which is required in this situation: garbage collection might be triggered from a different thread, and even if it isn't, this could deadlock.
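That wake-up mechanism can be illustrated with the classic self-pipe trick that asyncio's selector loop uses internally — a simplified sketch built from stdlib primitives, not asyncio's actual code:

```python
import select
import socket
import threading
import time

# The "loop" blocks in select() on the read end of a socket pair;
# any other thread can wake it immediately by sending one byte.
rsock, wsock = socket.socketpair()
woken = []

def loop_thread():
    # Blocks until a wake-up byte arrives (5 s safety timeout).
    ready, _, _ = select.select([rsock], [], [], 5.0)
    if ready:
        rsock.recv(1)  # drain the wake-up byte
        woken.append(True)

t = threading.Thread(target=loop_thread)
t.start()
time.sleep(0.1)      # give the "loop" time to block in select()
wsock.send(b"\x00")  # what call_soon_threadsafe does after queueing its callback
t.join()
```

Without the wake-up byte, a callback queued from another thread could sit unnoticed until the loop woke up for some unrelated reason.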
Going down the rabbit-hole... I just filed:

- `loop.call_soon_threadsafe`
- `loop.call_soon_threadsafe`
If we really have to, we could make trio-asyncio's version of `call_soon` instead call `call_soon_threadsafe`, but I'm a bit hesitant to do this, since `call_soon` is such a fundamental primitive that gets used all over the place (e.g. it's used to schedule task steps), and for trio-asyncio, `call_soon_threadsafe` is substantially more heavy-weight than `call_soon`. Correctness beats purity, so if it turns out there's an epidemic of people running into this kind of problem then we should consider it, but I'd like to see more evidence first.
For one piece of evidence: @touilleMan, are you able to avoid the issue by being more careful to use `with` on your connections?
Another useful piece of evidence would be to check how expensive it would actually be to make `call_soon` an alias for `call_soon_threadsafe`.
> Another useful piece of evidence would be to check how expensive it would actually be to make `call_soon` an alias for `call_soon_threadsafe`.
A microbenchmark on my dev VM reports 49 µsec for `call_soon` vs. 56 µsec for `-threadsafe` vs. 0.4 µsec for calling the thing directly. By comparison, using asyncio directly, the numbers are 2.3 vs. 4.3 vs. 0.27 µsec. :-/
Edit to add: the 0.27 vs. 0.4 µsec difference is measurement error (obviously).
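For comparison, here is a pure-asyncio microbenchmark of just the scheduling call — a sketch, and note it measures only the `call_soon`/`call_soon_threadsafe` invocation itself, not the callback execution (numbers will of course vary by machine):

```python
import asyncio
import time

def noop():
    pass

async def bench(n):
    loop = asyncio.get_running_loop()

    # Cost of plain call_soon: appends a handle to the ready list.
    t0 = time.perf_counter()
    for _ in range(n):
        loop.call_soon(noop)
    plain = (time.perf_counter() - t0) / n

    # Cost of the threadsafe variant: also writes the wake-up byte.
    t0 = time.perf_counter()
    for _ in range(n):
        loop.call_soon_threadsafe(noop)
    threadsafe = (time.perf_counter() - t0) / n

    await asyncio.sleep(0)  # let the queued no-ops drain
    return plain, threadsafe

plain, threadsafe = asyncio.run(bench(10_000))
print(f"call_soon: {plain * 1e6:.2f} µs, threadsafe: {threadsafe * 1e6:.2f} µs")
```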
Well, sounds like our `call_soon` has some room for optimization :-)
Did your microbenchmark just measure the `call_soon` invocation itself, or did it measure all the way to the callback invocation? For the threadsafe version especially, the costs are kind of distributed across a few different contexts (the caller, the entry queue processor, the trio-asyncio mainloop).
I did measure the whole thing:

```python
import trio
import trio_asyncio
import asyncio
from time import process_time as time

class N:
    def __init__(self, n):
        self.n = n
        self.e = asyncio.Event()
    def __repr__(self):
        return str(self.n)
    def run(self):
        self.n -= 1
        if not self.n:
            self.e.set()

async def t(n):
    l = asyncio.get_event_loop()
    for x in range(n.n):
        l.call_soon(n.run)
        #l.call_soon_threadsafe(n.run)
        #n.run()
    await n.e.wait()
    return

async def clock(proc, n):
    t1 = time()
    await proc(N(n))
    t2 = time()
    print((t2 - t1) / n)

l = asyncio.get_event_loop()
l.run_until_complete(clock(t, 1000))
trio_asyncio.run(trio_asyncio.aio_as_trio(clock), t, 1000)
```
asyncio's `call_soon` just appends the resulting handle to asyncio's `_ready` list. That's reasonably cheap. trio-asyncio, however, uses a `trio.Queue`, which is a lot more expensive. I should probably switch to an unbounded queue …
Well, maybe... `trio.Queue` isn't much more than a `collections.deque`. It wouldn't be hard to replace it with something cheaper, but we should profile or something before jumping to conclusions about how much it will help :-).
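For a sense of what "something cheaper" could look like, here's a minimal deque-backed unbounded queue — a hypothetical sketch using stdlib threading primitives, not trio's actual API:

```python
from collections import deque
import threading

class UnboundedQueue:
    """Hypothetical deque-backed unbounded queue (illustrative only)."""

    def __init__(self):
        self._data = deque()
        self._cond = threading.Condition()

    def put_nowait(self, item):
        # Appending to a deque is O(1) and cheap -- this is the hot
        # path a call_soon-style scheduler would hit.
        with self._cond:
            self._data.append(item)
            self._cond.notify()

    def get(self):
        # Blocks until an item is available.
        with self._cond:
            while not self._data:
                self._cond.wait()
            return self._data.popleft()
```

A real trio replacement would use trio's own synchronization primitives rather than a `threading.Condition`, but the data-structure cost being discussed is the same.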
Part of the problem obviously isn't the queue, but the dance you need to get there.
`cancel_shielded_checkpoint` with a no-op: 30 µsec… though we should probably move this somewhere else …
Well this issue inspired you a lot :smiley:
> You might be able to work around the issue in your program by making sure that you close connections explicitly (e.g. using `with`) rather than letting the GC do it.

Good point, I will modify triopg so that connections and connection pools only work as async context managers.
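A sketch of that pattern with a plain asyncio context manager — the `open_connection` and `Connection` names here are hypothetical, not triopg's actual API:

```python
import asyncio
from contextlib import asynccontextmanager

class Connection:
    """Toy connection object (illustrative only)."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

@asynccontextmanager
async def open_connection():
    conn = Connection()
    try:
        yield conn
    finally:
        # Deterministic cleanup: close() always runs here, so the
        # connection never reaches __del__ while still open.
        await conn.close()

async def main():
    async with open_connection() as conn:
        open_inside = not conn.closed  # still open inside the block
    return open_inside, conn.closed   # closed once the block exits

open_inside, closed_after = asyncio.run(main())
```

Forcing all use through the context manager means cleanup happens at a well-defined point inside a running event loop, instead of whenever the GC happens to fire `__del__`.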
With triopg, I sometimes encounter this kind of error:
Basically asyncpg's `Connection` object has a `__del__` that wants to do an asyncio operation (which trio-asyncio translates into a trio operation), but the weird place `__del__` executes from doesn't seem to have a trio context :(