python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Can we make forgetting an await be an error? #79

Open njsmith opened 7 years ago

njsmith commented 7 years ago

[This issue was originally an umbrella list of all the potential Python core changes that we might want to advocate for, but a big discussion about await sprouted here so I've moved the umbrella issue to: #103]

Original comment:

Can we do anything about how easy it is to forget an await? In retrospect async functions shouldn't implement __call__ IMO... but probably too late to fix that. Still, it kinda sucks that one of the first things in our tutorial is a giant warning about this wart, so worth at least checking if anyone has any ideas...
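For readers who have not hit this wart, here is a minimal sketch of what forgetting an await looks like. It uses asyncio from the standard library only so that it runs without extra dependencies; `double` and `main` are made-up names for illustration.

```python
import asyncio
import inspect


async def double(x):
    await asyncio.sleep(0)
    return 2 * x


async def main():
    # Missing await: this is a coroutine object, not the number 42.
    forgotten = double(21)
    is_coro = inspect.iscoroutine(forgotten)
    # Close it explicitly so the interpreter does not emit the
    # "coroutine ... was never awaited" RuntimeWarning later.
    forgotten.close()

    # With the await, the body of double() actually runs.
    result = await double(21)
    return is_coro, result


is_coro, result = asyncio.run(main())
```

Nothing fails at the call site in the first case; the mistake only shows up later, as a warning or as a coroutine object where a value was expected.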

ncoghlan commented 7 years ago

OK, I think I see your argument now, and given the leading await you should be able to make it work even within the constraints of the LL(1) parsing restriction (it isn't substantially different from def ...(..., ..., ...) in that regard, especially if you restrict the intervening subexpression to being a name lookup).

That said, while it doesn't read as well, you could likely more easily experiment with an await from foo(1, 2, 3) formulation that switched on the from keyword, rather than the await subexpression being a call construct.

dabeaz commented 7 years ago

One of the features I most like about Python is its flexibility. I would not want Python to be modified in a way that forces me to put an await on every instantiation of a coroutine. Specifically, I want this to work:

extracted = foo()
...
await extracted

ncoghlan commented 7 years ago

@dabeaz @njsmith isn't proposing breaking that, he's just proposing to have it mean something different from await foo(), just as extracted = 1, 2, 3; foo(extracted) already means something slightly different from foo(1, 2, 3).

Extending the analogy to other forms:

Comparable equivalents for @njsmith's proposal might then look like:

And putting it that way, I do think the right place to start is to figure out what the "arg expansion" equivalent for @njsmith's proposal would actually look like, and I think await from expr with a suitably updated functools.partial implementation is a promising candidate (due to the yield from precedent).

njsmith commented 7 years ago

I see that my plan of leaving those notes at the bottom of this thread so I can find them again easily when I revisit this next year is not happening :-)

@dabeaz: even in my ideal world, the only thing that code would have to change is that you'd write extracted = foo.__acall__() (or however we ended up spelling it) to explicitly signal that you intentionally left off the await and really do want the coroutine object rather than the result of calling foo. But the proposal described above doesn't even go that far: all it proposes is to let people writing async functions opt in to requiring the explicit .__acall__() if they want, so it wouldn't affect you at all unless you decided to add this to curio as a feature. (And I guess if it's implemented and is wildly successful then we might get consensus to do a long deprecation period and then switch the default to requiring the explicit __acall__, and even then libraries could still opt-out. But I don't think you need to worry about that right now!)

Really, the proposal is just to provide a simple and reliable way to let async functions know whether or not they got called with await, and then do whatever they want with this information. It's actually more flexible than what we have now, and tbh seems like the sort of thing you might be able to find some terrifying use for...

ncoghlan commented 7 years ago

@njsmith It's probably the opposite of comforting, but my writing PEP 432 was originally motivated by getting __main__ to play nice with the pure Python import system in 3.3, and we've only just begun to move forward with the implementation as a private startup refactoring for 3.7 :)

njsmith commented 7 years ago

Stopgap measure design notes

I was thinking some more about a possible stopgap measure that we might be able to sneak into 3.7 to make something like #176 fast. Basically the idea would be to get just enough help from the interpreter to make checking for unawaited coroutines fast enough that we can afford to do it at every context switch.

(There was some previous discussion of these ideas in #176 starting here: https://github.com/python-trio/trio/pull/176#issuecomment-304490423)

In more detail, the requirements would be:

There are two fairly natural ways to do this that come to mind. They both involve the same basic strategy: adding two pointers to each coroutine object, which we can use to create a double-linked list holding the coroutines of interest (with the list head stored in the threadstate). The nice thing about an intrusive double-linked list like this is that it's a collection where adding and removing are both O(1) and very cheap (just updating a few pointers).
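To make the O(1) claim concrete, here is a rough pure-Python sketch of an intrusive doubly-linked list. The real thing would live in C inside the coroutine object and the threadstate; `Node` and `IntrusiveList` are hypothetical names that stand in for "a coroutine with two extra pointers" and "the list head".

```python
class Node:
    """Stands in for a coroutine object carrying its own prev/next pointers."""

    __slots__ = ("name", "prev", "next")

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


class IntrusiveList:
    """Holds only the head pointer; the links live in the nodes themselves."""

    def __init__(self):
        self.head = None

    def insert(self, node):
        # O(1): push the node at the front of the list.
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node

    def remove(self, node):
        # O(1): unlink the node using its own pointers, no search needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next


tracked = IntrusiveList()
a, b, c = Node("a"), Node("b"), Node("c")
for n in (a, b, c):
    tracked.insert(n)
tracked.remove(b)  # e.g. the coroutine "b" was iterated
remaining = [n.name for n in tracked]
```

Because each node carries its own links, removal never has to walk the list, which is what makes it cheap enough to do on every coroutine creation and first iteration.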

API option 1

Keep a thread-local list of live, unawaited coroutines. So coroutines always insert themselves into the list when created, and then remove themselves again when they're either iterated or deallocated.

Then we need:

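# Sketch only: sys.get_and_clear_unawaited_coros() is the proposed hook and
# does not exist in current Python.
import sys

# Unawaited coroutines that were garbage collected since the last barrier.
_gced_unawaited_coros = set()

def unawaited_coroutine_gc_hook(coro):
    _gced_unawaited_coros.add(coro)

def barrier():
    live_unawaited_coros = sys.get_and_clear_unawaited_coros()
    if _gced_unawaited_coros or live_unawaited_coros:
        # do expensive stuff here, checking if this task is hosting asyncio, etc.
        ...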

(We need to have both hooks to handle the two cases of (a) a coroutine is created and then immediately garbage collected in between barriers, (b) a coroutine is created and then still live when we hit the barrier.)

For speed, we might want to also have a sys.has_unawaited_coros() that returns a bool, so that we don't need to bother calling get_and_clear_unawaited_coros every time and pay the price of allocating an empty list object just to throw it away. %timeit bool([]) says 127 ns and %timeit get_coroutine_wrapper() (used here as a proxy for a trivial function that just returns a value from the threadstate) says 37 ns, so it's faster but they're both fast enough I don't know whether it matters without some kind of testing.

A minor awkward question here is whether the regular unawaited coro warning would be issued at the same time our hook is called, or whether we'd have some way to suppress it, or...

Oh wait, there's also a more worrisome problem: if we see a coroutine is unawaited and live, and decide that it's OK... what happens if it later gets garbage collected and is still unawaited? We'll see it again, but now in some random other context. I guess what we could do is that when we detect live unawaited coroutines and we're in asyncio-friendly mode, we put them into a WeakSet (or even a WeakKeyDictionary to track which task created them), and then our gc hook checks to see if they're in the WeakSet. Except... I think __del__ might run after weakrefs are cleared? If so then this whole approach is probably sunk. Or maybe we could somehow mark the coroutines that we've seen before? You can't attach arbitrary attributes to coroutines. I suppose we could do some nasty hack involving the coroutine's locals... those should still be accessible from __del__.

API option 2

In the design above, the need to handle GCed unawaited coroutines is the cause of lots of different kinds of awkwardness. What we could do instead is make our tracking list a strong reference, so that coroutines insert themselves when created, and then stay there either until they're iterated or else we remove them manually.

Obviously this can't be enabled by default, because it would mean that instead of warning about unawaited coroutines, the interpreter would just leak them. So the API would be something like:

sys.enable_unawaited_coroutine_tracking(), sys.disable_unawaited_coroutine_tracking(), sys.get_and_clear_unawaited_coroutines()

And then everything else is pretty straightforward and obvious. (Though we still have the question about whether we'd want a sys.has_unawaited_coroutines() for speed.)

PyPy's opinion

I asked @arigo about how these options look from the point of view of PyPy. He said that if sys.get_and_clear_unawaited_coros() was written carefully, then it would probably be possible for the JIT to inline and optimize out the unnecessary empty list, so that's a useful data point. (Though I guess it might still be better to save it the trouble of needing to, esp. since you're not always in JIT mode.)

He also had a strong preference for the second API on the general grounds that doing anything from __del__ methods is just asking for trouble and creates all kinds of problems for alternative interpreter implementations. Obviously they support it, but it's not cheap. In this particular case it's not a huge practical difference currently, b/c coroutines already have a __del__ method (the one that prints the coroutine '...' was never awaited message, or potentially throws GeneratorExit – though PyPy is clever enough to avoid generating a __del__ method just for the latter if there are no yield points with exception handlers). But he felt that option 2 was just simpler from an abstract "let's try to make it easier rather than harder to write a python implementation" perspective. (And who knows, maybe there's some chance we could someday get rid of coroutine '...' was never awaited.)

Other possible consumers

Nick points out that this kind of list structure might also be useful for more general introspection, similar to threading.enumerate(): https://github.com/python-trio/trio/pull/176#issuecomment-304492602

Curio, possibly. Not sure, would have to check with Dave.

I think pytest-asyncio and similar libraries might find this useful to easily and reliably catch unawaited coroutines and attribute them to the right test. [Edit: asked pytest-asyncio here: https://github.com/pytest-dev/pytest-asyncio/issues/67 ]

dabeaz commented 7 years ago

Forgetting to put await on coroutines is not a problem that I'm concerned about. It is unlikely that I would use this in Curio.

ncoghlan commented 7 years ago

As far as the __del__ comments go, coroutine objects are hashable, so WeakSet seems like it would be a better fit for this purpose than doing custom state manipulation in __del__.

I'll also note that everything except the "already GC'ed before the next check for unawaited coroutines" case can already be handled by calling sys.set_coroutine_wrapper with something like (not tested, but should be in the right general direction):

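import functools
import threading
import types
import weakref

_coro_tracking = threading.local()
_coro_tracking.created = weakref.WeakSet()
_coro_tracking.started = weakref.WeakSet()

def track_coroutines(coro_def):
    @functools.wraps(coro_def)
    def make_tracked_coro(*args, **kwds):
        inner_coro = coro_def(*args, **kwds)
        _coro_tracking.created.add(inner_coro)
        @types.coroutine
        def tracked_coro():
            # Runs only once the wrapper is first iterated, i.e. awaited.
            _coro_tracking.started.add(inner_coro)
            _coro_tracking.created.discard(inner_coro)
            # Propagate the inner coroutine's return value to the awaiter.
            return (yield from inner_coro)
        # tracked_coro takes no arguments; they are already bound above.
        return tracked_coro()
    return make_tracked_coro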

That way, the interaction with the GC would just be normal WeakSet interaction, while the interaction with coroutine definitions would use the existing wrapper machinery (which trio is presumably hooking into already).

If you instead want something more like your second case, then switch the created set to being a regular set such that created coroutines can't be GC'ed before they're seen by the event loop. Whether or not to configure that by default would be up to the author of the wrapper function.

njsmith commented 7 years ago

@ncoghlan Yeah, but my concern is that we won't be able to make the speed workable like that. Since this adds per-call and per-context-switch overhead it's probably worthwhile trying to get it as low as possible. But one reason for working through the thoughts here before taking something to bpo is to figure out exactly what semantics we'd want so we can prototype it with set_coroutine_wrapper and see what happens :-).

ncoghlan commented 7 years ago

yield from should get your regular context switch overhead back to near-zero, so you're mainly looking at a bit of extra function call overhead per coroutine created. You also have the option of putting the enablement of the wrapper behind an if __debug__: guard.

Anyway, probably the most useful feedback is that your suggested API option 2 is likely the most viable approach, where the default behaviour is to use a WeakSet (so only non-GC'ed references are checked, which is sufficient for the result = unawaited_coro() case), and you have an opt-in toggle to switch to using a regular set instead (allowing you to also check for the unawaited_coro() case).
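A small sketch of the difference between the two modes. It assumes CPython, where a coroutine with no remaining references is freed promptly; `fetch` and `call_tracked` are made-up names for illustration.

```python
import gc
import warnings
import weakref

weak_created = weakref.WeakSet()  # default mode: does not keep coroutines alive
strong_created = set()            # opt-in mode: keeps them alive until checked


async def fetch():
    return 42


def call_tracked(registry):
    # Stand-in for "calling an async function while tracking is active".
    coro = fetch()
    registry.add(coro)
    return coro


with warnings.catch_warnings():
    # Silence "coroutine 'fetch' was never awaited" for the dropped coroutine.
    warnings.simplefilter("ignore", RuntimeWarning)

    kept = call_tracked(weak_created)  # like: result = unawaited_coro()
    call_tracked(weak_created)         # like: unawaited_coro(), dropped at once
    gc.collect()
    weak_count = len(weak_created)

    call_tracked(strong_created)       # bare call, but the set holds it
    gc.collect()
    strong_count = len(strong_created)

    # Tidy up so nothing warns after this block.
    for coro in [kept, *strong_created]:
        coro.close()
```

With the WeakSet only the coroutine that was assigned to a name is still visible at check time; the bare call has already vanished. The regular set catches both, at the cost of keeping forgotten coroutines alive until someone clears it.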

njsmith commented 7 years ago

Meanwhile back in the subthread about possibly making await ...(...) a single piece of syntax:

OK, I think I see your argument now, and given the leading await you should be able to make it work even within the constraints of the LL(1) parsing restriction

I was nervous about this b/c I don't know a lot about parsing, but actually it looks trivial: the grammar rule that currently handles await expressions is atom_expr: [AWAIT] atom trailer*, where AWAIT is the await token and trailer* is the function call parentheses (among other things). So in fact await ...(...) already is a single piece of syntax as far as the parser is concerned; it's only split into await ... and ...(...) during the concrete syntax → AST transformation.
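The split on the AST side can be seen with the standard ast module. This is only an illustration of the current behaviour, not of the proposal; `caller` and `foo` are placeholder names.

```python
import ast

source = "async def caller():\n    await foo(1, 2, 3)\n"
tree = ast.parse(source)

# Module -> AsyncFunctionDef -> Expr statement -> its value
await_node = tree.body[0].body[0].value

# The await and the call are two separate, nested AST nodes.
node_types = (type(await_node).__name__, type(await_node.value).__name__)
```

Here `node_types` is `("Await", "Call")`: by the time the compiler sees the code, the await no longer knows it was written directly in front of a call.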

njsmith commented 6 years ago

Another reason it would be useful to allow foo to tell whether it was called synchronously foo(...) or asynchronously await foo(...): it would make it possible to transition a function from synchronous to asynchronous or vice-versa with an intermediate period where both work and one emits a DeprecationWarning. Right now this is mostly impossible outside of some special cases. (I think you can do it iff you have a function that has no return value and is going async→sync.)

(I ran into this with trio's bind method, see #241, but I expect it will be much much more common in downstream libraries.)
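For reference, the special case mentioned above (no return value, going async→sync) can be sketched roughly like this. The `bind` function here is a hypothetical stand-in and not trio's actual method, and `_DeprecatedAwaitable` is a made-up helper.

```python
import warnings


class _DeprecatedAwaitable:
    """Returned by a now-synchronous function so old `await fn()` calls still work."""

    def __init__(self, name):
        self._name = name

    def __await__(self):
        warnings.warn(
            f"{self._name}() is now synchronous; remove the await",
            DeprecationWarning,
            stacklevel=2,
        )
        # An exhausted iterator: the await finishes immediately with None.
        return iter(())


bound_addresses = []


def bind(address):
    bound_addresses.append(address)  # the real work happens synchronously
    return _DeprecatedAwaitable("bind")


async def old_caller():
    await bind("127.0.0.1")          # old style: still works, but warns


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bind("::1")                      # new style: no warning
    coro = old_caller()
    try:
        coro.send(None)              # run the coroutine without an event loop
    except StopIteration:
        pass

warning_categories = [w.category for w in caught]
```

This only works because nobody looks at the return value: the function has to hand back the awaitable shim instead of a real result. In the other direction, or with a meaningful return value, there is no equivalent trick, which is the gap the proposal is meant to close.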

njsmith commented 6 years ago

Note to self: think through a version of the __awaitcall__ idea where there's something like a context-local set_coroutine_wrapper, but it's only invoked by __call__, not __awaitcall__, so await asyncfn(...) stays just as fast, but the coroutine runner can make async def functions called normally return Futures on asyncio and Deferreds on twisted – thus bringing their async/await usability in line with that of C#/Javascript – and trio can make async def functions called normally raise an error.

Some things to watch out for: wrapper functions, functools.partial; preserving await fut + await sync_fn_returning_future(); preserving asyncio.run(asyncfn()); fast access to __awaitcall__ from C and Python (the latter requiring the preservation of funcall optimizations) – maybe __awaitcall__ has an unusual type slot like PyObject* (*awaitcall)(void)?

vlad0337187 commented 5 years ago

I think that in more than 90% of code we just write await before calling async functions. I think it would be better to omit await and wait for the result of async functions automatically.

If a task needs to be spawned, a nursery can be used.

I think this could be implemented somehow. If a call to an async function just returns a coroutine, maybe there could be self-executing coroutines that are launched right after they are created.

maxfischer2781 commented 4 years ago

I think it is important to note that a major reason why most async is just a regular await coro() may be that the ecosystem for async is seriously underdeveloped at this point. In other words, the bare await coro() may not stay as dominant as it is now.

For functions, methods and generators, it is accepted practice to treat them as first-class objects -- passing them around, wrapping them, storing them. People generally learn pretty quickly that func, obj.meth and gen() are "things" that fit well into partial, sorted, enumerate, producing yet again "things". This is helped by the standard library shipping with all of these wrappers and things that make it natural to experience functions, methods and generators as "things" on par with other "things".

Right now, everything async is very clearly a different breed of "thing". The most obvious case is that enumerate, map, filter do not work on async iterables, which makes common sync patterns very painful to do async. There is the subtle case that coroutines do not work as map/reduce/... operations, or property setters, they don't have literals (lambda), and so on -- which means there is simply not much else to do with coroutines than await them.
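As an illustration of the enumerate case, here is a minimal sketch. `aenumerate` is a hand-written helper, not something the standard library provides, and asyncio is used only to keep the example self-contained.

```python
import asyncio


async def ticker(n):
    for i in range(n):
        await asyncio.sleep(0)
        yield f"tick-{i}"


async def aenumerate(aiterable, start=0):
    """Hand-rolled async counterpart of the builtin enumerate."""
    index = start
    async for item in aiterable:
        yield index, item
        index += 1


async def main():
    try:
        enumerate(ticker(2))  # async generators are not regular iterables
        builtin_works = True
    except TypeError:
        builtin_works = False

    pairs = [pair async for pair in aenumerate(ticker(3))]
    return builtin_works, pairs


builtin_works, pairs = asyncio.run(main())
```

Every such helper has to be rewritten by hand (or pulled from a third-party package) before async iterables can be composed the way sync ones are.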

The only case where async and sync are on equal footing is () -- which trio also relies on for every case where partial is recommended. Further special-casing await ...() versus await ... and ...() will likely make it even more difficult to treat coroutines like things equal to other constructs.
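A small sketch of the partial pattern referred to here. `run_later` is a hypothetical stand-in for an API that takes an async callable plus positional arguments; it is not a trio function, and asyncio is used only so the example runs on its own.

```python
import asyncio
import functools


async def greet(name, *, punctuation="!"):
    await asyncio.sleep(0)
    return f"hello {name}{punctuation}"


async def run_later(async_fn, *args):
    # Hypothetical API: it only knows how to call async_fn(*args) and await it.
    return await async_fn(*args)


async def main():
    # partial binds the keyword argument; the result is still just "a thing
    # you call with ()", whether greet is sync or async.
    bound = functools.partial(greet, punctuation="?")
    return await run_later(bound, "trio")


message = asyncio.run(main())
```

This works today because calling the partial object and awaiting the result are two independent steps; a special-cased `await ...()` form would have to decide what happens when the call goes through a wrapper like this.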