python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Can we make forgetting an await be an error? #79

Open njsmith opened 7 years ago

njsmith commented 7 years ago

[This issue was originally an umbrella list of all the potential Python core changes that we might want to advocate for, but a big discussion about await sprouted here so I've moved the umbrella issue to: #103]

Original comment:

Can we do anything about how easy it is to forget an await? In retrospect, async functions shouldn't implement __call__ IMO... but it's probably too late to fix that. Still, it kinda sucks that one of the first things in our tutorial is a giant warning about this wart, so it's worth at least checking if anyone has any ideas...
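For concreteness, this is the wart in miniature (a minimal reproduction; the only diagnostic is a delayed RuntimeWarning from the garbage collector):

import trio

async def child():
    print("running")

async def main():
    child()            # oops, forgot the await: nothing runs, no error here
    await trio.sleep(1)

trio.run(main)
# Sometime later, when the orphaned coroutine object is collected:
#   RuntimeWarning: coroutine 'child' was never awaited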

brettcannon commented 7 years ago

The tricky bit with "forgetting the await" is that passing an async function around as any old object (including through a return statement) is a totally legitimate thing to do. About the only thing I can think of is: if you call an async function and don't store its value, then you obviously screwed up, since you won't even get side-effects for that. So you could have the interpreter raise a warning/exception at that point instead of at destruction time of the object (which should get you the proper call line in the traceback, and will work immediately in e.g. PyPy where they don't use refcounting).

The issue in this case is how to reliably detect that an unstored async function was never started (i.e. is having POP_TOP check whether an async function has started good enough to catch this situation and not have any false-positives?).

njsmith commented 7 years ago

@brettcannon: in a perfect world IMO we would have made await foo(...) expand to something like yield from foo.__acall__(...).__await__(), so you could get the coroutine object if you wanted, but you'd have to write foo.__acall__(...) instead of just foo(...). Obviously this would break a bunch of asyncio and curio code at this point :-( (though curio is switching to the trio style of never passing coroutine objects around as well). The problem is that there's no situation where you need to pass around coroutine objects unless you're implementing a library like trio, but with the way asyncio started with generator-based-coroutines this wasn't obvious. And aiohttp for example definitely does "clever" things like having a nominally-regular function call an async function and return the coroutine object, but all this does is make the code more confusing and mess up tracebacks.

Nominally async/await is still provisional so I guess technically we could make a switch like this, and we could account for it in @asyncio.coroutine to handle most cases, but the aiohttp-style cases would definitely break, which would be pretty brutal...

Or, I think they would break? It's possible they're still using yield from and would be okay?

brettcannon commented 7 years ago

https://aiohttp.readthedocs.io/ says they are still supporting Python 3.4.

njsmith commented 7 years ago

Maybe there's some hope then? I dunno :-).

It occurred to me that perhaps I should offer to present this list as a talk at the language summit this year – don't know if that would be useful/of interest, but maybe.

1st1 commented 7 years ago

Nominally async/await is still provisional

Not anymore :)

can we do anything about how easy it is to forget an await? In retrospect async functions shouldn't implement __call__ IMO...

We discussed this at length on python-dev. Here's a summary: https://www.python.org/dev/peps/pep-0492/#pep-3152. The most important point is this one:

# The following code:

  await fut
  await function_returning_future()
  await asyncio.gather(coro1(arg1, arg2), coro2(arg1, arg2))

# would look like:

  cocall fut()  # or cocall costart(fut)
  cocall (function_returning_future())()
  cocall asyncio.gather(costart(coro1, arg1, arg2),
                        costart(coro2, arg1, arg2))

I'd say changing core async/await semantics at this point is not an option. However, we can (probably) make it an opt-in.

Idea

One way to implement this is to add a new attribute to frame objects f_inawait and a new opcode BEFORE_AWAIT to set it. So coroutine functions like the one below

async def foo():
    print('hello')
    await bar()

would produce the following opcode:

  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('hello')
              4 CALL_FUNCTION            1
              6 POP_TOP

  3           8 BEFORE_AWAIT
             10 LOAD_GLOBAL              1 (bar)
             12 CALL_FUNCTION            0
             14 GET_AWAITABLE
             16 LOAD_CONST               0 (None)
             18 YIELD_FROM
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

Where BEFORE_AWAIT would set f_inawait to 1 for the current frame, and GET_AWAITABLE would reset it back to 0.

Now, we have sys.set_coroutine_wrapper(), which allows intercepting the creation of coroutines. We can use it to raise an error if a coroutine was constructed outside of an await expression:

def coro_wrapper(coro):
    if not sys._getframe(1).f_inawait:
        raise RuntimeError(
            'creating a coroutine outside of await expression')
    return coro

sys.set_coroutine_wrapper(coro_wrapper)

I have a PoC implementation here: https://github.com/1st1/cpython/tree/coro_nocall. And here's a script illustrating how it works: https://gist.github.com/1st1/1cc67287654cc575ea41e8e623ea8c71
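To sketch the intended semantics (this is what the gist demonstrates; not literal output from the branch):

async def bar():
    return 42

async def foo():
    await bar()   # fine: the coroutine is created inside an await expression
    coro = bar()  # RuntimeError: creating a coroutine outside of await expression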

If you and @dabeaz think that this would significantly improve trio/curio usability, I can draft a PEP.

Maybe we can even enable this for asyncio code, but I'll need to think more about it. Perhaps we can add some APIs to asyncio for coroutine creation without await and slowly begin to use them in frameworks like aiohttp.

/cc @ncoghlan @ambv @asvetlov

brettcannon commented 7 years ago

I know I would be interested in a talk on this at the language summit. 😄

njsmith commented 7 years ago

@1st1

If you and @dabeaz think that this would significantly improve trio/curio usability, I can draft a PEP.

I'm not sure if this exact proposal is ideal, but you've at least convinced me that something useful might be doable and so it's worth having the discussion and hashing out our options :-).

I don't have the brain to give a proper critique right now, but as a mini-critique, the two main concerns that pop to mind are: (1) well, it's pretty ugly isn't it, (2) is the frame introspection thing going to kill pypy. Throwing out another idea: maybe await foo(...) can de-sugar to

try:
    corofn = foo.__acall__
except AttributeError:
    corofn = foo
coro = corofn()
await coro

and then there are a few options for how we could arrange for async functions to provide __acall__ while erroring on __call__, like a __future__ import. Or asyncio could provide a set_coroutine_wrapper class that adds in a __call__ method that forwards to __acall__.

1st1 commented 7 years ago

(1) well, it's pretty ugly isn't it

I don't find it ugly to be honest. Maybe a bit unusual, but we have many other "unusual" fields on frames and generator objects. Maybe you can find a less ugly variation of this idea?

(2) is the frame introspection thing going to kill pypy.

I envisioned this as a debug-mode (or call it "development mode") only check. In production, turn the debug mode off and buckle up.

Or asyncio could provide a set_coroutine_wrapper class that adds in a __call__ method that forwards to __acall__.

Now this is pretty ugly and will make asyncio code slower under PyPy ;) Frankly, I don't think we can implement any revision of the __acall__ idea at this point.

FWIW I don't think that my idea with f_inawait is beautiful, but at least it allows implementing the check you want with minimal changes/complications to the interpreter/existing programs. It is simple, so it's very likely we could add it (or a variation of it) to 3.7.

njsmith commented 7 years ago

I envisioned this as a debug-mode (or call it "development mode") only check. In production, turn the debug mode off and buckle up.

A debug mode would be better than nothing, but if possible I'd really prefer something that we can leave on in general. Imagine if functions in "production mode" responded to getting the wrong number of arguments by silently returning some weird object instead of raising an error – it's kind of the same thing. And debug modes impose a substantial cognitive overhead – you have to document them, every beginner has to remember to turn them on 100% of the time, some of them will fail or forget and we're back to getting confusing errors, etc.

Now this is pretty ugly and will make asyncio code slower under PyPy ;) Frankly, I don't think we can implement any revision of the __acall__ idea at this point.

Conceptually the f_inawait and __acall__ proposals are very similar: in both cases we're passing an extra bit of information down to the coroutine object ("is this call in await context, true/false"), and then it can respond to that bit however. The __acall__ method of passing this information seems more explicit and consistent with how we pass information around in Python normally, that's all...

Re: PyPy, we're talking about a simple wrapper object that has no state and one method that unconditionally calls another method. This is the sort of thing that their JIT eats for breakfast :-). On CPython it's worse because you can't optimize out the allocation of the wrapper object. But if this is a serious problem then instead of a wrapper it could be a flag that we set on the coroutine object (sorta similar to how @types.coroutine works – it could return a wrapper but for efficiency it doesn't). Or we could have a global flag instead, and have async functions do something like:

def __call__(self, *args, **kwargs):
    if not sys._check_special_thread_local_flag():
        raise TypeError("async function called without await")
    return self.__acall__(*args, **kwargs)

I just thought it was a bit nicer to re-use set_coroutine_wrapper rather than introduce a new global flag :-). (Also, for all of these approaches we have flexibility about the default – we could just as easily make it trio/curio that have to set some special flag or whatever we go with.) But in general my point is that the "how do we configure the policy async functions use to handle being called outside await context" part is orthogonal to the "how do async functions tell whether they are in await context or not" part.

1st1 commented 7 years ago

Now this is pretty ugly and will make asyncio code slower under PyPy ;) Frankly, I don't think we can implement any revision of the __acall__ idea at this point.

To elaborate: any change to PEP 492 semantics will have to be fully backwards compatible for asyncio programs. Any significant drop in performance wouldn't be acceptable either.

What you are proposing is to make parentheses a part of the await expression. For that you'll need new opcodes. To make your proposal backwards compatible, those new opcodes could execute different versions of the async protocol depending on a special flag in the interpreter state. This could really work for code like await a(), or even await a.b(). But what about await a().b()? You'll have to use CALL_FUNCTION at some point, or you will have to reinvent all the CALL_* opcodes.

Maybe it's possible to come up with some clever idea here, but the patch (and the PEP) will be bigger than the one for PEP 492.

njsmith commented 7 years ago

We should probably move this discussion to async-sig or similar?

1st1 commented 7 years ago

I envisioned this as a debug-mode (or call it "development mode") only check. In production, turn the debug mode off and buckle up.

A debug mode would be better than nothing, but if possible I'd really prefer something that we can leave on in general. Imagine if functions in "production mode" responded to getting the wrong number of arguments by silently returning some weird object instead of raising an error – it's kind of the same thing.

Yeah, I understand it's not an ideal situation. But debug modes aren't something new; a lot of frameworks have them. Generally, people learn that they exist and that they are helpful.

And debug modes impose a substantial cognitive overhead – you have to document them, every beginner has to remember to turn them on 100% of the time, some of them will fail or forget and we're back to getting confusing errors, etc.

Just make the debug mode default. And print to stdout that it's ON, and that it should be turned off in production.

The __acall__ method of passing this information seems more explicit and consistent with how we pass information around in Python normally, that's all...

I think you don't see the full picture of how exactly this can be implemented in CPython. Maybe I'm wrong here, but I don't think this is possible. It would help if you could come up with a more detailed proposal (proto PEP) where you describe:

  1. changes to grammar/AST.
  2. changes to how await expression is compiled.
  3. changes to the __await__ protocol.

I'd be happy to help you with coding the PoC if you can show that your idea is possible to implement.

We should probably move this discussion to async-sig or similar?

I actually like the GH UI much more than my email client :) You can format your code here etc. Maybe you can just send a short email to async-sig inviting interested people to join the discussion here?

Carreau commented 7 years ago

I've hit the issue a few times in IPython while live coding; I'll be happy to write an extension that at least warns users in the REPL.

1st1 commented 7 years ago

await foo(...) can de-sugar to

try:
    corofn = foo.__acall__
except AttributeError:
    corofn = foo
coro = corofn()
await coro

And what will you de-sugar await foo().bar(spam=ham()) to? Would await foo().bar(spam=ham()).fut be acceptable?

1st1 commented 7 years ago

I've hit the issue a few times in IPython while live coding; I'll be happy to write an extension that at least warns users in the REPL.

Yeah, I think we'll have it in 3.7 one way or another :)

dabeaz commented 7 years ago

Honestly, forgetting to put an await on things is not something that I've lost a lot of sleep thinking about. Python already provides a warning message about unawaited coroutines. And if you forget the await, it is usually readily apparent that something is quite wrong in other ways. There are parts of Curio that instantiate coroutines and pass them around to other locations to be awaited later. So, I certainly wouldn't want to see a requirement that forces the await to be there. I'd also add that with decorators and features such as functools.partial() it gets pretty complicated.

One feature that'd be really useful from the Curio front would be some way to know whether a callable has been triggered from within the context of a coroutine or not. For example, decorators could use something like that to know more about the context in which an operation is taking place.
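A rough sketch of the frame-hacking such a check currently requires (hypothetical helper; Curio's actual implementation differs):

import inspect
import sys

def called_from_coroutine():
    # Walk the caller's stack looking for a frame whose code object
    # was compiled as a native coroutine (async def).
    frame = sys._getframe(1)
    while frame is not None:
        if frame.f_code.co_flags & inspect.CO_COROUTINE:
            return True
        frame = frame.f_back
    return False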

1st1 commented 7 years ago

Honestly, forgetting to put an await on things is not something that I've lost a lot of sleep thinking about. Python already provides a warning message about unawaited coroutines. And if you forget the await, it is usually readily apparent that something is quite wrong in other ways.

Yeah, this has been my experience too. People do forget to put await sometimes when they begin writing async code, but usually understand why the program doesn't work pretty quickly.

If we can find a way to make missing awaits trigger an actual error - why not, but I'd be -1 if it requires a huge new PEP and a change to async/await semantics.

One feature that'd be really useful from the Curio front would be some way to know whether a callable has been triggered from within the context of a coroutine or not.

You can use sys.set_coroutine_wrapper to wrap all coroutines into some awaitable object that enables the tracking. I'd only recommend doing that in some kind of debug mode, though.

njsmith commented 7 years ago

@1st1: I suspect that this isn't the ideal implementation, but to demonstrate that my idea can be precisely specified: for parsing, we could parse just like we do now, and then do a post-processing pass over the AST where we replace all terms of the form Await(value=Call(...)) with a term of the form AwaitCall(...). I believe that gives exactly the semantics I'd propose. And a possible bytecode implementation would be to add a bytecode RESOLVE_AWAITCALL which does (excuse the horrible pseudocode, hopefully the intent is clear):

tmp = POP()
try:
    PUSH(tmp.__acall__)
except AttributeError:
    PUSH(tmp)

and then AwaitCall(...) compiles into the same bytecode as Call(...) would, except with a RESOLVE_AWAITCALL inserted at the appropriate place.

@dabeaz

Honestly, forgetting to put an await on things is not something that I've lost a lot of sleep thinking about.

Out of curiosity, since I know you do a fair amount of training, have you done any async/await training yet? Part of the trigger for this thread was that when I wrote the trio tutorial aiming at teaching folks who hadn't been exposed to async/await before, I felt obliged to put in a big section up front warning them about this pitfall, since my impression is that it bites basically everyone repeatedly when they're learning. Also it doesn't help that in pypy the warning is delayed for an arbitrarily long time. (In particular, if you're at the REPL you probably just won't get a warning at all).

There are parts of Curio that instantiate coroutines and pass them around to other locations to be awaited later. So, I certainly wouldn't want to see a requirement that forces the await to be there.

From this comment, I was under the impression that curio users would no longer be instantiating coroutines and passing them around. Is that right?

If curio internally wants to do that, then of course that's fine – any proposal is going to have to have some way to get at a coroutine object, or we could never start our coroutine runners at all :-). The idea is just that since this is something that only coroutine runners need to do, it should have a different spelling that users won't hit by accident.

I'd also add that with decorators and feature such as functools.partial() it gets pretty complicated.

Ugh, that's a great point. I guess in any proposal, wrappers like functools.partial() would need some explicit strategy to pass on the calling context to the wrapped function. In the __acall__ approach I can see how this would work in principle, though it's a bit cumbersome: partial objects would need an __acall__ method that defers to the wrapped object's __acall__ method. @1st1, in the frame introspection approach, how would you handle functools.partial?
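To make the cumbersome-but-workable __acall__ version concrete, here's a hypothetical sketch (the __acall__ protocol is the proposal under discussion, not an existing API):

import functools

class apartial(functools.partial):
    # Hypothetical __acall__-aware partial: defer to the wrapped
    # object's __acall__ if it has one, else fall back to a plain call.
    def __acall__(self, *args, **kwargs):
        target = getattr(self.func, '__acall__', self.func)
        return target(*self.args, *args, **{**self.keywords, **kwargs})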

1st1 commented 7 years ago

@1st1, in the frame introspection approach, how would you handle functools.partial?

Looks like it's already working in my PoC :)

import functools

async def baz():
    print('hi')

async def bar():
    await baz()  # try removing "await"

async def foo():
    coro = functools.partial(bar)
    await coro()

^-- works just fine.

1st1 commented 7 years ago

AST where we replace all terms of the form Await(value=Call(...)) with a term of the form AwaitCall(...).

But you wouldn't handle cases like this then:

await one_coroutine(another_coroutine())

njsmith commented 7 years ago

@1st1: I'm guessing that's a side-effect of functools.partial being implemented in C [and thus being invisible to the frame traceback machinery]? I'm not sure if that's a point in favor or against, given that we prefer not to have magical user-visible changes in behavior between C and Python implementations :-). But in any case, ok, how would you implement something like functools.partial in Python?

await one_coroutine(another_coroutine())

Err, that's supposed to be an error, right, and both proposals on the table would treat it as such? What do you mean by "wouldn't handle"?

1st1 commented 7 years ago

But in any case, ok, how would you implement something like functools.partial in Python?

Instead of looking at the last frame you can traverse until you see f_inawait. Isn't ideal though.

TBH I didn't think that you wanted both things at the same time: I thought you wanted something as radical as PEP 3152, so my PoC does exactly that: it tries to enforce instantiating coroutines only in await expressions. functools.partial works by accident.

What do you mean by "wouldn't handle"?

I mean that __acall__ will be used only by the await expression. The idea was to throw an error when a user simply calls a coroutine function but doesn't await it. But if you want to support partials (and code that creates similar wrappers), neither of our proposals would work.

You can only pick one thing:

njsmith commented 7 years ago

@1st1: I don't care about partial magically working; I just want it to be possible for it to work :-). Basically I think it should be possible for wrappers/partials/decorators to work, but it should require some intentional action on the part of the person writing the wrapper. (And of course functools.partial should take this action.)

Instead of looking at the last frame you can traverse until you see f_inawait. Isn't ideal though.

I don't think this works, because in practically any context where you can potentially forget an await, you'd have some frame in your callstack that has f_inawait set.

1st1 commented 7 years ago

Basically I think it should be possible for wrappers/partials/decorators to work, but it should require some intentional action on the part of the person writing the wrapper. (And of course functools.partial should take this action.)

At this point this all starts to sound very complicated to me. Given that:

I'd say that all the work we'll need to do to implement this isn't really worth it (at least in 3.7).

dabeaz commented 7 years ago

Personally, I feel like things are already rather complicated dealing with three different function types (synchronous, generators, coroutines). However, the fact that all of these things are basically "first class" and can be passed around freely opens up a lot of freedom regarding their use. All things equal, I'm not sure placing restrictions on how these functions get used in different contexts would make the situation better. If anything, it might make everything even more complicated.

Honestly, I don't have any deep thoughts regarding this with respect to Curio--I feel like the current behavior is something I can work with. It would be nice to have an easier way to know if you've been called by a coroutine for the purpose of some advanced decorator writing (Curio currently finds out via frame hacking). However, that's really a slightly different issue than this.

Out of curiosity, since I know you do a fair amount of training, have you done any async/await training yet? Part of the trigger for this thread was that when I wrote the trio tutorial aiming at teaching folks

I do not teach people async/await in training classes--although I very briefly mention it near the end of my advanced course. I'm not sure how fully people appreciate just how far beyond the day-to-day experience of most developers this async/await stuff is.

ncoghlan commented 7 years ago

As an argument against doing any special-casing here, I'll note that a similar problem exists for non-reusable context managers, and the "solution" for that is just a section in the contextlib documentation going over the problems that can arise if you attempt to reuse a context manager that doesn't support it: https://docs.python.org/3/library/contextlib.html#single-use-reusable-and-reentrant-context-managers

If we'd tried to bake any rules about ensuring that defined contexts were actually used into the language itself, then things like contextlib.ExitStack would have been much harder to write.

Heck, for folks coming from languages like MATLAB and Ruby with implicit call support, we even sometimes get complaints about statements like print "silently doing nothing" instead of being an alias for print().

That's just the nature of the beast - by making things first class entities that can be referenced without being invoked, we also make it possible for people to forget to invoke them.

Missing an await in that regard isn't really any different from missing a with or omitting trailing parentheses, it's just much newer than either of the latter.

dabeaz commented 7 years ago

Oh, that's a good one on print without parens. Actually, a very common mistake I see in training classes is people forgetting to put parens on methods taking no arguments (e.g., f.close). So, forgetting to include some critical piece of syntax (with, (), etc.) is certainly something that affects many other parts of Python. Nature of the beast, maybe.

njsmith commented 7 years ago

Hmm, ignore all my comments about set_coroutine_wrapper above, that's obviously useless because it only runs after the coroutine object is created (or not).

On the overall "is this useful" discussion: I guess I'll reiterate that where this all started is that I had a note that "hmm we should think about whether/how to propose something here", and I think that's still true :-)

I'm actually a bit frustrated at the argument that there are other places where Python can also let mistakes pass silently, because first, surely that just means we should think if we can fix those too [1]. And second, obviously this is a question of trade-offs: the context manager issue is fairly rare and not a big deal; OTOH it would be intolerable if Python silently accepted things like function calls where an argument was inadvertently omitted. Pointing out that one end of this spectrum exists doesn't tell us where await falls on it.

My personal suspicion is that we'll eventually come to consensus that the right conceptual model for await is as a special kind of function call, rather than the Future-based model that was inherited from C# and then sorta-halfway-rejected in the move to a yield from-based design. But this is only a prediction. It's based on some careful thought, but whether it's actually true or not is something we'll gradually learn more about over the next few months/years... and as we do then our perspective on these trade-offs may well change. In the mean time I'm happy to leave this issue here for now :-)


[1] but seriously tho – if we've got two people reporting that bare statements like print and f.close are tripping people up, then surely there are lots and lots of people who are hitting this and not telling us, and isn't this an amazing opportunity to make the world a better place by having that issue a SyntaxWarning or something? Is there a b.p.o issue for this yet?

ncoghlan commented 7 years ago

You can't emit a compile time warning about first class object references, as if you do, they're not first class anymore. From a Python compiler's point of view, f.close is no different from f.name, just as make_coroutine, make_generator, make_function, and make_value are all opaque identifiers.

Because of the REPL and try-except statements, you can't even issue a "Warning: expression statement has no effect" for non-call expression statements, since you may just be referencing an object to get the repr() of it (and the implicit assignment to _), or perhaps looking for a NameError, AttributeError, KeyError, or IndexError.

You can create languages which don't have first class runtime functions/classes/methods/etc., but then you need to introduce a special metasyntax for higher order metaprogramming, rather than have it be so natural that people may not even realise they're doing it (e.g. @classmethod, self.addCleanup(f.close), or sorted(seq, key=tuple)).

You can potentially put more warnings into tools like mypy, since their control flow analysis can pipe up and say "this looks weird", and you're not typically going to be using them in a REPL.

But at compile time itself? No - Python deliberately defers too much to runtime to allow for that. (Even the "Missing parentheses in call to print" hack is implemented in the SyntaxError constructor for cases that are already Py3 syntax errors - it doesn't catch the bare print case)

ncoghlan commented 7 years ago

As far as await being a special kind of function call goes, yes, that's exactly what it is - it's a function call that also serves as a local suspend point for concurrency management purposes.

The equivalent to forgetting an await in synchronous code is to call a factory function (thus getting a callable back), and then forgetting to call the returned callable that actually does what you want.

The difference is that in the latter case you don't get any warning at all (see previous comment about there being no reasonable opportunity to complain about "uncalled functions"), while in the asynchronous case we can be more confident that if someone got as far as creating the coroutine (rather than just referencing the coroutine factory), they actually meant to wait for it run at some point (similar to the way interpreter shutdown auto-joins non-daemon threads).
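Concretely, the synchronous analogue from the previous paragraph (a minimal illustration):

def make_greeter():
    def greet():
        print("hello")
    return greet

greeter = make_greeter()  # got a callable back...
# ...and forgot to actually call it: nothing happens, and there's no
# reasonable point at which the interpreter could complain.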

njsmith commented 7 years ago

@ncoghlan: it's off-topic here, but I would suggest that Python should also issue a warning on a bare f.name statement, which is also probably a mistake in most cases. The REPL isn't an obstacle; the parser knows whether a bare statement is going to be passed to sys.displayhook (cf compile(... mode="single")) and of course wouldn't issue a warning on such cases b/c they obviously aren't no-ops. The try-and-catch-AttributeError-or-NameError case is legitimate (it's the only case I could think of too), but there are other ways to handle that use case that are even shorter and just as clear (e.g. using if hasattr(...)), and it'd even be possible to write an automated fixup script to convert most cases (cf golang's nice use of this strategy). If the benefit is a smoother onramp for users in general then it seems plausibly worth it to me.

1st1 commented 7 years ago

The try-and-catch-AttributeError-or-NameError case is legitimate

And I've seen a lot of code using this. Even in my code I frequently use this pattern to check if a package version supports some new-ish API. Personally I would hate Python forcing me to use hasattr (or throwing a warning if I don't).

njsmith commented 7 years ago

@1st1: the fact that the response to "hey look we have an opportunity to reduce user frustration" is "well I would hate that" with absolutely no justification says a lot about why I pick my battles with python-dev :-/

Carreau commented 7 years ago

I would +1 such warnings (bare names and non-awaited coroutines); they don't need to be visible by default (like DeprecationWarnings), but they would be useful in test suites, or could be enabled by default in some alternative REPLs.

1st1 commented 7 years ago

with absolutely no justification says a lot about why I pick my battles with python-dev :-/

I thought that the justification was clear, sorry :) I wrote that "I've seen a lot of code that uses this pattern" => a number of people use it currently => some of them won't like to convert their code with a tool or manually => this new feature will upset a number of people (me included, but that wasn't the point).

Now the real question is the ratio of happy people vs unhappy. I doubt you have the answer. Maybe we can ask PyCharm guys if they have some kind of stats on this subject (I don't know if they have some anonymous stats collection or not).

1st1 commented 7 years ago

FWIW, there's a trending topic on reddit/python: What are some WTFs (still) in Python 3?

Top things people seem to struggle with:

  1. is vs ==. Why 1 is 1 isn't the same as 1000 is 1000.
  2. Arguments defaults are calculated once: def a(foo=[]).
  3. Bindings in for loops: [(lambda: i) for i in range(10)] (this one bugs even advanced users).
  4. Operators chaining: "a" in "aa" in "aaa".

I believe most of those issues (including forgetting to put await or parens) should be handled by IDEs and tools like mypy.

When I proposed to add f_inawait, I thought we could add a little hack that would enable people to write frameworks where it's not possible to call a coroutine without await. The solution isn't flawless or elegant, but it's simple. Then the discussion started to evolve around fundamentally augmenting the async/await protocol, which I think is just overkill.

ncoghlan commented 7 years ago

@njsmith Using the same runtime syntax for both "ordinary" programming and metaprogramming has consequences. The payoff is that you make it easier for people to take the step up to doing their own metaprogramming. The cost is that you lose the ability for the compiler to readily warn about certain things - it doesn't mean IDEs can't warn about it (especially beginner oriented ones like Mu that can make different assumptions from those the reference interpreter makes).

And if an instructor's reaction to that recommendation is "But I want to teach absolute beginners to program with a plain text editor, not an IDE", then they need to go read http://blog.osteele.com/posts/2004/11/ides and ask themselves some serious questions about whether or not that approach is really in the best interests of their prospective students.

kalvdans commented 7 years ago

I usually notice it via the RuntimeWarning that is triggered when the garbage collector comes around. It can be converted to an error by the warnings module.

$ python3 -W error::RuntimeWarning::
>>> async def foo(): pass
... 
>>> x = foo()
>>> del x
RuntimeWarning: coroutine 'foo' was never awaited

During handling of the above exception, another exception occurred:

SystemError: PyEval_EvalFrameEx returned a result with an error set

njsmith commented 7 years ago

Huh, it looks like I started responding here way back when and then never posted... whoops.

@kalvdans: Unfortunately, the garbage collector then converts the error back into a warning! The interpreter has a hard-coded behavior that if a __del__ method raises an exception, it gets printed to the console and then ignored. There are good reasons for this, but it sharply limits our options here.

(That SystemError looks like an interpreter bug though; you might want to report that.)

And unfortunately relying on the garbage collector doesn't work very well on pypy; e.g. there if you're trying things in the REPL and forget an await you usually just don't get any warning or anything, because the GC hasn't run yet – there's an example here. Or likewise if the error makes your program crash immediately (they don't run the GC at shutdown).

@1st1: ah, I see, I thought you were saying "I insist on try/except instead of hasattr just because", rather than making a point about transition costs. Sorry for being cranky.

@ncoghlan: I kinda don't know what metaprogramming has to do with anything here, and even less how IDEs are related...? Basically my point is that there are two sensible operations on async functions: calling them, currently spelled await f(), and getting a coroutine object, currently spelled f(). The first operation is the common one that users do all the time; the second one is an extremely obscure feature that probably only 100 people in the world need to actually understand. Obviously these should both be possible; I just find it unfortunate that the one almost nobody needs gets the super-short super-ordinary-looking spelling. I think it would be better if the weird one that is "internalsy" were spelled like f.__make_coroutine__() or so. This seems like a pretty uncontroversial application of a very standard Python design principle, as far as it goes :-).

The problems are (a) asyncio has a rather complex model with explicit futures, explicit coroutines, several ways to convert coroutines into futures, await on futures, etc., so if you're coming from that perspective then you've kind of already accepted that you're going to have lots of confusing things to keep track of and await f() versus f() is just one more and doesn't stand out the way it does with simpler designs like curio/trio, (b) the ship has sailed, people are shipping code with ensure_future(f()) and similar, (c) my preferred way of doing things makes it a little more complicated to write certain kinds of simple color-agnostic wrapper functions like functools.partial (but also easier to write others, I think, and for most non-trivial wrappers it probably doesn't matter... though OTOH most wrappers are trivial).

Maybe these things mean there's no solution. But I stand by my claim that by Python standards it's a very large language wart (unfortunately only in retrospect!).

Top things people seem to struggle with:

I agree that those are annoying, but they all require relatively rare situations and are relatively easy to remember once you know them. I have run into the for loop thing and I've caught myself in time to work around the mutable arguments thing, maybe... once or twice in the last year? OTOH I've been writing async/await code pretty intensively for a few months now and I still get a "coroutine not awaited" once or twice every day. I mean, which do you do more often: attempt to apply closure capture to a for loop index, or call a function?

ncoghlan commented 7 years ago

@njsmith The whole point of the async/await syntax is to explicitly mark suspension points in an otherwise sequential piece of code. It's akin to the structural dynamics of context managers where the code layout describes the control flow independently of the specific thing being done:

async def op():
    # do sequential things
    response = await ... # wait for something
    # do more sequential things
    await ... # wait for something else

There's absolutely no semantic difference between:

await cr()

and:

x = cr()
await x

just as there's no difference between:

with cm():
    ...

and:

x = cm()
with x:
    ...

That loose protocol based coupling is precisely how systems like coroutines and context management are built atop Python's object-oriented core - the objects implementing them aren't special syntactically, they're just ordinary instances of classes that happen to implement particular protocols.

Sure, native coroutines could have been deemed "too special" to use __call__ for their standard construction interface and instead defaulted to requiring await, but that would have made them different from non-native coroutines, iterators, generators, context managers, containers, numbers, and user defined classes for nothing other than earlier diagnosis of a particular operation being a no-op rather than suspending the current coroutine as intended.

And that's where the relevance of the metaclass system comes in: even class definitions are just syntactic sugar for instantiating a particular kind of object, and object instantiation uses the call syntax, rather than hidden purpose specific protocols. Adopting a different approach for native coroutine definitions would have been entirely at odds with the overall language design.

However, this is still the kind of error that static analysers can help catch - if they know a particular API produces coroutines (whether native or non-native), then they can use control flow analysis to check whether or not that coroutine is reliably awaited before reaching the end of the coroutine that created it. This means that even if Python itself doesn't complain, an IDE can still say "Hey, you may have a problem here", just as they complain about names that aren't either visible in the current module, or else one of the standard builtins.

Carreau commented 7 years ago

Reviving this,

In trio, would it be possible to have a compromise? You might not be able to enforce that non-trio code has awaited all its coroutines, but every time you switch back into trio's internals, you could check that every coroutine created since the last switch is in CORO_RUNNING, CORO_SUSPENDED, or CORO_CLOSED state - that is to say, any coroutine still in CORO_CREATED state is an error.

This means that things like:

await trio.sleep(1)
coro = cr()
# ... stuff
await coro
await trio.sleep(1)

work perfectly. Alternatively, you can provide users with a context manager that turns off the protection for a chunk of code, and/or mark functions which are allowed to do so.

You would of course not get something perfect, but would catch forgotten await faster.
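The check itself is cheap, something like this (a sketch; it assumes the scheduler keeps a list of coroutines created since the last switch, e.g. via sys.set_coroutine_wrapper):

import inspect

def check_all_awaited(recently_created):
    # Any coroutine still in CORO_CREATED state at a scheduler switch
    # was created but never awaited.
    for coro in recently_created:
        if inspect.getcoroutinestate(coro) == inspect.CORO_CREATED:
            raise RuntimeError(
                '{!r} was created but never awaited'.format(coro))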

ncoghlan commented 7 years ago

@Carreau Oh, I like that. It would make it a design decision for the scheduler designer: if you create a coroutine, it is expected to be the very next thing you await. So in the above example:

coro = cr()
await trio.sleep(1) # Error: coroutine created, but not awaited

While you could still allow something like:

coro = await_later(cr())
await trio.sleep(1) # This is fine due to the "await_later()" call
await coro

Carreau commented 7 years ago

I have a prototype, though the tracebacks are deeply confusing. It mostly looks like it raises where the coroutine function is defined and not where the coroutine is created, but I believe that may still be useful. I'll try to make a patch to trio later this weekend/week.

njsmith commented 7 years ago

@Carreau's proposal has the nice property that the errors are prompt and don't depend on the garbage collector. It turns out that this is a real problem: I was trying to figure out how to at least teach trio's test harness to detect these warnings so it can raise a real error, and in many cases it's easy enough to do by intercepting the warning message. And then we can make it more robust by forcing the cycle collector to run at the end of each test to make sure that warnings are emitted. But... consider code like data = sock.recv(100); await sock.sendall(data), which fails with TypeError: sendall expected a <bytes>, not a <coroutine>. Here the coroutine is stored in a local variable, so the exception traceback will pin the coroutine object in memory until the test framework is done doing whatever it wants to do with the traceback, and it's tricky to write a test plugin that can change the outcome of a test after the test framework has already processed and discarded the traceback. But that's what we need to do if we want to correctly classify this test. Writing test plugins is already an exercise in delicate monkeypatching, and this makes it extra super tricky.

The proposal is possible to implement now using sys.set_coroutine_wrapper. The problem is that creating a wrapper object for every coroutine call probably adds some noticeable overhead, so it's hard to justify as an always-on feature... maybe the overhead is ok in a test suite though? I suppose I should measure instead of speculating, but it's certainly going to be something.

@1st1: Based on this I think I should change my request from our discussion last week: instead of having a global counter of how many times the "coroutine '...' was never awaited" warning was emitted, I think what I actually want is a global count of how many coroutine objects have been created but never awaited. (So __call__ increments, and the first send/throw/__next__ decrements.) At moments when everything is supposed to be finished running (e.g., the end of a test) then this should give the same results as my original idea, except that the new version is robust against the issue with tracebacks pinning coroutine objects in memory. And during normal execution, it works just as well as something for me to check occasionally inside the run loop, while being much more prompt about detecting errors. (E.g., it can actually give prompt and reliable errors when using a REPL under PyPy, which is hopeless when using the garbage collector.)

(I think by "global" I mean "thread local".)
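A pure-Python sketch of that counter, using the sys.set_coroutine_wrapper hook that exists today (with the overhead caveat above; the counter would really be thread-local):

import sys

unawaited_count = 0

class CountingCoro:
    # Wrap every coroutine object: increment on creation, decrement on
    # first resumption, so a nonzero count at quiescence means some
    # coroutine was created but never awaited.
    def __init__(self, coro):
        global unawaited_count
        self._coro = coro
        self._started = False
        unawaited_count += 1

    def _mark_started(self):
        global unawaited_count
        if not self._started:
            self._started = True
            unawaited_count -= 1

    def send(self, value):
        self._mark_started()
        return self._coro.send(value)

    def throw(self, *exc_info):
        self._mark_started()
        return self._coro.throw(*exc_info)

    def close(self):
        return self._coro.close()

    def __await__(self):
        self._mark_started()
        return self._coro.__await__()

sys.set_coroutine_wrapper(CountingCoro)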

ncoghlan commented 7 years ago

@Carreau An API worth looking at to potentially improve debuggability when running under tracemalloc: https://docs.python.org/3/library/tracemalloc.html#tracemalloc.get_object_traceback

That will let you report where the un-awaited object was allocated in addition to the location where you noticed that it hadn't been awaited yet.

njsmith commented 7 years ago

I just filed a bug w/ CPython requesting the optimized version of this, see: https://bugs.python.org/issue30491

Carreau commented 7 years ago

@ncoghlan sweet, I'll look at it.

I gave a shot at an implementation with set_coro_wrapper in #176.

njsmith commented 7 years ago

Some notes because in a year or whatever I'll probably look back at this and want to remember:

I talked about this at the language summit, and talked to Yury about it afterwards (slides). The more ambitious proposal in those slides is similar but not identical to one in the thread up above:

See, it's cleverly designed to have something for everyone!

As I understood it, Yury had two main objections to this:

  1. asyncio users don't really want async functions to return Tasks, they actually just want bare coroutine objects that they can pass to asyncio.gather or whatever. ...on further thought this doesn't make a lot of sense to me though, because asyncio.gather will just wrap any bare coroutine objects in Tasks anyway, so my proposal doesn't hurt anything. Likely I misunderstood something. (Or maybe he just meant that this would only be a minor improvement for asyncio rather than a major one. I personally definitely find it confusing that asyncio has at least 3 different kinds of async functions – ones that return Futures, ones that return coroutine objects, and ones that return Futures but that are documented as returning coroutine objects because they want to reserve the right to change them in the future – and collapsing these down to just have one type of async function seems like a win to me. But maybe it's only a minor win, I dunno.)

  2. More importantly, he's concerned that implementing this would be really difficult, because (a) modifying the parser is annoying, (b) there are a whole bunch of opcodes for calling functions (with or without * or ** args, an optimized CALL_METHOD, etc. etc.), and this adds a whole new type of calling operation, so we'd need to double the number of call opcodes and duplicate a bunch of code and ugh.

So if/when someone decides to take this up again, I think these are the main things that would need answers.

I guess the obvious thing to try to resolve objection (2) would be the RESOLVE_AWAITCALL opcode suggested up-thread. I'm not sure if Yury has some reason in mind why it wouldn't work.

ncoghlan commented 7 years ago

There's a more fundamental objection to special casing await foo(): it breaks the notion of "in the absence of scope changes, any subexpression can be replaced with a name referring to that subexpression without altering the semantics of the overall expression"

That is:

await foo()

would no longer mean the same thing as:

extracted = foo()
await extracted

and that's simply not OK, even if you come up with some clever implementation hacks to enable them to be different.

njsmith commented 7 years ago

@ncoghlan: OK, but... that's a circular argument. My proposal only breaks substitutability of subexpressions if we decide that in await foo(1, 2, 3), the foo(1, 2, 3) part is a subexpression. But that's exactly the point under discussion.

Analogy: In modern Python, this:

foo(1, 2, 3)

does not mean the same thing as:

extracted = 1, 2, 3
foo(extracted)

It didn't have to be this way – in fact, once upon a time these were equivalent! If you search Misc/HISTORY for "sleepless night" you can see Guido agonizing about whether to change it. But the current behavior is fine and doesn't violate your dictum (even though it looks like it should), because we decided to define ...(..., ..., ...) as being a single piece of syntax, rather than being composed out of call parentheses ...(...) plus tuple constructor commas ..., ..., .... In fact it's obvious now that this change made the language better.

My contention is that Python would also be a better language – easier to understand, teach, use, etc. – if we were to decide that in the expression await foo(1, 2, 3), the await ...(..., ..., ...) part was defined as a single piece of syntax, instead of being composed out of await ... plus call parentheses ...(..., ..., ...). It's not a trivial point – the trio tutorial adopts this stance when teaching async/await, and I had multiple people seeking me out at PyCon specifically to thank me for that tutorial, saying it was the only async/await explanation they'd ever understood. And the tutorial's explanation isn't oversimplified at all; it really teaches you everything you need to know to write arbitrary programs using trio. Yet it's still 2x longer than it needs to be, because literally half of the explanation is about how to cope with the problems caused by foo(1, 2, 3) being a subexpression. (See slide 34.)

You can certainly disagree with my claim about how Python should work, or agree with it but feel that it's too late to do anything. But you can't say that a proposal to change the definition of subexpressions is bad because it violates substitutability of subexpressions :-)