python / cpython

The Python programming language
https://www.python.org/

Improve docs for await expression #83266

Open aeros opened 4 years ago

aeros commented 4 years ago
BPO 39085
Nosy @njsmith, @asvetlov, @1st1, @aeros

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = 'https://github.com/aeros'
closed_at = None
created_at =
labels = ['3.9', 'type-feature', '3.8', 'docs', 'expert-asyncio']
title = 'Improve docs for await expression'
updated_at =
user = 'https://github.com/aeros'
```

bugs.python.org fields:
```python
activity =
actor = 'wwallace'
assignee = 'aeros'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'asyncio']
creation =
creator = 'aeros'
dependencies = []
files = []
hgrepos = []
issue_num = 39085
keywords = []
message_count = 4.0
messages = ['358610', '358621', '358660', '372594']
nosy_count = 5.0
nosy_names = ['njs', 'asvetlov', 'yselivanov', 'aeros', 'wwallace']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue39085'
versions = ['Python 3.8', 'Python 3.9']
```

aeros commented 4 years ago

For context, I decided to open this issue after receiving a substantial volume of very similar questions and misconceptions from users of asyncio and trio about what await does, mostly within a dedicated "async" topical help chat (in the "Python Discord" community). For the most part, the brief explanation provided in the language reference docs (https://docs.python.org/3/reference/expressions.html#await-expression) did not help to clear up their understanding.

Also, speaking from personal experience, I did not have a clear understanding of what await actually did until I gained some experience working with asyncio. When I read the language reference definition for the await expression for the first time, it did not make much sense to me either.

As a result, I think the documentation for the await expression could be made significantly more clear. To users that are already familiar with asynchronous programming it likely makes more sense, but I don't think it's as helpful as it could be for those who are trying to fundamentally understand how await works (without having prior experience):

> Suspend the execution of coroutine on an awaitable object. Can only be used inside a coroutine function.

(https://docs.python.org/3/reference/expressions.html#await-expression)

(Also, note that there's a typo in the current version, "of coroutine" should probably be "of a coroutine")

While this explanation is technically accurate, it also looks to be the _shortest_ one out of all of the defined expressions on the page. To me, this doesn't seem right considering that the await expression is not the easiest one to learn or understand.

The vast majority of the questions and misunderstandings on await that I've seen typically fall under some variation of one of the following:

1) What exactly is being suspended?
2) When is it resumed/unsuspended?
3) How is it useful?

From what I can tell, (1) is unclear to them partly because the awaitable object that follows the await can itself be a coroutine object. It's not at all uncommon to see "await some_coro()".

I think this would be much clearer if it were instead something along the lines of one of the following (changes indicated with *):

1) "Suspend the execution of *the current coroutine function* on an awaitable object. Can only be used inside a coroutine function."

Where "the current coroutine function" is the coroutine function that contains the await expression. I think this would help to clear up the first question, "What exactly is being suspended?".

2) "Suspend the execution of *the current coroutine function* on an awaitable object. *The coroutine function is resumed when the awaitable object is completed and returns its result.* Can only be used inside a coroutine function."

This would likely help to clear up "When is it resumed/unsuspended?".

Optimally, this definition could also include some form of example code like several of the other expressions have. It's not particularly easy to use a demonstrable example without using an async library (such as asyncio), but using a specific async library would not make sense to have in this location of the docs because the language reference is supposed to be as implementation agnostic as possible.

However, I think a very brief visual example with some explanation could still be useful for explaining the basics of how await works:

3)

async def coro():
    # before await
    await some_awaitable
    # after await

When the coroutine function coro() is executed, it will behave roughly the same as any subroutine function in the "before await" section. However, upon reaching await some_awaitable, the execution of coro() will be suspended on some_awaitable, preventing the execution of anything in the "after await" section until some_awaitable is completed. This process is repeated with successive await expressions. Also, multiple coroutines can be suspended at the same time.

Suspension can be used to indicate that other coroutines can be executed in the meantime. This can be used to write asynchronous and concurrent programs without the usage of callbacks.

Including the brief example and explanation would likely help to further clear up all three of the questions.
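A minimal sketch of the behaviour described above, written without any async library. `Pause` is a hypothetical awaitable introduced only for illustration, and the "driver" is just a few manual `send()` calls standing in for whatever framework would normally run the coroutine:

```python
class Pause:
    """Hypothetical awaitable: suspends the awaiting coroutine exactly once."""

    def __await__(self):
        # A bare yield hands control back to whoever is driving the coroutine.
        received = yield "paused"
        return received


async def coro(log):
    log.append("before await")
    result = await Pause()  # coro() is suspended here until the driver resumes it
    log.append(f"after await, got {result!r}")
    return "done"


log = []
c = coro(log)            # calling the coroutine function runs nothing yet
signal = c.send(None)    # runs coro() up to the yield inside Pause.__await__
log.append(f"driver sees {signal!r}")
try:
    c.send("result")     # resumes coro(); the await expression evaluates to "result"
except StopIteration as stop:
    log.append(f"coro returned {stop.value!r}")

print(log)
```

The "after await" entry only appears once the driver sends a value back in, which is the suspension and resumption the proposed wording tries to describe.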

The present version has a high degree of technical accuracy, but I don't think it's as helpful as it could be for furthering the understanding of users or providing an introduction to the await expression. I'm sure that there will still be some questions regarding await even if any of these changes are made, but it would at least provide a good place to link to for an informative explanation of await that's entirely agnostic of any specific implementation.

I'm entirely open to any alternative suggestions, or making a change that's some combination or variation of the above three ideas. Alternatively, if there are determined to be no suitable changes that would be both technically accurate and more helpful to users, I could just apply a fix to the typo.

If any of these ideas are approved, I'll likely open a PR.

asvetlov commented 4 years ago

Thanks for raising the very interesting question!

Sorry, my English is bad; I cannot help with docs too much.

Anyway, technically an awaited coroutine *can* be suspended but the suspension is not always necessary. The deepest awaited function decides.

For example, if you want to read 16 bytes from a stream and these bytes are already fetched, there is no suspension at this point (at least, libraries are usually designed this way).
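A small sketch of the "no suspension" case, assuming nothing beyond the language itself. `already_buffered()` is a hypothetical stand-in for a read whose data is already available; because no `yield` is ever reached, the first `send()` runs the whole chain to completion:

```python
async def already_buffered():
    # No yield is reached anywhere in here, so nothing can suspend.
    return b"0123456789abcdef"


async def caller():
    data = await already_buffered()
    return len(data)


c = caller()
length = None
try:
    c.send(None)          # would return normally only if caller() suspended
    suspended = True
    c.close()
except StopIteration as stop:
    suspended = False     # caller() ran straight through to its return
    length = stop.value

print(suspended, length)
```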

Also, speaking technically about awaits is hard without explaining that a coroutine is a specialized version of a generator object with (partially) overlapping methods and properties, e.g. send() and throw().

To run a coroutine you need a framework that calls these methods according to the framework's rules; the rules for asyncio are different from those for trio.
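To make the "framework calls these methods" point concrete, here is a toy driver. It is only a sketch: `Request`, `run()` and the lookup-table protocol are hypothetical and do not reflect the rules of asyncio or trio. It shows that the driver decides what each suspended `await` evaluates to (via `send()`) or raises (via `throw()`):

```python
class Request:
    """Hypothetical message a coroutine yields to ask its driver for a value."""

    def __init__(self, key):
        self.key = key

    def __await__(self):
        reply = yield self   # suspend; the driver chooses how to resume
        return reply


def run(coro, table):
    """Toy driver: answer each Request from `table`, or throw KeyError in."""
    to_send, to_throw = None, None
    while True:
        try:
            if to_throw is not None:
                request = coro.throw(to_throw)
            else:
                request = coro.send(to_send)
        except StopIteration as stop:
            return stop.value          # the coroutine's return value
        to_send, to_throw = None, None
        if request.key in table:
            to_send = table[request.key]
        else:
            to_throw = KeyError(request.key)


async def lookup():
    first = await Request("a")
    try:
        second = await Request("missing")
    except KeyError:
        second = "fallback"
    return first, second


result = run(lookup(), {"a": 1})
print(result)
```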

I'm not sure how long the section should be, but looking at yield expressions (https://docs.python.org/3/reference/expressions.html#yield-expressions) above, I expect that awaits can take two to three times longer.

aeros commented 4 years ago

> Sorry, my English is bad; I cannot help with docs too much.

No problem. Your feedback is still incredibly helpful and very much appreciated either way. (:

> Anyway, technically an awaited coroutine *can* be suspended but the suspension is not always necessary. The deepest awaited function decides.

Ah, I see. It took a bit of experimentation for me to understand how this works, but I think that I get it now. Specifically, the suspension occurs when the deepest coroutine function awaits an awaitable object and a yield is reached (usually through a defined __await__ method that returns an iterator). When that awaitable object is completed and returns, the coroutine function with the await (and everything else that directly awaited it) is resumed. A good example of this is asyncio.sleep(0), as it just awaits __sleep0(), which is just a generator-based coroutine with a bare yield.

import asyncio
import inspect

tracked_coro = None


async def main():
    # The loop isn't strictly needed; it's only used to read the event loop time.
    loop = asyncio.get_running_loop()
    global tracked_coro
    tracked_coro = coro(loop)
    await asyncio.gather(tracked_coro, other_coro(loop))


# This is the coroutine being tracked.
async def coro(loop):
    print(loop.time())
    print("Start of coro():",
          inspect.getcoroutinestate(tracked_coro))
    await nested_coro(loop)


async def nested_coro(loop):
    print(loop.time())
    # coro() is not suspended yet, because we did not reach a `yield`.
    print("Start of nested_coro():",
          inspect.getcoroutinestate(tracked_coro))
    # asyncio.sleep(0) awaits `__sleep0()`, reaching a `yield` which
    # suspends both `coro()` and `nested_coro()`.
    await asyncio.sleep(0)
    print(loop.time())
    print("After the await, coro() is resumed:",
          inspect.getcoroutinestate(tracked_coro))


async def other_coro(loop):
    print(loop.time())
    # Runs while coro() is suspended at the `yield` inside asyncio.sleep(0).
    print("Start of other_coro():",
          inspect.getcoroutinestate(tracked_coro))


asyncio.run(main())

Output:

8687.907528533
Start of coro(): CORO_RUNNING
8687.907800424
Start of nested_coro(): CORO_RUNNING
8687.912218812
Start of other_coro(): CORO_SUSPENDED
8687.912291694
After the await, coro() is resumed: CORO_RUNNING

> For example, if you want to read 16 bytes from a stream and these bytes are already fetched, there is no suspension at this point (at least, libraries are usually designed this way).

After realizing that the suspend only occurs when yield is reached, I think I understand how this works for StreamReader.read().

In sum, a yield is reached when read() is called with an empty buffer, resulting in await self._wait_for_data('read'). Specifically within _wait_for_data(), the yield is reached within await self._waiter (because _waiter is a Future, which defines an __await__ method with a yield). However, if read() is called after the bytes were fetched and are contained in the buffer, the bytes are read from the buffer and returned directly without ever reaching a yield; thus there is no suspension that occurs.
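The interpretation above can be sketched without asyncio. `Waiter` and `TinyReader` are hypothetical stand-ins, not the implementation of `Future` or `StreamReader`; they only mirror the shape described here: `read()` awaits a waiter (and so reaches a `yield`) when the buffer is empty, and returns directly otherwise.

```python
class Waiter:
    """Hypothetical stand-in for a Future: yields until it is marked done."""

    def __init__(self):
        self.done = False

    def __await__(self):
        while not self.done:
            yield self  # the only place a suspension can happen


class TinyReader:
    """Hypothetical buffered reader (not asyncio.StreamReader)."""

    def __init__(self):
        self.buffer = bytearray()
        self.waiter = None

    def feed(self, data):
        self.buffer.extend(data)
        if self.waiter is not None:
            self.waiter.done = True

    async def read(self, n):
        if not self.buffer:
            # Empty buffer: wait for data, which reaches the yield above.
            self.waiter = Waiter()
            await self.waiter
            self.waiter = None
        data = bytes(self.buffer[:n])
        del self.buffer[:n]
        return data


def step(coro):
    """Advance once; report whether the coroutine suspended or returned."""
    try:
        coro.send(None)
        return "suspended", None
    except StopIteration as stop:
        return "returned", stop.value


reader = TinyReader()
empty_read = reader.read(4)
first = step(empty_read)      # buffer empty: reaches the yield
reader.feed(b"abcdefgh")
second = step(empty_read)     # data arrived: resumes and returns
third = step(reader.read(4))  # buffer non-empty: never reaches a yield

print(first, second, third)
```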

Is my interpretation mostly correct? I want to make sure that I have a good understanding of how await really works, as that will both help with improving the documentation of the await expression and improve my understanding of asyncio.

> Also, speaking technically about awaits is hard without explaining that a coroutine is a specialized version of a generator object with (partially) overlapping methods and properties, e.g. send() and throw().

Good point. If I understand correctly, send() and throw() were specifically added to the generator API in PEP-342 for the purpose of implementing coroutines in the first place, so it makes sense to explain how they relate to await.
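For reference, this is roughly what `send()` and `throw()` look like on a plain generator, before `await` enters the picture. `averager()` is a hypothetical example, not taken from the PEP or the docs:

```python
def averager():
    """Generator used as a coroutine: receives values, yields running averages."""
    total, count = 0.0, 0
    average = None
    while True:
        try:
            value = yield average
        except ValueError:
            # throw(ValueError) is used here as a "reset" signal.
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count


gen = averager()
next(gen)                     # prime the generator: run to the first yield
a = gen.send(10)              # resumes the yield with value=10
b = gen.send(20)
c = gen.throw(ValueError)     # raises ValueError at the suspended yield
d = gen.send(4)
gen.close()

print(a, b, c, d)
```

Coroutine objects created by `async def` expose the same `send()`, `throw()` and `close()` methods, which is what lets a framework drive them.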

> To run a coroutine you need a framework that calls these methods according to the framework's rules; the rules for asyncio are different from those for trio.

That's mainly why I added Nathaniel to the nosy list. I wanted to make sure that we describe the await expression in a way that's as accurate and informative as possible for both, as well as any other async library that uses await.

> I'm not sure how long the section should be, but looking at yield expressions (https://docs.python.org/3/reference/expressions.html#yield-expressions) above, I expect that awaits can take two to three times longer.

That would be a great goal to move towards, but I think that might have to be completed in multiple steps over a longer period of time rather than in a single change. Even if it ends up being not quite as long as 2-3 times the length of the reference for the yield expression, I think we can still make a substantial improvement to the existing version.

fc050d42-6105-4377-a08c-d5e98d8f0879 commented 4 years ago

There are a few other places in the documentation that are imprecise or misleading about await. While the information needed is scattered around the docs, I think these can also be improved. I'm pretty sure these fit with this issue.

Developing with asyncio guide: https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading - first paragraph, "When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task." Taken by itself, it isn't clear that execution of the awaitable starts immediately, without intervention from the event loop. A possible fix might be to add "If the awaited expression is a coroutine, its execution begins immediately and the running Task will not suspend until the awaited expression stalls waiting for a result."

The same sentence is found in the Task documentation: https://docs.python.org/3/library/asyncio-task.html#task-object
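A short asyncio sketch of the point being made, with hypothetical coroutine names. It records the order of events so you can see that `inner()` starts running as soon as `outer()` awaits it, and that the other Task only gets a turn once `inner()` actually stalls at `asyncio.sleep(0)`:

```python
import asyncio

events = []


async def inner():
    events.append("inner: started")
    await asyncio.sleep(0)   # first real suspension point
    events.append("inner: resumed")


async def outer():
    events.append("outer: before await")
    await inner()            # inner() begins executing right away
    events.append("outer: after await")


async def other():
    events.append("other: ran")


async def main():
    # gather() wraps both coroutines in Tasks, scheduled in argument order.
    await asyncio.gather(outer(), other())


asyncio.run(main())
print(events)
```

With the default (non-eager) task factory, "other: ran" lands between "inner: started" and "inner: resumed", not between "outer: before await" and "inner: started".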

gvanrossum commented 1 year ago

Please keep me posted on any PR for this. This is a very tricky issue, since what the language promises about await is quite different from how async frameworks (asyncio and Trio, basically) require you to think about it.

It is just a fact that the language reference is generally not the place people should go to learn first about language features. But clearly the one sentence about await is also insufficient. I wonder if we should emphasize that await x is very similar to yield from x. Someone should research the differences. I also am not sure that the docs for yield from are all that much clearer -- but they are a bit more specific than those for await.
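As a starting point for that comparison, here is a sketch showing the similarity and one difference that is easy to check. `Ticket` and the helper names are hypothetical. Both forms delegate to the same iterator and pass sent values through, but `await` insists on an awaitable while `yield from` accepts any iterable:

```python
class Ticket:
    """Hypothetical awaitable that suspends once and then returns a value."""

    def __await__(self):
        reply = yield "ticket"
        return reply


def via_yield_from():
    # A generator can delegate to the iterator returned by __await__().
    result = yield from Ticket().__await__()
    return result


async def via_await():
    result = await Ticket()
    return result


def drive(obj):
    """Start with send(None), then send 42; return (yielded, returned)."""
    yielded = obj.send(None)
    try:
        obj.send(42)
    except StopIteration as stop:
        return yielded, stop.value


async def await_a_list():
    await [1, 2, 3]          # a list is iterable but not awaitable


def yield_from_a_list():
    yield from [1, 2, 3]


gen_result = drive(via_yield_from())
coro_result = drive(via_await())

bad = await_a_list()
try:
    bad.send(None)
    await_error = None
except TypeError as exc:
    await_error = type(exc).__name__

list_items = list(yield_from_a_list())
print(gen_result, coro_result, await_error, list_items)
```

This is not a full account of the differences, only one that can be verified in a few lines.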

willingc commented 3 weeks ago

Related to https://github.com/python/cpython/issues/79012