python / cpython

The Python programming language
https://www.python.org
Other
62.55k stars 30.02k forks source link

Define a general "asynchronous operation introspection" protocol #75413

Open ncoghlan opened 7 years ago

ncoghlan commented 7 years ago
BPO 31230
Nosy @ncoghlan, @1st1, @syncosmic

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-feature', '3.7'] title = 'Define a general "asynchronous operation introspection" protocol' updated_at = user = 'https://github.com/ncoghlan' ``` bugs.python.org fields: ```python activity = actor = 'yselivanov' assignee = 'none' closed = False closed_date = None closer = None components = [] creation = creator = 'ncoghlan' dependencies = [] files = [] hgrepos = [] issue_num = 31230 keywords = [] message_count = 10.0 messages = ['300468', '300471', '300472', '300475', '300476', '300477', '300517', '300564', '300580', '300587'] nosy_count = 3.0 nosy_names = ['ncoghlan', 'yselivanov', 'syncosmic'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue31230' versions = ['Python 3.7'] ```

ncoghlan commented 7 years ago

As covered in bpo-31183, correctly disassembling functions, generator-iterators, coroutines, and async generator-iterators currently requires checking for 4 different attributes:

This could be simplified if the latter three all also exposed their code attributes as __code__.

58906d66-e9ec-47a6-a9f5-9ed5040dd8be commented 7 years ago

Some possibly helpful background (adapted from a discussion in PR 3077):

It looks as though gi_code was added to generators in bpo-1473257. At this time, function bytecode was still stored in f.func_code, so gi_code was a clear analogy.

My best guess is that gi_code was not chosen to deliberately create a namespace difference between functions and generators, but just because it wouldn't have made sense to add a func_code attribute to generators. In other words, if functions had had code_object instead of func_code at the time, bpo-1473257 might well have proposed code_object for generators. This doesn't, of course, mean that there might not be some post hoc reason to distinguish this part of the namespace now.

Then func_code was changed in 3.0 to __code__ as part of the f.func_X to f.__X__ renaming. The 3.0 whatsnew explained that the purpose of this change was to "free up [the f.func_X] names in the function attribute namespace for user-defined attributes". But this wasn't done for the analogous code object in generators. On a quick look, I didn't find any discussion of this at the time, but that doesn't mean there wasn't any.

58906d66-e9ec-47a6-a9f5-9ed5040dd8be commented 7 years ago

A related issue (since this issue is contemplating restructuring these objects anyway):

Other than ??_code, none of the f.func_X attributes which were eliminated in 3.0 have direct equivalents in generator-iterators, coroutines, or async generator-iterators.

However, all three of those have [gi|cr|ag]_running and [gi|cr|ag]_frame attributes. Generator-iterators also have gi_yieldfrom, and coroutines and async generator-iterators have [cr|ag]_await.

On a clean slate (with no attention paid to issues around breaking changes or how these attributes are already used in existing code), is there an argument for dundering some or all of these along with __code__? Or is special casing a better pattern when dealing with these other attributes?

ncoghlan commented 7 years ago

I think there's a strong case for a generic __frame__ attribute, since there are plenty of useful things you can do given "object with a linked frame" as a protocol (like extract the current locals namespace). We may even want to include traceback objects in that particular attribute access consolidation (since they have a tb_frame attribute).

Looking at the way inspect.getgeneratorstate() and inspect.getcoroutinestate() are implemented, I also think there may be a case for consolidating [gi|cr|ag]_running into a single __running__ attribute, such that for objects that provide a __frame__ attribute:

Then, rather than adding an inspect API specifically for async generators, we could add a general "inspect.getasyncstate()" one that covered all the object types, with the possible states:

ASYNC_RUNNING: __running__ == True, assert __frame__ is not None
ASYNC_CREATED: __running__ == False, __frame__.f_lasti == -1
ASYNC_CLOSED: __running__ == False, __frame__ is None
ASYNC_SUSPENDED: __running__ == False, neither closed nor created

Properly interpreting gi_yieldfrom and [cr|ag]_await is a bit trickier, since they require additional knowledge of how the control flow for those objects actually work. However, if we run with the "async operation introspection protocol" idea, then a suitably generic name would be __async_call__, and we could make it a doubly-linked list for the coroutine and async generator case by setting an __async_return__ attribute on the target (to say where we expect control to return to when we're done).

ncoghlan commented 7 years ago

Actually, __async_return__ would be applicable for the synchronous generator case as well, since it would link to the generator containing the yield from expression.

ncoghlan commented 7 years ago

Slight amendment: __delegated_to__ and __returns_to__ might be better names for the doubly-linked list of the async call chain.

"__async_call" and "__async_return" both have the problem that they look like imperative commands (since "call" and "return" are typically used as verbs), and collection referring to "yield from" and "await" as "asynchronous delegation" would help avoid potential confusion with regular synchronous function calls.

I've also retitled the issue to cover the broader scope that also addresses the needs of the inspect module, not just the dis module.

58906d66-e9ec-47a6-a9f5-9ed5040dd8be commented 7 years ago

I like where this is heading! Aside from the cleaner patterns for handling these objects, I think it'll make it a little easier for people who are just starting to use asynchronous objects in Python (e.g. me) to grasp what's similar about them.

+1 that __async_call__ could be confusing, for the general reason you mention, but in particular because it would look exactly analogous to function.__call__() (what the French call a "faux ami").

A not-thought-out alternative: what about noun-ing the verbs into __async_calls__ and __async_returns__ (or maybe __async_returns_to__)?

BTW, I was thinking of taking a quick run at bpo-31197 while this simmers. I don't /think/ that risks being eventually mooted by these changes; if anything, it might be easier to adapt dis to this proposal if that refactoring has already been done. LMK if you think that's the wrong order, though.

ncoghlan commented 7 years ago

I agree https://bugs.python.org/issue31197 is orthogonal - refactoring the current logic is useful regardless of what we do at the object attribute layer.

Regarding __delegated_to/returns_to__, the reason I like *not* having "async" in the attribute names is that generators are actually used as a synchronous construct (via synchronous for loops and next() calls), even though they technically define an asynchronous operation from the interpreter's point of view.

The reason I like omitting "call" (or "calls") from the name is that even though "yield from" and "await" are both sometimes described as providing syntax for asynchronous calls, that terminology is really only defensible for "await" - PEP-380 describes the "yield from" operation as delegating to a subgenerator, and I think that's a better way of framing the general concept.

ncoghlan commented 7 years ago

I started a local PR at https://github.com/ncoghlan/cpython/pull/1/files to explore what this might look like in practice.

I think that what I've done so far shows that generic __frame and __running attributes would be sufficient to extend the inspect module's state introspection support to also cover async generators, while also allowing for the soft (i.e. documentation only) deprecation of the generator and coroutine specific variants of those APIs.

bpo-31183 already showed the potential value of a __code__ attribute, since it would allow all of the asynchronous operations to be handled by the same code path that already handles functions.

I'm less sure about __delegated_to/returns_to__, since we don't have *any* code in the standard library that reads gi_yieldfrom, and the only code that reads cr_await is a Python 3.5 compatibility hack in asyncio.

1st1 commented 7 years ago

I started a local PR at https://github.com/ncoghlan/cpython/pull/1/files to explore what this might look like in practice.

Looks good to me.

I'm less sure about __delegated_to/returns_to__, since we don't have *any* code in the standard library that reads gi_yieldfrom, and the only code that reads cr_await is a Python 3.5 compatibility hack in asyncio.

I'm -1 on this too. gi_yieldfrom and cr_await are very special and rarely used things. Refactoring or unifying them isn't trivial and ultimately not as useful.