python / cpython

The Python programming language
https://www.python.org
Other
63.36k stars 30.34k forks source link

Context manager support for `contextvars.Context` #99633

Open rhansen opened 1 year ago

rhansen commented 1 year ago

Feature or enhancement

I would like to add __enter__ and __exit__ methods to contextvars.Context such that these two are mostly equivalent:

contextvars.copy_context().run(do_something_interesting)

and:

with contextvars.copy_context():
    do_something_interesting()

Pitch

This makes it possible to combine Context enter/exit with other context managers:

import contextlib
import contextvars

@contextlib.contextmanager
def foo():
    with contextvars.copy_context():
        set_some_context_vars()
        yield

My personal motivating interest is controlling ContextVar values in pytest fixtures. For example:

import contextvars
import pytest
import module_under_test

@pytest.fixture
def clean_slate():
    with contextvars.copy_context():
        module_under_test._internal_state_context_var.set(None)
        yield

def test_foo(clean_slate):
    assert not module_under_test.is_initialized()

Previous discussion

https://stackoverflow.com/q/59264158

Linked PRs

gvanrossum commented 1 year ago

The idea should probably be reviewed by @1st1.

rhansen commented 1 year ago

Friendly ping @1st1

apostolos-geyer commented 5 months ago

+1

gvanrossum commented 5 months ago

Maybe someone could contribute a PR?

rhansen commented 3 months ago

Maybe someone could contribute a PR?

@gvanrossum I opened a PR at the same time I opened this feature request, see #99634. I just rebased it onto latest main and fixed some documentation warnings.

cjw296 commented 2 months ago

This isn't just asyncio, right? ContextVar is also useful for threading and forked/subprocess applications...

I'm a little surprised a thing called a Context was landed without context manager support 😅 ...

Anyway, the PR looks great, but until Python 3.14 comes out, is there any way we can get this context manager behaviour on 3.12 (or even earlier!)?

picnixz commented 2 months ago

Anyway, the PR looks great, but until Python 3.14 comes out, is there any way we can get this context manager behaviour on 3.12 (or even earlier!)?

I'm not sure we can. Neither 3.12 and 3.13 accept features now (per PEP 719, features to 3.13 are no more accepted since May).

cjw296 commented 1 month ago

That's not what I meant: I was curious if there's any way to do in pure python, rather than doing some kind of backported pypi package containing the C changes, what's in https://github.com/python/cpython/pull/99634 such that it could be used in, say, 3.11+

picnixz commented 1 month ago

Oh sorry for misunderstanding it! Mmh, unless we can implicitly call the run method on the body of the with-statement, then it's maybe possible but otherwise I don't know =/

rhansen commented 1 month ago

I was curious if there's any way to do in pure python, rather than doing some kind of backported pypi package containing the C changes, what's in #99634 such that it could be used in, say, 3.11+

No, it's not possible. Probably the easiest solution is to use Cython to create an adapter class that calls the C functions PyContext_Enter from __enter__ and PyContext_Exit from __exit__.

Edit: This won't work; see my comment below.

1st1 commented 1 month ago

I've left a comment in the PR -- I'm not sure this will jive well with async/await. Let's start by adding tests that would torture this idea to make sure the API is properly composable.

cjw296 commented 1 month ago

@1st1 - even if it doesn't, I'd still like to see this land: in my context, pardon the pun, I'm not using async at all.

I have to admit, I find the documentation around this for async use pretty confusing: it's not at all clear to me where the scope of the context begins and ends, there's mention of "Task", but there's nothing other than a some stuff the reader has to infer about there the scope of the effect of client_addr_var.set(addr).

Ignoring any implementation details, I feel a context manager should work in the async world, and I think the following would make it very clear where the scope of the context lies:


async def handle_request(reader, writer):
    with contextvars.copy_context():
        addr = writer.transport.get_extra_info('socket').getpeername()
        client_addr_var.set(addr)

        # In any code that we call is now possible to get
        # client's address by calling 'client_addr_var.get()'.

        while True:
            line = await reader.readline()
            print(line)
            if not line.strip():
                break
            writer.write(line)

        writer.write(render_goodbye())
        writer.close()
rhansen commented 1 month ago

In its current state, PR #99634 doesn't work with async/await. Specifically, the following hangs after printing some exceptions:

$ ./python -m asyncio
>>> import asyncio
>>> import contextvars
>>> async def foo():
...     with contextvars.copy_context():
...         await asyncio.sleep(0)
... 
>>> await foo()
Exception in callback <_asyncio.TaskStepMethWrapper object at 0x7a5c946ae800>()
handle: <Handle <_asyncio.TaskStepMethWrapper object at 0x7a5c946ae800>()>
Traceback (most recent call last):
  File "/home/rhansen/floss/cpython/Lib/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot exit context: thread state references a different context object
Exception in callback <_asyncio.TaskStepMethWrapper object at 0x7a5c946ae850>()
handle: <Handle <_asyncio.TaskStepMethWrapper object at 0x7a5c946ae850>()>
Traceback (most recent call last):
  File "/home/rhansen/floss/cpython/Lib/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7a5c9469c890> is already entered

I spent some time trying to understand coroutine internals to figure out what's going wrong. From what I learned, the issue is that each coroutine "step" (the code executed between each await) is executed via contextvars.Context.run() (called from asyncio.Handle._run), which enters a context, runs the "step", then exits the context. Everything works fine if every coroutine step has balanced enter/exit pairs, which is always true with Context.run(). Unfortunately, it is not true when Context is used as a context manager and there is an await in the with statement suite.

Specifically, in the foo coroutine above, there are two coroutine steps:

  1. The step from coroutine start to the await. This step includes a Context enter but not a matching exit.
  2. The step from the await to the coroutine's return. This step has the Context exit.

Original generators suffer from a similar problem (the current context can change arbitrarily during yield).

It seems to me that Context should be a per-coroutine property that is restored when coroutine execution resumes, and not a per-thread property. Python currently behaves as if Context was a per-coroutine property thanks to the use of Context.run() for each coroutine step, but that is an implementation kludge that causes the balanced enter/exit limitation.

I am willing to code up a fix for this, but I don't know enough about Python's internals to feel comfortable diving in. Any direction would be appreciated. In particular, I'd like some pointers on what code I should touch to make Context a true per-coroutine/generator property rather than a per-thread property.

We could continue with the PR as-is except document that with suites containing await or yield statements are not yet supported. However, I don't think the PR is all that useful without await or yield because Context.run() with a closure is equivalent. So it's probably best to hold off on the PR until we can come up with a more complete solution.

rhansen commented 1 month ago

I've been studying the CPython code and I think the following approach would work:

Would this be a significant enough change to warrant a new PEP?

The shadowing-if-not-empty behavior is somewhat complicated, but preserves backwards compatibility. Right now, the current context as observed by a generator can change arbitrarily during a yield. (This is technically true for coroutines as well, but the asyncio loop forces a particular context each time it executes a coroutine step so developers don't have to worry about the context changing during an await. See the call to contextvars.Context.run in asyncio.Handle._run.) I suspect most developers do not like this current behavior, but it's possible that some code was written with it in mind.

Example:

import contextvars

cvar = contextvars.ContextVar('cvar', default='initial')

def make_generator():
    yield cvar.get()
    yield cvar.get()
    yield cvar.get()
    yield cvar.get()
    cvar.set('updated by generator')
    yield cvar.get()

gen = make_generator()
print('1.', next(gen))

def callback():
    cvar.set('updated by callback')
    print('2.', next(gen))

contextvars.copy_context().run(callback)
print('3.', next(gen))
cvar.set('updated at top level')
print('4.', next(gen))
print('5.', next(gen))
print('6.', cvar.get())

The above prints:

1. initial
2. updated by callback
3. initial
4. updated at top level
5. updated by generator
6. updated by generator

Now change make_generator() to:

def make_generator():
    with contextvars.copy_context():
        yield cvar.get()
        yield cvar.get()
        yield cvar.get()
        yield cvar.get()
        cvar.set('updated by generator')
        yield cvar.get()

With that change, the script would output:

1. initial
2. initial
3. initial
4. initial
5. updated by generator
6. updated at top level

Your thoughts @1st1?

cjw296 commented 1 month ago

@rhansen: just wanted to say thanks for doing the hard work on this!

rhansen commented 1 month ago

I believe this proposal conflicts with #119333. IIUC, that feature assumes that context enter and exit events are always balanced and that the current context can never change except by entering or exiting a context. This proposal would break those assumptions.

I think this can be resolved by changing the semantics of the events added by #119333 from "context entered" and "context about to exit" to "the current context is about to change" and "the current context has changed".

cc @fried

fried commented 1 month ago

My PR didn't change the semantics of PyContext_Enter and _Exit.

I need to know when the threadstate->context has been modified and that happens in those two functions. About to change is as useful imho.

rhansen commented 1 month ago

My PR didn't change the semantics of PyContext_Enter and _Exit.

I know, I'm just suggesting that we change:

typedef enum {
   Py_CONTEXT_EVENT_ENTER,
   Py_CONTEXT_EVENT_EXIT,
} PyContextEvent;

to something like:

typedef enum {
    // This event is emitted just before the "current context" switches to a
    // different context.  The context that is passed to the callback is the
    // still-current context.
    Py_CONTEXT_ABOUT_TO_CHANGE,
    // This event is emitted just after the "current context" has switched to a
    // different context.  The context that is passed to the callback is the
    // context that just became current.
    Py_CONTEXT_CHANGED,
} PyContextEvent;

Changing the events like this would make the PyContext_AddWatcher feature you added compatible with the changes proposed in this issue, and I think it would still work for your use case. Would this be agreeable to you, @fried?

gvanrossum commented 1 month ago

So are you just proposing new names, or also changes in when they are called?

rhansen commented 1 month ago

Also when they are called:

gvanrossum commented 1 month ago

So it would double the number of calls. But why?

rhansen commented 1 month ago

I'm not sure how much developers would care about Py_CONTEXT_ABOUT_TO_CHANGE. I wouldn't mind if that went away.

As for the why:

gvanrossum commented 1 month ago

Oh, then I will have to read this entire issue really carefully to understand the use case and the feasibility.

fried commented 1 month ago

I could accept moving to a single event type for my use case a Py_CONTEXT_EVENT_CHANGED that gets called right after we set the context in ts would be fine. I would have had to derive the event from the two but thought the two events would have more usability outside meta's use case.

rhansen commented 1 month ago

Great, thanks. I'll open a pull request (probably some time this weekend) to make it easier to discuss the details.

rhansen commented 1 month ago

I opened issue #124872 to replace the "context enter" and "context exit" events with a "context switched" event, and opened pull request #124776 to implement the change.

1st1 commented 1 month ago

@rhansen

I've been studying the CPython code and I think the following approach would work: Give each coroutine (including generators) its own context stack (possibly empty).

This was the idea I had in https://peps.python.org/pep-0550/ -- ultimately rejected for its complexity and runtime impact. I'd say it's unlikely that we can convince enough people to do that (and even I myself don't think we need that anymore). That said given that you've already spent a lot of time on this, I recommend you to read that PEP because it did have a complete reference implementation and all of its ideas were validated to work.

That said, I'm looking at this PR's motivation and I think we can add two "semi-private" low-level APIs that would mirror the C API and allow people to build context managers if they absolutely have to. I'd just add Context._enter() and Context._exit() Python methods (mirroring PyContext_Enter() andPyContextEnter()) and explicitly document their pitfalls (not composable with asyncio out of the box). Hopefully having a leading` will emphasize that this is a special API for certain edge cases.