python / cpython

The Python programming language
https://www.python.org

[document removal of] the deprecated 'loop' parameter asyncio API in 3.10 #86558

Closed 1st1 closed 3 years ago

1st1 commented 3 years ago
BPO 42392
Nosy @terryjreedy, @gpshead, @asvetlov, @1st1, @miss-islington, @aeros, @uriyyo, @Fidget-Spinner
PRs
  • python/cpython#23420
  • python/cpython#23499
  • python/cpython#23517
  • python/cpython#23521
  • python/cpython#23552
  • python/cpython#24256
  • python/cpython#26357
  • python/cpython#26390
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-feature', 'docs', '3.10', 'expert-asyncio']
    title = "[document removal of] the deprecated 'loop' parameter asyncio API in 3.10"
    updated_at =
    user = 'https://github.com/1st1'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'gregory.p.smith'
    assignee = 'docs@python'
    closed = True
    closed_date =
    closer = 'gregory.p.smith'
    components = ['Documentation', 'asyncio']
    creation =
    creator = 'yselivanov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42392
    keywords = ['patch']
    message_count = 36.0
    messages = ['381275', '381276', '381277', '381278', '381304', '381305', '381306', '381310', '381329', '381370', '381372', '381382', '381385', '381389', '381451', '381456', '381766', '381771', '381776', '381780', '381794', '381822', '381823', '381882', '381883', '381979', '381985', '382060', '382079', '382133', '385383', '394270', '394345', '394474', '394477', '394480']
    nosy_count = 9.0
    nosy_names = ['terry.reedy', 'gregory.p.smith', 'asvetlov', 'docs@python', 'yselivanov', 'miss-islington', 'aeros', 'uriyyo', 'kj']
    pr_nums = ['23420', '23499', '23517', '23521', '23552', '24256', '26357', '26390']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue42392'
    versions = ['Python 3.10']
    ```

    1st1 commented 3 years ago

    asyncio.Lock and other primitives should no longer accept the loop parameter. They should also stop storing the current loop in the self._loop attribute. Instead, they should use get_running_loop() whenever they need to access the loop.

    1st1 commented 3 years ago

    (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop())

    1st1 commented 3 years ago

    Kyle, lmk if you want to work on this.

    asvetlov commented 3 years ago

    This was on my to-do list, but I would very much appreciate it if somebody championed this issue. I should finish the sslproto PR first.

    aeros commented 3 years ago

    Sure, I would be interested in helping with this. Although if a newer contributor takes it up before I'm able to, I wouldn't be opposed to that either (my main priority at the moment is helping with PEP-594 since it's a concrete goal of my internship w/ the PSF, but I should have some time to work on this as well).

    As far as I can tell though, there's currently a similar PR open: https://github.com/python/cpython/pull/18195 . It attempts to deprecate both the loop argument and the creation of primitives such as asyncio.Lock outside of a running event loop, with the following approach:

    def __init__(self, *, loop=None):
        self._waiters = None
        self._locked = False
        if loop is None:
            self._loop = events._get_running_loop()
            if self._loop is None:
                warnings.warn("The creation of asyncio objects without a running "
                              "event loop is deprecated as of Python 3.9.",
                              DeprecationWarning, stacklevel=2)
                self._loop = events.get_event_loop()
        else:
            warnings.warn("The loop argument is deprecated since Python 3.8, "
                          "and scheduled for removal in Python 3.10.",
                          DeprecationWarning, stacklevel=2)

    So, do we want to make that stricter by always using get_running_loop() to access the event loop each time instead of accessing self._loop, and effectively ignore the user-supplied one? Presumably, this would start with a warning for passing a loop arg, and the parameter would then be removed entirely ~two versions later.

    > (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop())

    AFAIK, at the C-API extension level, get_running_loop() already caches the running loop in cached_running_holder. (https://github.com/python/cpython/blob/9c98e8cc3ebf56d01183c67adbc000ed19b8e0f4/Modules/_asynciomodule.c#L232). So from a performance perspective, wouldn't it effectively be the same if we repeatedly use get_running_loop() to access the same event loop? I think it also adds a nice integrity check to be certain that the primitive wasn't initially created within a running event loop, and then later accessed outside of one.

    The only concern that I can see with this approach is that users could potentially create a primitive in one running event loop and then access it in a separate loop running in a different thread (without using something like self._loop, the primitive would no longer be associated with a specific event loop and could potentially be used within *any* running event loop). I'm not entirely certain if that is a real problem though, and if anything, it seems like it could prove to be useful in some multi-loop environment. I just want to make sure that it's intended.

    1st1 commented 3 years ago

    Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.

    I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.

    aeros commented 3 years ago

    > Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.

    That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

    > I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.

    From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail with usage of get_running_loop() in the init for the primitives. So, if it's not an issue to defer accessing the event loop until it's actually needed (e.g. in lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.

    1st1 commented 3 years ago

    > That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

    No, it's not OK to use one lock across multiple loops at the same time. But most asyncio code out there doesn't have protections against that, and we never advertised that having multiple loops run in parallel is a good idea. So while we could build protections against that, I'm not sure it's needed.

    Andrew, thoughts?

    > From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail with usage of get_running_loop() in the init for the primitives. So, if it's not an issue to defer accessing the event loop until it's actually needed (e.g. in lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.

    Yep. This sums up how I think of this now.

    asvetlov commented 3 years ago

    My initial thought was protecting the Lock (and other primitives) creation when a loop is not running.

    Yuri insists that a Lock can be created without a loop. Technically it is possible, sure. But the lock is tightly coupled with a loop instance. In other words, the lock belongs to the loop. The lock cannot be used after the loop dies (is stopped and closed).

    Thus, a globally scoped lock object still looks suspicious to me. The lock's lifecycle should be no longer than the lifecycle of the loop, shouldn't it? I know asyncio.Lock() can safely outlive the closing of the loop, but should we encourage this style? There are many other asyncio objects, like HTTP clients and DB drivers, that should be closed before the loop finishes so that TCP transports etc. are closed gracefully.

    Another thing to consider is: whether to cache a loop inside a lock; whether to add a check when the lock is used by two loops?

    I think the answer to the last question is "yes". I recall many questions and bug reports on StackOverflow and the aiohttp bug tracker where people use the multithreaded model for some reason and try to access a shared object from different threads, each of which runs its own loop.

    The check becomes extremely important for synchronization primitives like the asyncio.Lock class: threading.Lock is meant to be shared between threads, and users can apply the same pattern to asyncio.Lock by oversight. Also, supporting global asyncio.Lock instances increases the chance that the lock is used by different loops.

    1st1 commented 3 years ago

    Yeah, I follow the reasoning.

    My use case:

    import asyncio

    class Something:
        def __init__(self):
            # Created in a plain constructor, outside any coroutine.
            self._lock = asyncio.Lock()

        async def do_something(self):
            async with self._lock:
                ...

    And Something won't be created in a coroutine. So now I have to jump through the hoops to implement lazy lock instantiation.

    > But the lock is tightly coupled with a loop instance. In other words, the lock belongs to the loop. The lock cannot be used after the loop dies (is stopped and closed).

    I agree. Maybe the solution should be this then:

    class asyncio.Lock:
        def _get_loop(self):
            loop = asyncio.get_running_loop()
            if self._loop is None:
                self._loop = loop
            if loop is not self._loop: raise
            if not loop.is_running(): raise

        async def acquire(self):
            loop = self._get_loop()
            ...

    This would guarantee all the protections you want and would also allow asyncio.Lock to be instantiated in class constructors, simplifying code.
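The lazy-binding idea in the pseudocode above can be sketched as a small runnable class. This is an illustration only, not the code that landed in CPython: `LoopBound`, `get_loop` and `demo` are hypothetical names, and the bare `raise` statements are replaced with an explicit RuntimeError.

```python
import asyncio


class LoopBound:
    """Hypothetical helper: binds to the first event loop that uses it."""

    _loop = None  # nothing is captured at construction time

    def get_loop(self):
        # Raises RuntimeError if no event loop is running in this thread.
        loop = asyncio.get_running_loop()
        if self._loop is None:
            self._loop = loop  # first use: remember this loop
        if loop is not self._loop:
            raise RuntimeError(f"{self!r} is bound to a different event loop")
        return loop


async def use(obj):
    return obj.get_loop()


def demo():
    obj = LoopBound()          # created outside any coroutine
    asyncio.run(use(obj))      # first use binds obj to this loop
    try:
        asyncio.run(use(obj))  # asyncio.run() creates a new loop each time
    except RuntimeError:
        return "rejected"
    return "accepted"
```

Calling `demo()` exercises both behaviours: construction succeeds without a loop, and use from a second loop is refused.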

    asvetlov commented 3 years ago

    Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups), I can live with the proposal.

    At least it doesn't make the system worse, and it is backward compatible. We can return to the idea of raising a warning from the constructor later, after collecting more feedback.

    P.S. There is a subtle non-deterministic behavior in the proposal: if the lock is accessed from a concurrent thread, the exception about wrong usage will be raised later, at an arbitrary point in the code. This is a well-known problem of the lazy initialization pattern, and maybe we should do nothing about it. Programming is always a compromise.

    1st1 commented 3 years ago

    > Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups),

    It never raises when called from inside a coroutine. And I propose to do that. So it will never fail.
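The difference being discussed can be checked with a short sketch. The helper names below are hypothetical; only `asyncio.get_running_loop()` and `asyncio.run()` are real API.

```python
import asyncio


def loop_outside_coroutine():
    """Return the running loop, or None if the lookup raises."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread.
        return None


async def loop_inside_coroutine():
    # A coroutine driven by asyncio executes inside a running loop,
    # so the lookup succeeds here.
    return asyncio.get_running_loop()


def demo():
    outside = loop_outside_coroutine()
    inside = asyncio.run(loop_inside_coroutine())
    return outside, inside
```

This is why deferring the lookup to `acquire()` is safe: by the time a coroutine method runs, a loop is already running.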

    aeros commented 3 years ago

    Regarding the example _get_loop():

    def _get_loop(self):
        loop = asyncio.get_running_loop()
        if self._loop is None:
            self._loop = loop
        if loop is not self._loop: raise
        if not loop.is_running(): raise

    Would this be added to all asyncio primitives to be called anytime a reference to the loop is needed within a coroutine?

    Also, regarding the last line "if not loop.is_running(): raise" I'm not 100% certain that I understand the purpose. Wouldn't it already raise a RuntimeError from asyncio.get_running_loop() if the event loop wasn't running?

    The only thing I can think of where it would have an effect is if somehow the event loop was running at the start of _get_loop() and then the event loop was stopped (e.g. a loop in an alternative thread was stopped by the main thread while the alternative thread was in the middle of executing _get_loop()). But to me, that seems to be enough of an edge case to simplify it to the following:

    def _get_loop(self):
        loop = asyncio.get_running_loop()
        if self._loop is None:
            self._loop = loop
        if loop is not self._loop: raise

    (Unless you intended for the first line loop = asyncio.get_running_loop() to instead be using the private asyncio._get_running_loop(), which returns None and doesn't raise. In that case, the original would be good to me.)

    Other than that, I think the approach seems solid.

    asvetlov commented 3 years ago

    Perhaps Kyle is right, I had a misunderstanding with get_running_loop() vs _get_running_loop().

    The last version seems good except for the rare chance of a race condition.

    The safe code can look like:

    global_lock = threading.Lock()  # like GIL

    def _get_loop(self):
        loop = asyncio.get_running_loop()
        if self._loop is None:
            # the lock is required because
            # the thread switch can happen
            # between `self._loop is None` check
            # and `self._loop = loop` assignment
            with global_lock:
                # re-check under the lock: only bind if still unset
                if self._loop is None:
                    self._loop = loop
        if loop is not self._loop: raise

    The alternative is using a fast C atomic compare_and_swap function, which is executed while holding the GIL. We need the pure-Python fallback anyway.

    Multithreading is hard...
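The locking pattern described in this comment is double-checked locking. A minimal sketch of it in isolation, with strings standing in for event loops, might look like this; `BindOnce` and `race` are hypothetical names used only for the illustration.

```python
import threading

_global_lock = threading.Lock()


class BindOnce:
    """Hypothetical stand-in for a primitive that lazily stores its loop."""

    _value = None

    def bind(self, candidate):
        if self._value is None:          # fast path: no lock once bound
            with _global_lock:
                if self._value is None:  # re-check while holding the lock
                    self._value = candidate
        return self._value


def race(n_threads=8):
    obj = BindOnce()
    barrier = threading.Barrier(n_threads)
    results = []

    def worker(i):
        barrier.wait()  # release all threads at once to provoke a race
        results.append(obj.bind(f"loop-{i}"))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Whichever thread takes the lock first wins, and every other thread observes that same value, so all entries in the returned list are identical.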

    1st1 commented 3 years ago

    > The safe code can look like:
    >
    > global_lock = threading.Lock()  # like GIL

    SGTM. We can use this strategy for all synchronization primitives and for objects like asyncio.Queue.

    asvetlov commented 3 years ago

    Perfect!

    We have a consensus now and are waiting for a champion to propose a Pull Request.

    asvetlov commented 3 years ago

    New changeset 0ec34cab9dd4a7bcddafaeeb445fae0f26afcdd1 by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter form asyncio locks and Queue (GH-23420) https://github.com/python/cpython/commit/0ec34cab9dd4a7bcddafaeeb445fae0f26afcdd1

    1st1 commented 3 years ago

    Sorry, there are a few things in the committed patch that should be fixed. See the PR for my comments.

    uriyyo commented 3 years ago

    Is there anyone who is assigned to removing the deprecated loop parameter from asyncio codebase?

    If not I can take this task, I believe I have enough free time and curiosity to do that :)

    aeros commented 3 years ago

    > Is there anyone who is assigned to removing the deprecated loop parameter from asyncio codebase?
    >
    > If not I can take this task, I believe I have enough free time and curiosity to do that :)

    You can certainly feel free to work on that and it would definitely be appreciated! However, I would recommend splitting it into several PRs, basically as "Remove *loop* parameter from x" rather than doing a massive PR that removes it from everywhere that it was deprecated. This makes the review process easier.

    Also, keep in mind that there are some uses of loop that are still perfectly valid and will remain so, such as in asyncio.run_coroutine_threadsafe(). It should only be removed in locations where a deprecation warning was present in 3.8 or earlier (as we give two major versions of deprecation notice before most breaking changes are made -- this has been made official policy as of PEP-387).
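As an illustration of a use of an explicit loop that remains valid, here is a minimal sketch of asyncio.run_coroutine_threadsafe() submitting work to a loop that runs in another thread. The `add` coroutine and the `run_in_background_loop` helper are hypothetical names made up for this example.

```python
import asyncio
import threading


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


def run_in_background_loop():
    loop = asyncio.new_event_loop()
    # The loop runs in a worker thread; this thread has no running loop.
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    try:
        # An explicit loop is required here: it names the target loop.
        future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
        return future.result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)  # stop from outside the loop thread
        thread.join(timeout=5)
        loop.close()
```

Here the loop argument is not a legacy convenience: the caller is not running inside the target loop, so there is no running loop to look up implicitly.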

    uriyyo commented 3 years ago

    Should I create a separate issue for every PR, or can they all be done within the scope of this issue (we can update the issue title to match what was done)?

    aeros commented 3 years ago

    New changeset b9127dd6eedd693cfd716a4444648864e2e00186 by Yurii Karabas in branch 'master': bpo-42392: Improve removal of *loop* parameter in asyncio primitives (GH-23499) https://github.com/python/cpython/commit/b9127dd6eedd693cfd716a4444648864e2e00186

    asvetlov commented 3 years ago

    > Should I create a separate issue for every PR, or can they all be done within the scope of this issue (we can update the issue title to match what was done)?

    Up to you, I don't think it really matters.

    asvetlov commented 3 years ago

    New changeset f533cb80cbbb7acdf9ce1978cfba095ce5eeedaa by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter from asyncio.streams (GH-23517) https://github.com/python/cpython/commit/f533cb80cbbb7acdf9ce1978cfba095ce5eeedaa

    asvetlov commented 3 years ago

    A few functions in tasks.py are left, and the documentation should be updated.

    terryjreedy commented 3 years ago

    I would attach simple PRs based on the discussion here to this issue. If a particular change needs additional and particular discussion, I would open a separate issue, announce it here, and list it in the Dependencies box. Then this generic issue, 'Remove ... in all...', cannot be closed until the separate specialized issues are.

    asvetlov commented 3 years ago

    New changeset e4fe303b8cca525e97d44e80c7e53bdab9dd9187 by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter from asyncio.tasks and asyncio.subprocess (GH-23521) https://github.com/python/cpython/commit/e4fe303b8cca525e97d44e80c7e53bdab9dd9187

    asvetlov commented 3 years ago

    New changeset 86150d39c888579b65841f4391d054b7b3eff9f2 by Yurii Karabas in branch 'master': bpo-42392: Remove deprecated loop parameter from docs (GH-23552) https://github.com/python/cpython/commit/86150d39c888579b65841f4391d054b7b3eff9f2

    uriyyo commented 3 years ago

    Looks like we have done everything, we can close this issue

    asvetlov commented 3 years ago

    Thanks for your help!

    miss-islington commented 3 years ago

    New changeset dcea78ff53d02733ac5983255610b651aa1c0034 by Ken Jin in branch 'master': bpo-42392: Mention loop removal in whatsnew for 3.10 (GH-24256) https://github.com/python/cpython/commit/dcea78ff53d02733ac5983255610b651aa1c0034

    gpshead commented 3 years ago

    There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter linking people to information on when that went away (3.10) and why. Specifically I'm talking about https://docs.python.org/3.10/library/asyncio-stream.html.

    The asyncio stack traces people will face when porting code to 3.10 are mystifying (they may not even show use of a loop parameter) when this comes up, so we should really leave more bread crumbs than expecting people to find the What's New doc.

    ...
    E    server = event_loop.run_until_complete(coro)
    E  File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
    E    return future.result()
    E  File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/streams.py", line 113, in start_unix_server
    E    return await loop.create_unix_server(factory, path, **kwds)
    E  TypeError: _UnixSelectorEventLoop.create_unix_server() got an unexpected keyword argument 'loop'

    Arguably something similar to that whatsnew text should've been added to the docs in 3.8 with the loop deprecation. Something like this?

    .. versionchanged:: 3.7

       This function now implicitly gets the
       current thread's running event loop.

    .. versionchanged:: 3.10

       The *loop* parameter has been removed.

    Including a ReST link to more info in the What's New doc on the last entry would be useful.
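For readers porting code, the failure mode can be reproduced in isolation. The sketch below assumes the behaviour described in this thread: passing loop= to asyncio.Lock raises TypeError on Python 3.10 and later, while 3.8 and 3.9 only emit a DeprecationWarning. The helper name is hypothetical.

```python
import asyncio
import sys


def lock_rejects_loop_argument():
    """Return True if asyncio.Lock(loop=...) raises TypeError."""
    loop = asyncio.new_event_loop()
    try:
        asyncio.Lock(loop=loop)
    except TypeError:
        return True
    finally:
        loop.close()
    return False


# Assumption: the parameter is gone from 3.10 onward.
removed = sys.version_info >= (3, 10)
print(lock_rejects_loop_argument(), removed)
```

The usual fix when porting is simply to drop the loop= argument, since the primitive picks up the running loop on first use.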

    Fidget-Spinner commented 3 years ago

    > There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter

    Sorry, I missed that. Working on it.

    gpshead commented 3 years ago

    New changeset d8fd8c8568cbc2f53c1abeda3596a89a46f0e3d7 by Ken Jin in branch 'main': bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) https://github.com/python/cpython/commit/d8fd8c8568cbc2f53c1abeda3596a89a46f0e3d7

    miss-islington commented 3 years ago

    New changeset 150a8e8a3edbbed12b98c8f22e2972cd47fd3ba5 by Miss Islington (bot) in branch '3.10': [3.10] bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) (GH-26390) https://github.com/python/cpython/commit/150a8e8a3edbbed12b98c8f22e2972cd47fd3ba5

    gpshead commented 3 years ago

    thanks!

    graingert commented 2 years ago

    @asvetlov

    > The alternative is using a fast C atomic compare_and_swap function, which is executed while holding the GIL. We need the pure-Python fallback anyway.

    I was checking some minor detail in socketpair and noticed this: https://github.com/python/cpython/blob/bceb197947bbaebb11e01195bdce4f240fdf9332/Lib/socketserver.py#L700

    and it looks like it benchmarks about 100 ns faster than a global lock:

    #!/usr/bin/env python3
    """Event loop mixins."""

    import threading
    from asyncio import events

    import pyperf
    
    _global_lock = threading.Lock()
    
    # Used as a sentinel for loop parameter
    _marker = object()
    
    class _LoopBoundMixin:
        _loop = None
    
        def __init__(self, *, loop=_marker):
            if loop is not _marker:
                raise TypeError(
                    f'As of 3.10, the *loop* parameter was removed from '
                    f'{type(self).__name__}() since it is no longer necessary'
                )
    
        def _get_loop(self):
            loop = events._get_running_loop()
    
            if self._loop is None:
                with _global_lock:
                    if self._loop is None:
                        self._loop = loop
            if loop is not self._loop:
                raise RuntimeError(f'{self!r} is bound to a different event loop')
            return loop
    
    class _LoopBoundMixinSetDefault:
        _loop = None
    
        def __init__(self, *, loop=_marker):
            if loop is not _marker:
                raise TypeError(
                    f'As of 3.10, the *loop* parameter was removed from '
                    f'{type(self).__name__}() since it is no longer necessary'
                )
    
        def _get_loop(self):
            loop = events._get_running_loop()
            if self._loop is None:
                self.__dict__.setdefault("_loop", loop)
    
            if self._loop is not loop:
                raise RuntimeError(f'{self!r} is bound to a different event loop')
    
            return loop
    
    runner = pyperf.Runner()
    runner.timeit(name="get loop with lock",
                  stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
    
    runner.timeit(name="get loop with setdefault",
                  stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
    
    runner.timeit(name="get loop already set with lock",
                  stmt="lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")
    
    runner.timeit(name="get loop already set with setdefault",
                  stmt="lbm._get_loop()",
                  setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")
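The semantics of the setdefault variant can be checked without pyperf. The sketch below uses strings as stand-ins for event loops, and `BindWithSetdefault` is a hypothetical name. It relies on the assumption that a single `dict.setdefault` call is effectively atomic in CPython while the GIL is held, which is an implementation detail rather than a language guarantee.

```python
class BindWithSetdefault:
    """Hypothetical stand-in that keeps the first value it is given."""

    _value = None  # class-level default; the instance __dict__ starts empty

    def bind(self, candidate):
        if self._value is None:
            # Inserts only if "_value" is missing from the instance dict,
            # so a later caller cannot overwrite an earlier binding.
            self.__dict__.setdefault("_value", candidate)
        return self._value


def demo():
    obj = BindWithSetdefault()
    first = obj.bind("loop-a")
    second = obj.bind("loop-b")
    return first, second
```

The second call returns the value stored by the first, which mirrors how the mixin would keep the first loop and let the identity check reject any other.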