Closed 1st1 closed 3 years ago
asyncio.Lock and other primitives should no longer accept the loop parameter. They should also stop storing the current loop in the self._loop attribute. Instead, they should use get_running_loop() whenever they need to access the loop (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop()).
Kyle, lmk if you want to work on this.
This was on my to-do list but I very much appreciate if somebody champions this issue. I should finish sslproto PR first.
Sure, I would be interested in helping with this. Although if a newer contributor takes it up before I'm able to, I wouldn't be opposed to that either (my main priority at the moment is helping with PEP-594 since it's a concrete goal of my internship w/ the PSF, but I should have some time to work on this as well).
As far as I can tell though, there's currently a similar PR open: https://github.com/python/cpython/pull/18195 . This attempts to deprecate the loop argument and creating primitives such as asyncio.Lock outside of a running event loop with the following approach:
def __init__(self, *, loop=None):
    self._waiters = None
    self._locked = False
    if loop is None:
        self._loop = events._get_running_loop()
        if self._loop is None:
            warnings.warn("The creation of asyncio objects without a running "
                          "event loop is deprecated as of Python 3.9.",
                          DeprecationWarning, stacklevel=2)
            self._loop = events.get_event_loop()
    else:
        warnings.warn("The loop argument is deprecated since Python 3.8, "
                      "and scheduled for removal in Python 3.10.",
                      DeprecationWarning, stacklevel=2)
So, do we want to make that stricter by always using get_running_loop() to access the event loop each time instead of accessing self._loop, and effectively ignore the user-supplied one? Presumably, this would start with a warning for passing a loop arg, and the parameter would then be removed entirely ~two versions later.
> (or alternatively they can cache the running loop, but the first loop lookup should be performed with asyncio.get_running_loop())
AFAIK, at the C-API extension level, get_running_loop() already caches the running loop in cached_running_holder (https://github.com/python/cpython/blob/9c98e8cc3ebf56d01183c67adbc000ed19b8e0f4/Modules/_asynciomodule.c#L232). So from a performance perspective, wouldn't it effectively be the same if we repeatedly use get_running_loop() to access the same event loop? I think it also adds a nice integrity check to be certain that the primitive wasn't initially created within a running event loop, and then later accessed outside of one.
The only concern that I can see with this approach is that users could potentially create a primitive in one running event loop and then access it in a separate loop running in a different thread (without using something like self._loop, the primitive would no longer be associated with a specific event loop and could potentially be used within *any* running event loop). I'm not entirely certain if that is a real problem though, and if anything, it seems like it could prove to be useful in some multi-loop environment. I just want to make sure that it's intended.
Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.
I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.
> Oh my. FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its init shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first await lock.acquire() is called.
That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.
> I'm writing a piece of code right now that would need to jump through the hoops to simply create a new asyncio.Lock() in a place where there's no asyncio loop yet.
From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail if the primitives called get_running_loop() in their init. So, if it's not an issue to wait to access the event loop until it's actually needed (e.g. in lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.
> That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to await lock.acquire() on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.
No, it's not OK to use one lock across multiple loops at the same time. But most asyncio code out there doesn't have protections against that, and we never advertised that having multiple loops run in parallel is a good idea. So while we could build protections against that, I'm not sure it's needed.
Andrew, thoughts?
> From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the init for some class prior to using the event loop, which would fail if the primitives called get_running_loop() in their init. So, if it's not an issue to wait to access the event loop until it's actually needed (e.g. in lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call get_running_loop() going forward to ensure we're not imposing arbitrary inconveniences on users.
Yep. This sums up how I think of this now.
My initial thought was protecting the Lock (and other primitives) creation when a loop is not running.
Yuri insists that Lock can be created without a loop. Technically it is possible, sure. But the lock is tightly coupled with a loop instance. In other words, the lock belongs to the loop. The lock cannot be used after the loop dies (stopped and closed).
Thus, the global scoped lock object still looks suspicious in my eyes. The lock's lifecycle should fit within the lifecycle of the loop, shouldn't it? I know, asyncio.Lock() can safely live on after the loop is closed, but should we encourage this style? There are many other asyncio objects like HTTP clients and DB drivers that should be closed before the loop finishes, so that TCP transports etc. are closed gracefully.
Another thing to consider is: whether to cache a loop inside a lock; whether to add a check when the lock is used by two loops?
I think for the last question the answer is "yes". I recall many questions and bug reports on StackOverflow and the aiohttp bug tracker where people use the multithreaded model for some reason and try to access a shared object from different threads that each run their own loop.
The check becomes extremely important for synchronization primitives like the asyncio.Lock class; threading.Lock is supposed to be shared between threads, and users can apply the same pattern to asyncio.Lock by oversight. Also, supporting global asyncio.Lock instances increases the chance that the lock is used by different loops.
Yeah, I follow the reasoning.
My use case:
class Something:
    def __init__(self):
        self._lock = asyncio.Lock()

    async def do_something(self):
        async with self._lock:
            ...
And Something won't be created in a coroutine. So now I have to jump through the hoops to implement lazy lock instantiation.
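For illustration, the "hoops" mentioned above could look roughly like the sketch below. It assumes a hypothetical helper named _get_lock() (not part of asyncio) that creates the lock only on first use, so that asyncio.Lock() is always constructed while a loop is running.

```python
import asyncio


class Something:
    def __init__(self):
        # Created in synchronous code, so the lock is not built yet.
        self._lock = None

    def _get_lock(self):
        # Hypothetical helper: build the lock on first use, which
        # always happens inside a coroutine (a loop is running).
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def do_something(self):
        async with self._get_lock():
            return "done"


obj = Something()  # no event loop is running here
result = asyncio.run(obj.do_something())
```

The extra attribute check on every call is exactly the kind of boilerplate that binding the loop lazily inside the primitive itself would make unnecessary.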
> But the lock is tightly coupled with a loop instance. In other words, the lock belongs to the loop. The lock cannot be used after the loop dies (stopped and closed).
I agree. Maybe the solution should be this then:
class Lock:  # i.e. asyncio.Lock
    def _get_loop(self):
        loop = asyncio.get_running_loop()
        if self._loop is None:
            self._loop = loop
        if loop is not self._loop: raise
        if not loop.is_running(): raise
        return loop

    async def acquire(self):
        loop = self._get_loop()
        ...
This would guarantee all the protections you want and would also allow instantiating asyncio.Lock in class constructors, simplifying code.
Apart from the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups), I can live with the proposal.
At least it doesn't make the system worse, and it is backward compatible. We can return to the idea of raising a warning from the constructor later, after collecting more feedback.
P.S. There is a subtle non-deterministic behavior in the proposal: if the lock is accessed from a concurrent thread, the exception about wrong usage will be raised later at an arbitrary code point. This is a well-known problem of the lazy initialization pattern and maybe we should do nothing about it. Programming is always a compromise.
> Apart from the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups),
It never raises when called from inside a coroutine. And I propose to do that. So it will never fail.
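To make the point about get_running_loop() concrete, here is a small sketch of its two behaviors. The helper name running_loop_or_none() is made up for this example.

```python
import asyncio


def running_loop_or_none():
    """Return the running loop, or None when no loop is running."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:  # raised when this thread has no running loop
        return None


async def main():
    # Inside a coroutine a loop is always running, so this cannot raise.
    return asyncio.get_running_loop()


outside = running_loop_or_none()  # called from plain synchronous code
inside = asyncio.run(main())      # called from inside a coroutine
```

So a lookup made only from within acquire() or a similar coroutine method never sees the RuntimeError case.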
Regarding the example _get_loop():
def _get_loop(self):
    loop = asyncio.get_running_loop()
    if self._loop is None:
        self._loop = loop
    if loop is not self._loop: raise
    if not loop.is_running(): raise
Would this be added to all asyncio primitives to be called anytime a reference to the loop is needed within a coroutine?
Also, regarding the last line "if not loop.is_running(): raise" I'm not 100% certain that I understand the purpose. Wouldn't it already raise a RuntimeError from asyncio.get_running_loop() if the event loop wasn't running?
The only thing I can think of where it would have an effect is if somehow the event loop was running at the start of _get_loop() and then the event loop was stopped (e.g. a loop in an alternative thread was stopped by the main thread while the alternative thread was in the middle of executing _get_loop()). But to me, that seems to be enough of an edge case to simplify it to the following:
def _get_loop(self):
    loop = asyncio.get_running_loop()
    if self._loop is None:
        self._loop = loop
    if loop is not self._loop: raise
(Unless you intended for the first line loop = asyncio.get_running_loop() to instead be using the private asyncio._get_running_loop(), which returns None and doesn't raise. In that case, the original would be good to me.)
Other than that, I think the approach seems solid.
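A runnable sketch of the simplified approach follows. The class LoopBound is a hypothetical stand-in for a primitive such as asyncio.Lock, the error message is an assumption, and no attempt is made at thread safety here.

```python
import asyncio


class LoopBound:
    """Hypothetical stand-in for a primitive such as asyncio.Lock."""

    _loop = None  # nothing is captured at construction time

    def _get_loop(self):
        loop = asyncio.get_running_loop()  # raises if no loop is running
        if self._loop is None:
            self._loop = loop  # bind to the loop of the first use
        if loop is not self._loop:
            raise RuntimeError(f"{self!r} is bound to a different event loop")
        return loop


async def use(obj):
    return obj._get_loop()


shared = LoopBound()                   # created with no running loop
first_loop = asyncio.run(use(shared))  # first use binds the object

try:
    asyncio.run(use(shared))           # asyncio.run() starts a new loop
    rejected = False
except RuntimeError:
    rejected = True
```

Each asyncio.run() call creates its own loop, so the second call is seen as a different loop and is rejected, while construction outside of any loop stays legal.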
Perhaps Kyle is right, I had a misunderstanding with get_running_loop() vs _get_running_loop().
The last version seems good except for the rare chance of a race condition.
The safe code can look like:
global_lock = threading.Lock()  # one process-wide lock, like the GIL

def _get_loop(self):
    loop = asyncio.get_running_loop()
    if self._loop is None:
        # the lock is required because
        # the thread switch can happen
        # between `self._loop is None` check
        # and `self._loop = loop` assignment
        with global_lock:
            # re-check under the lock: another thread may have bound it
            if self._loop is None:
                self._loop = loop
    if loop is not self._loop: raise
The alternative is using a fast C atomic compare_and_swap function, which is executed while holding the GIL.
We need the pure-Python fallback anyway.
Multithreading is hard...
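As a sketch of how the double-checked binding behaves when two threads each run their own loop, consider the example below. LoopBound, touch() and worker() are hypothetical names used only for this illustration.

```python
import asyncio
import threading

_global_lock = threading.Lock()


class LoopBound:
    """Hypothetical stand-in for a primitive such as asyncio.Lock."""

    _loop = None

    def _get_loop(self):
        loop = asyncio.get_running_loop()
        if self._loop is None:
            with _global_lock:
                # Re-check under the lock so only one thread can bind.
                if self._loop is None:
                    self._loop = loop
        if loop is not self._loop:
            raise RuntimeError(f"{self!r} is bound to a different event loop")
        return loop


shared = LoopBound()
outcomes = []


async def touch():
    shared._get_loop()


def worker():
    try:
        asyncio.run(touch())  # each thread runs its own event loop
        outcomes.append("bound")
    except RuntimeError:
        outcomes.append("rejected")


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which thread wins is not deterministic, but exactly one of them binds the object and the other one gets the RuntimeError.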
> The safe code can look like:
> global_lock = threading.Lock() like GIL
SGTM. We can use this strategy for all synchronization primitives and for objects like asyncio.Queue.
Perfect!
We have a consensus now and are waiting for a champion to propose a Pull Request.
New changeset 0ec34cab9dd4a7bcddafaeeb445fae0f26afcdd1 by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter form asyncio locks and Queue (bpo-23420) https://github.com/python/cpython/commit/0ec34cab9dd4a7bcddafaeeb445fae0f26afcdd1
Sorry, there are a few things in the committed patch that should be fixed. See the PR for my comments.
Is there anyone who is assigned to removing the deprecated loop parameter from the asyncio codebase?
If not I can take this task, I believe I have enough free time and curiosity to do that :)
> Is there anyone who is assigned to removing the deprecated loop parameter from the asyncio codebase?
> If not I can take this task, I believe I have enough free time and curiosity to do that :)
You can certainly feel free to work on that and it would definitely be appreciated! However, I would recommend splitting it into several PRs, basically as "Remove *loop* parameter from x" rather than doing a massive PR that removes it from everywhere that it was deprecated. This makes the review process easier.
Also, keep in mind that there are some uses of loop that are still perfectly valid and will remain so, such as in asyncio.run_coroutine_threadsafe(). It should only be removed in locations where a deprecation warning was present in 3.8 or earlier (as we give two major versions of deprecation notice before most breaking changes are made -- this has been made official policy as of PEP-387).
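As an example of a use of loop that stays valid, asyncio.run_coroutine_threadsafe() needs an explicit loop because it is called from a thread other than the one running the loop. A minimal sketch, where the add() coroutine is made up for illustration:

```python
import asyncio
import threading


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


# Run an event loop in a background thread.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# Submit a coroutine to that loop from the main thread. The loop
# argument is required here and is not affected by the removal.
future = asyncio.run_coroutine_threadsafe(add(2, 3), loop)
result = future.result(timeout=5)

# Shut the loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```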
Should I create a separate issue for every PR or they all can be done in the scope of this PR (we can update issue title to match what was done)?
New changeset b9127dd6eedd693cfd716a4444648864e2e00186 by Yurii Karabas in branch 'master': bpo-42392: Improve removal of *loop* parameter in asyncio primitives (GH-23499) https://github.com/python/cpython/commit/b9127dd6eedd693cfd716a4444648864e2e00186
> Should I create a separate issue for every PR or they all can be done in the scope of this PR (we can update issue title to match what was done)?
Up to you, I don't think it really matters.
New changeset f533cb80cbbb7acdf9ce1978cfba095ce5eeedaa by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter from asyncio.streams (GH-23517) https://github.com/python/cpython/commit/f533cb80cbbb7acdf9ce1978cfba095ce5eeedaa
A few functions in tasks.py are left, and the documentation should be updated.
I would attach simple PRs based on the discussion here to this issue. If a particular change needs additional and particular discussion, I would open a separate PR, announce it here, and list it in the Dependencies box. Then this generic issue, 'Remove ... in all...' cannot be closed until the separate specialized issues are.
New changeset e4fe303b8cca525e97d44e80c7e53bdab9dd9187 by Yurii Karabas in branch 'master': bpo-42392: Remove loop parameter from asyncio.tasks and asyncio.subprocess (GH-23521) https://github.com/python/cpython/commit/e4fe303b8cca525e97d44e80c7e53bdab9dd9187
New changeset 86150d39c888579b65841f4391d054b7b3eff9f2 by Yurii Karabas in branch 'master': bpo-42392: Remove deprecated loop parameter from docs (GH-23552) https://github.com/python/cpython/commit/86150d39c888579b65841f4391d054b7b3eff9f2
Looks like we have done everything, we can close this issue
Thanks for your help!
New changeset dcea78ff53d02733ac5983255610b651aa1c0034 by Ken Jin in branch 'master': bpo-42392: Mention loop removal in whatsnew for 3.10 (GH-24256) https://github.com/python/cpython/commit/dcea78ff53d02733ac5983255610b651aa1c0034
There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter linking people to information on when that went away (3.10) and why. Specifically I'm talking about https://docs.python.org/3.10/library/asyncio-stream.html.
The asyncio stack traces people will face when porting code to 3.10 are mystifying (they may not even show use of a loop parameter) when this comes up, so we should really leave more bread crumbs than expecting people to find the What's New doc.
...
E server = event_loop.run_until_complete(coro)
E File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
E return future.result()
E File "/opt/hostedtoolcache/Python/3.10.0-beta.1/x64/lib/python3.10/asyncio/streams.py", line 113, in start_unix_server
E return await loop.create_unix_server(factory, path, **kwds)
E TypeError: _UnixSelectorEventLoop.create_unix_server() got an unexpected keyword argument 'loop'
Arguably something similar to that whatsnew text should've been added to the docs in 3.8 with the loop deprecation. Something like this?
.. versionchanged:: 3.7
   This function now implicitly gets the
   current thread's running event loop.

.. versionchanged:: 3.10
   That `loop` parameter has been removed.
Including a ReST link to more info in the What's New doc on the last entry would be useful.
> There appear to be no versionchanged:: 3.10 in the asyncio docs on the APIs that formerly accepted a loop= parameter
Sorry, I missed that. Working on it.
New changeset d8fd8c8568cbc2f53c1abeda3596a89a46f0e3d7 by Ken Jin in branch 'main': bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) https://github.com/python/cpython/commit/d8fd8c8568cbc2f53c1abeda3596a89a46f0e3d7
New changeset 150a8e8a3edbbed12b98c8f22e2972cd47fd3ba5 by Miss Islington (bot) in branch '3.10': [3.10] bpo-42392: [docs] Add deprecated-removed loop labels for asyncio (GH-26357) (GH-26390) https://github.com/python/cpython/commit/150a8e8a3edbbed12b98c8f22e2972cd47fd3ba5
thanks!
@asvetlov
> The alternative is using the fast C atomic compare_and_swap function which is executed under the hold GIL. We need the pure-Python fallback anyway.
I was checking some minor detail in socketpair and noticed this: https://github.com/python/cpython/blob/bceb197947bbaebb11e01195bdce4f240fdf9332/Lib/socketserver.py#L700
and it looks like it benchmarks about 100ns faster than a global lock:
#!/usr/bin/env python3
import pyperf
from asyncio import events

"""Event loop mixins."""
import threading

_global_lock = threading.Lock()

# Used as a sentinel for loop parameter
_marker = object()


class _LoopBoundMixin:
    _loop = None

    def __init__(self, *, loop=_marker):
        if loop is not _marker:
            raise TypeError(
                f'As of 3.10, the *loop* parameter was removed from '
                f'{type(self).__name__}() since it is no longer necessary'
            )

    def _get_loop(self):
        loop = events._get_running_loop()
        if self._loop is None:
            with _global_lock:
                if self._loop is None:
                    self._loop = loop
        if loop is not self._loop:
            raise RuntimeError(f'{self!r} is bound to a different event loop')
        return loop


class _LoopBoundMixinSetDefault:
    _loop = None

    def __init__(self, *, loop=_marker):
        if loop is not _marker:
            raise TypeError(
                f'As of 3.10, the *loop* parameter was removed from '
                f'{type(self).__name__}() since it is no longer necessary'
            )

    def _get_loop(self):
        loop = events._get_running_loop()
        if self._loop is None:
            self.__dict__.setdefault("_loop", loop)
        if self._loop is not loop:
            raise RuntimeError(f'{self!r} is bound to a different event loop')
        return loop


runner = pyperf.Runner()
runner.timeit(name="get loop with lock",
              stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
              setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
runner.timeit(name="get loop with setdefault",
              stmt="lbm = _LoopBoundMixin(); lbm._get_loop(); lbm._get_loop()",
              setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop())")
runner.timeit(name="get loop already set with lock",
              stmt="lbm._get_loop()",
              setup="import asyncio; from __main__ import _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")
runner.timeit(name="get loop already set with setdefault",
              stmt="lbm._get_loop()",
              setup="import asyncio; from __main__ import _LoopBoundMixinSetDefault as _LoopBoundMixin; asyncio._set_running_loop(asyncio.new_event_loop()); lbm = _LoopBoundMixin(); lbm._get_loop()")
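The setdefault variant relies on a first-writer-wins rule: because _loop = None lives on the class, the key is initially absent from the instance __dict__, and only the first setdefault() call stores a value. A small sketch of that behavior, using a made-up Holder class and plain objects as stand-ins for event loops (whether the call is atomic across threads is taken here as an assumption about CPython, not something this example proves):

```python
class Holder:
    _loop = None  # class-level default, not stored in the instance __dict__


holder = Holder()
first, second = object(), object()  # stand-ins for two event loops

# "_loop" is missing from the instance dict, so this call stores `first`.
holder.__dict__.setdefault("_loop", first)
# The key now exists, so this call leaves the stored value untouched.
holder.__dict__.setdefault("_loop", second)

winner = holder._loop
```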
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-feature', 'docs', '3.10', 'expert-asyncio']
title = "[document removal of] the deprecated 'loop' parameter asyncio API in 3.10"
updated_at =
user = 'https://github.com/1st1'
```
bugs.python.org fields:
```python
activity =
actor = 'gregory.p.smith'
assignee = 'docs@python'
closed = True
closed_date =
closer = 'gregory.p.smith'
components = ['Documentation', 'asyncio']
creation =
creator = 'yselivanov'
dependencies = []
files = []
hgrepos = []
issue_num = 42392
keywords = ['patch']
message_count = 36.0
messages = ['381275', '381276', '381277', '381278', '381304', '381305', '381306', '381310', '381329', '381370', '381372', '381382', '381385', '381389', '381451', '381456', '381766', '381771', '381776', '381780', '381794', '381822', '381823', '381882', '381883', '381979', '381985', '382060', '382079', '382133', '385383', '394270', '394345', '394474', '394477', '394480']
nosy_count = 9.0
nosy_names = ['terry.reedy', 'gregory.p.smith', 'asvetlov', 'docs@python', 'yselivanov', 'miss-islington', 'aeros', 'uriyyo', 'kj']
pr_nums = ['23420', '23499', '23517', '23521', '23552', '24256', '26357', '26390']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue42392'
versions = ['Python 3.10']
```