Looks good!
OK, nm on the mock.patch() thing then.
@gvanrossum I've updated the PR.
What about my suggestion (in the previous PR, sorry) of using "active" or "running" instead of "current"? (I am coming back to this because in my head I keep missing the distinction between get_event_loop() and get_current_loop(). :-)
I like `_get_running_loop`! "Running" is less ambiguous than "current" and "active".
Make it so!
Make it so!
Done :) Please see the latest patch.
Looks good, better than the previous PR.
I have a question though. Why does `run_forever` store the old loop? It seems like this allows running a loop inside another one. In the previous PR, setting a loop when another one is already running raises a `RuntimeError('A loop is already running')`. Would that be doable here?
@vxgmichel
I have a question though. Why does run_forever store the old loop? It seems like this allows to run a loop inside another one. In the previous PR, setting a loop when another one is already running raises a RuntimeError('A loop is already running'). Would that be doable here?
The more I think about it the more I agree. This would be one scenario where we have a chance to detect accidental blocking inside a coroutine or callback. @1st1 what do you think?
Agree. Stack Overflow has several examples where authors try to hide a high-level API under `loop.run_until_complete()`.

Explicitly forbidding running another loop from a coroutine is a great idea.
(off-topic) We should have examples of a better recommended practice. IMO such high level APIs should fork a background thread to run the event loop and use concurrent.futures.Future instances to schedule calls and be able to wait for them.
@gvanrossum
IMO such high level APIs should fork a background thread to run the event loop and use concurrent.futures.Future instances to schedule calls and be able to wait for them.
I have an asyncio executor example that does pretty much that. I don't think it would fit in the official documentation, but maybe on asyncio-doc?
The more I think about it the more I agree. This would be one scenario where we have a chance to detect accidental blocking inside a coroutine or callback. @1st1 what do you think?
Yeah, let's do it. I'll update the PR.
Updated.
LGTM! Go ahead and merge.
Thank you for reviews @gvanrossum @vxgmichel @asvetlov!
PS. Why did the final squashed commit claim failing tests? I didn't see a Travis CI link.
Short answer: race between Travis & GH (at least that's what it looks like).
Long answer: I squashed everything and force-pushed it to the PR. When I merged it to the master branch, the PR was automatically closed by GH, which then aborted the builds (that had just started). Travis aborted the build, marking it as a fail.
The master looks green: https://travis-ci.org/python/asyncio/builds/173340363
Oh, your workflow is so different from mine... I use the web squash flow.
--Guido (mobile)
this broke being able to use asyncio from forked processes created from a run loop :( testcase:

```python
import asyncio
import multiprocessing

def sub_proc():
    async def doit():
        print("hello")
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(doit())

async def main():
    proc = multiprocessing.Process(target=sub_proc)
    proc.start()
    proc.join()

if __name__ == '__main__':
    _loop = asyncio.get_event_loop()
    _loop.run_until_complete(main())
```
results in:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/test.py", line 12, in sub_proc
    loop.run_until_complete(doit())
  File "/usr/local/lib/python3.5/asyncio/base_events.py", line 454, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.5/asyncio/base_events.py", line 411, in run_forever
    'Cannot run the event loop while another loop is running')
RuntimeError: Cannot run the event loop while another loop is running
```
this works in python 3.5.2 but fails in 3.5.3. I'm guessing this exacerbated https://bugs.python.org/issue22087
Forking from a running event loop triggers many undefined behaviours. Just don't do it.
this is a really unfortunate response as this is a common workflow for async unittests (https://github.com/Martiusweb/asynctest) and would require ALL workers to be created before your first async method, which for things like pools cannot be guaranteed. Further, it would break scenarios where you have sub-processes re-spawned after serving x number of requests. I've been using this workflow for over a year in a production environment without issue. Unless the point of this PR was to break this workflow, I think it should be addressed. What this PR is saying is that async processes cannot launch async sub-processes, which was never previously stated or enforced.
Further, in general, async forking support has been mothballed for more than a year and really needs to be addressed per the issue I referenced above.
this is a really unfortunate response as this is a common workflow for async unittests (https://github.com/Martiusweb/asynctest) and would require ALL workers to be created before your first async method, which for things like pools cannot be guaranteed.
If one package does something in some particular way, it doesn't mean that it's a correct way. It means that we need to fix that one package.
[..] What this PR is saying is that async processes cannot launch async sub-processes, which was never previously stated or enforced.
No, it doesn't say that. It just happened to expose a bug (forking from a running event loop was never officially supported). Forking with a running event loop is discouraged in virtually all async frameworks. There are dozens of side effects that can't be reliably resolved in a cross-platform case. Alternative asyncio event loop implementations such as uvloop will simply segfault if you do this.
There is a way to fork reliably, it just requires some boilerplate code. The key is to fork with the loop paused. For example (untested):

```python
import asyncio
import os

fork_cb = None

def schedule_fork(new_proc):
    global fork_cb
    fork_cb = new_proc
    asyncio.get_event_loop().stop()

async def coro():
    # some asyncio code...
    # some asyncio code...
    # some asyncio code...
    def new_proc():
        print('forked process', os.getpid())
    print('forking from', os.getpid())
    schedule_fork(new_proc)

loop = asyncio.get_event_loop()
loop.create_task(coro())
# Create more tasks...
while True:
    loop.run_forever()
    if fork_cb is not None:
        cb, fork_cb = fork_cb, None
        if not os.fork():
            # Child: run the callback, then exit without touching
            # the parent's cleanup machinery.
            try:
                cb()
            finally:
                os._exit(0)
    else:
        break
```
A workaround would be to store the PID in the thread-local object that `get_event_loop()` is using. But that would make the call slower, and it would only mask the real issue and enable bad practices.
Further, in general, async forking support has been mothballed for more than a year and really needs to be addressed per the issue I referenced above.
I think we actually can add built-in support for forking to asyncio: a coroutine `asyncio.fork(cb)`:

```python
async def some_coro():
    await asyncio.fork(code_for_child_process)
```
I'll experiment with the idea and maybe propose this in 3.7.
@Martiusweb Would you be able to fix forking in asynctest by using the approach outlined in the above ^^ comment?
FYI asynctest doesn't directly fork, it's just users of asynctest like myself who happen to fork (launch aiohttp server in sub-process and then run tests). IMHO I believe the workaround you mentioned is not robust in that one may not control the main run loop (like when running a aiohttp server), and further I worry about the heavy hand of stopping the event loop for complicated applications with many tasks blocked waiting for input, timers, and tasks waiting on various events. In my example I run an aiohttp server that dynamically launches processes from a worker-pool to process tens of thousands of S3 requests.
IMO if uvloop crashes that would be a problem in its run-loop impl and should be addressed separately and not affect the BaseEvent loop impl. You also say forking from async from run-loop was never officially supported, but by the same argument it was never officially not-supported either from what I gather from the docs (https://docs.python.org/3/library/asyncio-eventloop.html), and it worked. You say there are many problems with this workflow but there's been a proposed patch for over a year that has sat idle. IMHO I believe this is being punted for "being too hard" without justification given how easily it was worked around in the past. I would imagine the easiest solution is to have forked processes NOT inherit the event loop from the parent...anything else I can see being uber complicated.
I realize I have not dug into the innards of the BaseEventLoop impl, but I'm trying to fight for what I think is a very useful workflow that should not be trivially waved off as unimportant. The argument given against this feels to me like a reason for people to version-lock to minor releases. It changes something from being trivial to use into something very complicated and cumbersome, short of a very icky hack for sub-processes I had to come up with:
```python
def reset_running_loop(new_loop):
    current_loop = asyncio.get_event_loop()
    asyncio.set_event_loop(new_loop)
    if PY_353 and current_loop.is_running():
        asyncio.events._running_loop = asyncio.events._RunningLoop()
```
FYI another example that just uses asyncio to fork:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def sub_proc():
    async def doit():
        print("hello")
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(doit())

async def main():
    ppool = ProcessPoolExecutor()
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(ppool, sub_proc)

if __name__ == '__main__':
    _loop = asyncio.get_event_loop()
    _loop.run_until_complete(main())
```
I think the justification needs to be made if python's base modules are fork safe or not. If asyncio is not fork safe this seems to go against the other python base modules. What work is required to be able to subprocess the BaseEventLoop to make it fork safe?
IMHO I believe the workaround you mentioned is not robust in that one may not control the main run loop (like when running a aiohttp server) ..
.. In my example I run an aiohttp server that dynamically launches processes from a worker-pool to process tens of thousands of S3 requests.
Forking from within a running HTTP server coroutine is a very bad idea. It's a blocking operation that might take a non-trivial amount of time. You'd better pre-fork, or have a master process that controls its child processes and creates new ones when needed.
You can also use `loop.subprocess_exec` to safely run your S3 logic.
and further I worry of the heavy hand of stopping the event loop for complicated applications which have many tasks which are blocked waiting for input, timers, and tasks waiting of various events.
The event loop can safely be stopped and resumed, no matter how many timers/tasks you have. If you care about timeouts being triggered because you pause the loop to fork -- the same will happen when you use `os.fork()` (again, it's not a cheap and fast operation).
IMO if uvloop crashes that would be a problem in its run-loop impl and should be addressed separately and not affect the BaseEvent loop impl.
True (and that segfault will soon be fixed).
But there are problems even with forking pure-Python asyncio programs. For instance, `epoll` is fundamentally not fork-safe. If you continue to use the same event loop in the forked child process, you will encounter bugs/crashes/wrong behaviour.
One solution is to use the `multiprocessing` module, which is supported by both asyncio and uvloop, or to do fork+exec manually.
FYI another example that just uses asyncio to fork: [..] ProcessPoolExecutor example [..]
You can also use the `run_in_executor` API with process pools -- that is also fully supported (because `concurrent.futures` uses `multiprocessing`).
Calling bare `os.fork()` is fundamentally unsafe; you simply should not use it. It's a low-level syscall, and the asyncio event loop is at least one level above it. What you are asking is to fix one particular aspect of the API so that you can continue to use what worked by accident. But even if we make `get_event_loop` work, you will eventually have other problems with `os.fork`.

gevent, for instance, monkey-patches `os.fork` to make it work safely. As I said in my previous comment, we might want to add a specialized `asyncio.fork` method, but bare `os.fork` is very unlikely to ever be fully supported.
I think the justification needs to be made if python's base modules are fork safe or not. If asyncio is not fork safe this seems to go against the other python base modules. What work is required to be able to subprocess the BaseEventLoop to make it fork safe?
Strictly speaking, any network application (blocking or non-blocking) is not `os.fork`-friendly. You have to do the forking with extra care, and generally people only do fork+exec. The officially recommended way to do multiprocessing is to use the `multiprocessing` package.
You also say forking from async from run-loop was never officially supported, but by the same argument it was never officially not-supported either from what I gather from the docs (https://docs.python.org/3/library/asyncio-eventloop.html), and it worked.
True, this is something we will hopefully fix very soon.
thanks for the insight, but I think our streams aren't quite crossing yet :) I never stated the workflows involved directly forking; I gave two examples that this PR broke without using fork directly. Also, you can create a sub-process without blocking the parent process by using `loop.run_in_executor(None, ...)`, so that issue is moot. Also, as stated, I don't want to use the parent's event loop. I want to be able to fork a process and have it use its own event loop like I'm able to in 3.5.2 per the first example. Thank you for your time thinking about this workflow!
thanks for the insight, but I think our streams aren't quite crossing yet :) I never stated the workflows involved directly forking, I gave two examples that this PR broke without using fork directly. [..]
Right. I tried your script from https://github.com/python/asyncio/pull/452#issuecomment-281811666 and indeed what should work doesn't work! Seems like it's a bug after all, I'll open a PR.
yay =) ya it's the same problem as the first example.
As discussed in [1] and [2], this PR makes
asyncio.get_event_loop()
to always return the currently running event loop when it is called from a coroutine.The patch adds two new functions:
asyncio._set_current_loop()
andasyncio._get_current_loop()
. Both functions are thread-specific and are considered as low-level APIs, intended to be used by third-party loop implementations only. Even though they have a leading underscore, they will be documented as public asyncio API.loop.run_forever()
is modified to callasyncio._set_current_loop()
andasyncio._get_current_loop()
.asyncio.get_event_loop()
is modified to check if there is a current loop set by callingasyncio._get_current_loop()
. If there is no current event loop, it returns the result ofget_event_loop_policy().get_event_loop()
(old behaviour).The biggest part of the patch is fixing unit tests: all test classes call
super().setUp()
.test_utils.TestCase.setUp
patchesasyncio._get_current_loop()
to always returnNone
. This change is to make sure that the loop is still being passed explicitly within asyncio.[1] PR: https://github.com/python/asyncio/pull/355 [2] https://groups.google.com/d/msg/python-tulip/yF9C-rFpiKk/tk5oA3GLHAAJ