python / asyncio

asyncio historical repository
https://docs.python.org/3/library/asyncio.html

Docs: clarification of "loop=None" defaults #362

Open Badg opened 8 years ago

Badg commented 8 years ago

Lots of asyncio primitives and operations have a keyword argument loop=None. I cannot find an explanation of it in the docs (doesn't mean it doesn't exist, just that I can't find it). Reading the source yields the following: The optional event_loop argument allows explicitly setting the event loop object used by the future. If it's not provided, the future uses the default event loop (for the current context).
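A minimal sketch of what that docstring describes, assuming Python 3.7+ (for `Future.get_loop()`): inside a running coroutine, a Future created without `loop` ends up on the loop that is running, while an explicit `loop=` pins it to the given loop instead. The helper name `show_default_binding` is hypothetical.

```python
import asyncio


async def show_default_binding():
    """Compare a Future created with loop=None against one given an explicit loop."""
    running = asyncio.get_running_loop()
    implicit = asyncio.Future()  # loop omitted: resolved to the loop running this coroutine
    other = asyncio.new_event_loop()
    try:
        explicit = asyncio.Future(loop=other)  # pinned to a loop that is not running
        return implicit.get_loop() is running, explicit.get_loop() is other
    finally:
        other.close()


if __name__ == "__main__":
    print(asyncio.run(show_default_binding()))  # prints (True, True)
```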

For me personally, this is pretty unclear, especially when running multiple event loops. I think I've figured it out, but asyncio would be easier to learn if it were better documented. Specifically, (and again for me personally), important questions that would be helpful to answer in the docs are:

  1. What behavior does loop=None affect? E.g., "The optional loop argument allows explicitly setting the event loop the future is 'run' in. Passing None will use the default loop for the current context, as determined by asyncio.get_event_loop()."
  2. What exactly is meant by "the current context"? (Should this be a separate issue?) Reading the source (asyncio.events.py):

    def get_event_loop(self):
        """Get the event loop.

        This may be None or an instance of EventLoop.
        """
        if (self._local._loop is None and
                not self._local._set_called and
                isinstance(threading.current_thread(), threading._MainThread)):
            self.set_event_loop(self.new_event_loop())

        if self._local._loop is None:
            raise RuntimeError('There is no current event loop in thread %r.'
                               % threading.current_thread().name)

        return self._local._loop

    I'm pretty sure that means "the current thread". Disambiguation here would be very helpful, especially for those of us mixing threading with asyncio (again, a very important use case will be running multiple event loops).

  3. Ideally, including some helpful yardstick about when you should consider passing the loop explicitly would help clear up some uncertainty surrounding the lay of the land. I'm hesitant to use the default value of None, even though it cleans up my coro signatures, I think because I'm unclear on the internals of the asyncio primitives and how they interact with threads. I'm aware they aren't threadsafe, but lacking an understanding of what makes them un-threadsafe, I'm unsure of what situations do, and do not, warrant explicitly setting the loop. I'm pretty sure I understand it, but there's enough uncertainty for me that I'm pretty much just always explicitly setting the loop, because "better safe than sorry".

Cheers!

gvanrossum commented 8 years ago

Yes, (1) is pretty much on the mark.

Re (2), we intentionally don't say too much about "current context" because its definition is actually dependent on the implementation of the (global) event loop policy, which defines the behavior of get_event_loop(), set_event_loop() and new_event_loop(). The default policy supports only one loop per thread and requires an explicit set_event_loop(new_event_loop()) call for all threads except the main thread (where the first get_event_loop() call implies the latter).
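A small sketch of the default-policy behavior described above, assuming CPython's default event loop policy (exact warnings vary by version, and newer releases deprecate parts of the policy machinery): in a non-main thread, `asyncio.get_event_loop()` raises `RuntimeError` until `set_event_loop(new_event_loop())` has been called in that thread. `probe_worker_thread` is a hypothetical helper.

```python
import asyncio
import threading


def probe_worker_thread():
    """Report what get_event_loop() does in a fresh non-main thread,
    before and after an explicit set_event_loop() call."""
    result = {}

    def worker():
        try:
            asyncio.get_event_loop()
            result["before"] = "got a loop"
        except RuntimeError:
            # Default policy: no implicit loop creation outside the main thread.
            result["before"] = "RuntimeError"
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)  # make it the current loop for this thread
        try:
            result["after_is_same"] = asyncio.get_event_loop() is loop
        finally:
            asyncio.set_event_loop(None)
            loop.close()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result


if __name__ == "__main__":
    print(probe_worker_thread())
```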

Re (3), I would first start by discussing a yardstick for when to have multiple event loops per thread. I personally believe this should be essentially never, because asyncio was designed with that assumption -- explicit manipulation of the event loop is considered mostly useful for writing asyncio's own test suite and perhaps for platforms that integrate an asyncio event loop with their system's UI event handling.
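To make the one-running-loop-per-thread assumption concrete, here is a hedged sketch (the function name is hypothetical) showing that CPython refuses to drive a second loop from a thread whose loop is already running; `run_until_complete()` raises `RuntimeError` before the coroutine is ever scheduled.

```python
import asyncio


async def try_second_loop_in_same_thread():
    """Try to run a second event loop from inside a running one (same thread)."""
    inner = asyncio.new_event_loop()
    coro = asyncio.sleep(0)
    try:
        inner.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # never scheduled; close it to avoid a "never awaited" warning
        return str(exc)
    finally:
        inner.close()
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(try_second_loop_in_same_thread()))
```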

(Replying by email to Nick Badger's comment of Mon, Jun 20, 2016, 10:36 AM.)

--Guido van Rossum (python.org/~guido)

Badg commented 8 years ago

(1): Awesome. If nobody gets around to it by then, I may be able to take care of it in the next few weeks, assuming someone can point me in the right direction for contributing (doc quality requirements, where to send the pull request, etc.).

(2): I thought that might be the case, given its separation from AbstractEventLoopPolicy. But, isn't the ambiguity still a barrier to understanding? Perhaps an appropriate compromise might be to add an explanation in the AbstractEventLoopPolicy.get_event_loop documentation that says something to the effect of "The current context is defined by the specific event loop implementation. For the event loops available within asyncio, the current context is equivalent to the current thread. For third-party event loops, this may not be the case." For an added bonus, that section could also be linked from anywhere that mentions "current context".

(3) To be completely honest, it hadn't even occurred to me that you could have multiple event loops per thread, and that strikes me as particularly messy. What I meant was that each event loop runs in a different thread (but in the same process). There are more than a few reasons that could happen; in my case I'm using two event loops so I can separate semi-ephemeral transactions (sending and receiving on long-lived websocket connections) from very-long-lived asynchronous message transactions (for example, awaiting an email response). I suppose there's a case to be made that I should combine these into the same event loop, but in my particular case doing so would severely hurt the composability of the code.

A more immediate and definitive example would be running a third-party library synchronously while using the event loop to handle I/O. Some libraries, e.g. Kivy, must run in the main thread. So I can imagine someone using Kivy for the UI and aiohttp to connect to or provide some web service; dealing with the aiohttp event loop from the Kivy (main) thread would then require explicit loop declarations. This would take a lot of creativity at the thread boundaries, and many calls to the various _threadsafe functions asyncio provides, but I could definitely see it happening.
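One way such a split could look, sketched with the standard library only (no Kivy or aiohttp; `fetch_status` is a hypothetical stand-in for network I/O): the asyncio loop runs in a background thread, and the main thread hands it coroutines via `asyncio.run_coroutine_threadsafe()`, which returns a `concurrent.futures.Future` that the main thread can block on.

```python
import asyncio
import threading


async def fetch_status(name):
    """Hypothetical stand-in for real network I/O (e.g. an aiohttp request)."""
    await asyncio.sleep(0.01)
    return f"{name}: ok"


def run_from_ui_thread():
    # The asyncio loop lives in a background thread, leaving the main thread
    # free for a UI toolkit that insists on owning it.
    loop = asyncio.new_event_loop()
    io_thread = threading.Thread(target=loop.run_forever, daemon=True)
    io_thread.start()
    try:
        # Submit a coroutine from this (main) thread; the result is a
        # concurrent.futures.Future, not an asyncio.Future.
        future = asyncio.run_coroutine_threadsafe(fetch_status("api"), loop)
        return future.result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)  # schedule stop() on the loop's own thread
        io_thread.join(timeout=5)
        loop.close()


if __name__ == "__main__":
    print(run_from_ui_thread())  # prints api: ok
```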

Does my concern there make sense? I have at this point a pretty good understanding of when I need to pass the loop explicitly, but I've only figured that out after a lot of experimentation and trial and error. That learning curve could be substantially eased by a note somewhere about when it's useful to pass loop=<foo> explicitly.

I should also hedge this whole thing by saying that my understanding of the internals of the event loops is very limited, so (in particular re: multiple event loops per process but not per thread) I may be way off-base.

gvanrossum commented 8 years ago

(1) For docs we still use Hg, just submit your patch to bugs.python.org (more instructions at https://docs.python.org/devguide/).

(2) But the distinction is not up to the event loop -- the policy is a separate object. You can have a new loop without a new policy, and vice versa. It may be reasonable to say that "current context" is the same as "current thread" unless the event loop policy has been changed (in which case the semantics are entirely up to the new policy object).
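A hedged sketch of "current context is the same as current thread" under the default policy (the helper name is hypothetical): several threads each set their own loop, wait at a barrier until all have done so, and then each checks that `get_event_loop()` still returns its own loop rather than one set by another thread.

```python
import asyncio
import threading


def current_loop_is_thread_local(n_threads=3):
    """Give each thread its own loop and check that none sees another thread's loop."""
    barrier = threading.Barrier(n_threads)
    results = {}

    def worker(name):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)  # "current loop" for this thread only
        barrier.wait()                # by now every thread has set its own loop
        results[name] = asyncio.get_event_loop() is loop
        asyncio.set_event_loop(None)
        loop.close()

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(current_loop_is_thread_local())
```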

(3) All communication between loops running in different threads should use loop.call_soon_threadsafe(), and you should avoid any other explicit use of event loops. If you're not doing it that way you're asking for trouble. Given that your "understanding of the internals of the event loops is very limited" you should just not do this.
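A minimal sketch of the pattern recommended here (the function name is hypothetical): another thread never touches the loop directly; it only uses `loop.call_soon_threadsafe()` to schedule a callback, which then runs on the loop's own thread. A `threading.Event` (not an asyncio one) lets the sending thread wait for the result.

```python
import asyncio
import threading


def send_to_loop_thread(value):
    """Hand a value to a loop running in another thread via call_soon_threadsafe()."""
    loop = asyncio.new_event_loop()
    loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
    loop_thread.start()

    received = []
    done = threading.Event()  # a threading.Event: safe to wait on from this thread

    def on_loop(v):
        # Executes on the loop's thread, so loop-owned state may be touched here.
        received.append((v, threading.current_thread() is loop_thread))
        done.set()

    loop.call_soon_threadsafe(on_loop, value)
    done.wait(timeout=5)
    loop.call_soon_threadsafe(loop.stop)
    loop_thread.join(timeout=5)
    loop.close()
    return received


if __name__ == "__main__":
    print(send_to_loop_thread(42))  # prints [(42, True)]
```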

(Replying by email to Nick Badger's comment of Mon, Jun 20, 2016, 2:20 PM.)

Badg commented 8 years ago

(1): 👍 I'll see what I can do.

(2): Gotcha, I didn't realize the policy was that loosely coupled with the loop. What about "The current context is defined by the specific event loop policy. Typically, 'current context' is equivalent to 'current thread'. If the policy has changed, the meaning of 'current context' may change as well."

(3) The safety of my use of asyncio is grossly off-topic; I understand that bridging the thread boundary requires the threadsafe methods, and that ignoring thread safety for the event loops is very much caveat emptor. What I'm hoping the docs would clarify is why explicitly setting the event loop when creating asyncio objects, e.g. foo = asyncio.Lock(loop=bar), would ever be needed, given the above. To me (and, I'm sure, to others), the existence of the loop keyword argument suggests that the objects can be created from different loops and/or threads. If that isn't possible, it needs to be documented; the blanket "unless otherwise noted, these aren't threadsafe" for asyncio as a whole seems insufficient when confronted with the ambiguity of the loop keyword.

gvanrossum commented 8 years ago

(2) I'd prefer stating what the default policy does and explaining that each policy has its own interpretation.

(3) The exact conditions are too hard to explain; the simplified rule is "don't use this".

(Replying by email to Nick Badger's comment of Mon, Jun 20, 2016, 3:44 PM.)