Closed miracle2k closed 6 years ago
Oh ick, it's very unfortunate we missed this :-(. That's a real problem in our testing... it looks like currently Travis is testing 3.7-dev, but Travis's normal idea of "3.7-dev" is some snapshot from January, which may predate this change, and if you want to actually test 3.7 or 3.8-dev then you need to jump through some poorly-documented hoops. We should do that pretty urgently, given how dependent this project is on asyncio development.
As for what to do now... I'm actually not sure what our options are. Going to CC @1st1. The problem is that we need to somehow override get_event_loop
so that it can return different values in different Trio tasks, all running on the same thread. So far we've done this by monkeypatching, because there was no alternative (and because the code we were patching looked unlikely to change, haha whoops). If the current loop was stored as a ContextVar
instead of a thread-local, this would be easy – and looking at _asynciomodule.c
I think this might also simplify the code, since right now it's got its own implementation of the clever caching stuff that ContextVar
uses? But of course even if @1st1 agrees then that still requires changes to _asynciomodule.c
, which doesn't help for 3.7 releases already in the wild. As for what we can do right now... this seems very difficult, since now all the get_event_loop
logic is written as static
functions inside _asynciomodule.c
, and it's called directly by code like Future.__init__
, so there's no real way to intercept it at all.
We could...
Try to somehow disable the _asynciomodule.c
speedup module entirely? But this seems very difficult, especially if import asyncio
happens before import trio_asyncio
.
Use a trio Instrument
to detect whenever the trio scheduler is about to switch to a task that has an active trio-asyncio loop, and call asyncio.set_running_loop
? That sounds super awkward and also inefficient.
Sneak some patch into 3.7.1 that adds a new API to _asynciomodule.c
that we can use, with the argument that totally breaking trio-asyncio is a regression in 3.7? (I'm not sure when 3.7.1 is expected; it's possible it's, like, "tomorrow", in which case I guess 3.7.2 would be the goal.)
Yury, any thoughts?
Gaah. Sorry that I entirely missed this. Travis needs convincing that updating this kind of critical testing infrastructure more often is necessary.
Personally I'm all for the third solution: Store the current asyncio loop in a contextvar. Could somebody look into creating a patch for this? I'm unfortunately swamped with (a) work (b) craftsmen (c) sorting my partner's belongings into my home (d) more work …
Store the current asyncio loop in a contextvar. Could somebody look into creating a patch for this?
First we'd want to find out whether Yury would be willing to accept such a patch :-)
That is the first step of "looking into". :-P
As for what to do now... I'm actually not sure what our options are. Going to CC @1st1. The problem is that we need to somehow override get_event_loop so that it can return different values in different Trio tasks, all running on the same thread.
I think I need more details to understand the problem you're trying to solve here. BTW, have you tried using get_running_loop()
? Or asyncio.current_task().get_loop()
?
So far we've done this by monkeypatching, because there was no alternative (and because the code we were patching looked unlikely to change, haha whoops). If the current loop was stored as a ContextVar instead of a thread-local, this would be easy – and looking at _asynciomodule.c I think this might also simplify the code, since right now it's got its own implementation of the clever caching stuff that ContextVar uses?
I tried storing the current event loop in a ContextVar and it didn't work. We'd have to add ContextVar on top of the current code and that's a lot of extra complexity. Besides, I'd be cautious about exposing some internal context var as part of asyncio public API.
I think I need more details to understand the problem you're trying to solve here.
Short: we're re-implementing the asyncio mainloop on top of Trio. Thus when we run a callback (including a Future's completion routine) we need to restore asyncio's idea of the current event loop ourselves. The sensible way to do this is via contextvars, so I store the current loop in the current Trio task's context and monkeypatch asyncio.get_event_loop
to read it from there, if present. (Otherwise I fall back to the previous code, since there may be threads with a "traditional" asyncio loop.)
BTW, have you tried using
get_running_loop()
? Orasyncio.current_task().get_loop()
?
Trio-asyncio also monkeypatches get_running_loop
because asyncio uses a thread-local for it. We can't use that because there can be more than one concurrent trio-asyncio mainloops in a thread. (Our mainloop is started by an async context manager instead of asyncio.run()
, obviously.)
asyncio.current_task().get_loop()
cannot work, because the first thing current_task()
does is to retrieve the current loop. :-P
I tried storing the current event loop in a ContextVar and it didn't work.
Why not?
I'd be cautious about exposing some internal context var as part of asyncio public API.
Umm, well, the contextvar wouldn't be exposed – it'd be carried over seamlessly as part of the contextvar support in Trio and Trio-asyncio. We'd set it once, via set_event_loop()
right after creating the loop's context, and we'd be good.
@1st1 We're implementing an asyncio event loop. Our loop object is associated with a specific Trio nursery, that it uses to schedule callbacks and so forth. For code that's running inside this nursery, we want get_event_loop
to return this object.
And then maybe we have another nursery somewhere else, inside another task, and it has a different asyncio event loop associated with it... and for code inside this nursery, we want get_event_loop
to return this object.
And for code that's inside trio, but not inside any asyncio event loop, we'd really prefer that get_event_loop
would raise a helpful error instead of silently creating a normal non-trio-integrated event loop object. Though I'm not sure if we have that working now or not.
async def function_that_uses_an_asyncio_library():
async with trio_asyncio.open_loop() as loop:
assert asyncio.get_event_loop() is loop
async def main():
async with trio.open_nursery() as nursery:
# running two of those at the same time, both assertions should pass
# (even though the loop objects are different)
nursery.start_soon(function_that_uses_an_asyncio_library)
In asyncio, the event loop is a global thing, shared across the whole program, and the event loop lets you spawn background tasks. But Trio's whole thing is that background tasks can't be globally scoped. How can those be reconciled? Well, this is a compromise: we make the event loop "global" within a branch of the task tree, so from the inside it feels like asyncio, but from the outside we're still enforcing trio's normal restrictions.
Wasn't it the case before #13 that trio-asyncio had its own EventLoopPolicy? Would it be a solution to bring that back (but maybe require to explicitly install it rather than automatically on import?
Well I mean, #13 was that messing with the EventLoopPolicy
broke your code and we had to switch to this other thing to fix it :-). (The auto-at-import thing isn't really the issue I don't think... no matter when you override the global policy you're going to break asyncio code running after that.)
And for code that's inside trio, but not inside any asyncio event loop, we'd really prefer that get_event_loop would raise a helpful error instead of silently creating a normal non-trio-integrated event loop object. Though I'm not sure if we have that working now or not.
No, though that's rather easy to change. (5b6496d (untested))
Well I mean, #13 was that messing with the EventLoopPolicy broke your code and we had to switch to this other thing to fix it :-). (The auto-at-import thing isn't really the issue I don't think... no matter when you override the global policy you're going to break asyncio code running after that.)
We can certainly try whether reverting #13 would help here. That would imply that we stop supporting any asyncio>trio usecases on Python >3.6, but as asyncio.run()
is now the preferred way of starting an asyncio loop this is a non-issue – replace that with trio_asyncio.run()
and you're done.
NB: Are loop.run_until_complete()
and friends going to be deprecated sometime, or will asyncio carry that around indefinitely?
This issue prompted me to go back and review why we want get_event_loop
to work in trio-mode functions. (In asyncio-mode functions it's not a huge deal, because trio-asyncio can do set_running_loop
whenever it invokes an asyncio-mode callback.)
I think there are two reasons:
Reason 1 is: originally, we were fumbling around trying to figure out how seamlessly we could get trio and asyncio code to work together, and even considering whether we could make it totally seamless (i.e. supporting straight-up await asyncio_fn(...)
from trio functions and vice-versa). So in that context, making synchronous functions work in both modes made a lot of sense. I think since then we've moved more towards a strict separation between trio-mode and asyncio-mode, with explicit transitions, so maybe this isn't as relevant anymore?
Reason 2 is: we were originally thinking about converting code like:
loop = asyncio.get_event_loop()
loop.run_until_complete(some_fn())
loop.run_until_complete(another_fn())
loop.run_forever()
So we were imagining you'd be doing on-trivial work with the asyncio loop, from outside the asyncio context. And in particular, some_fn
and another_fn
here might well be Future
-returning functions that used get_event_loop
internally. But now, asyncio is moving strongly away from this style -- though I guess there is a fair amount of code out there that does still use it; I'm not sure how important it is for trio-asyncio to continue to support it or not.
These two things – that we've moved more towards a strong trio-mode vs asyncio-mode separation, and that asyncio has moved more towards asyncio.run
style – make me wonder if we should consider refactoring our API a bit more generally. Would it make sense to move away from open_loop
and towards just having run_in_asyncio
and run_in_trio
functions, that mark the transitions between trio-mode and asyncio-mode, and make that boundary stronger by not attempting to support get_event_loop
at all when in trio mode? That would also be simpler for users – fewer concepts to keep track of – but is it practical?
(@smurfix I guess this question is mostly for you, since you're probably the person who has the most actual experience using trio-asyncio, while I am just speculating from my armchair over here.)
The main thing that worries me here is the question of when loops are created versus reused. Right now it's very clear: open_loop
creates a loop and sets it as active, run_in_asyncio
always uses the nearest active loop (and if there is none, that's an error). If we got rid of open_loop
, then we'd have to change this as well, and it's a subtle issue.
Like, if you do run_in_asyncio
→ run_in_trio
→ run_in_asyncio
, obviously the outer run_in_asyncio
needs to create a new loop. Does the inner run_in_asyncio
also create a new loop, or does it re-use the loop that the outer run_in_asyncio
created?
I suppose creating a new loop would be the most consistent. But you do need a way to re-use the outer loop, for when the Trio code is a plugin inside a larger asyncio program and you need to call back into that larger program, so if we did it this way we'd need some way to re-use the existing loop as well. Maybe loop.run_in_asyncio
, which would be clean and explicit... but how burdensome would it be in practice to have to pass the loop through trio? Or maybe run_in_asyncio(..., reuse_loop=True)
?
Or, we could make it so the inner run_in_asyncio
automatically re-uses an outer loop. This would simplify the case where you do want to use the same loop, but... it kind of makes me itch a bit. Imagine you have a library, call it "traiohttp", that uses some asyncio library internally, but provides a trio API on the outside. If run_in_asyncio
automatically checks for any existing loops and reuses them, then "traiohttp"'s semantics will suddenly change when you use it from a program that just happens to use run_in_asyncio
in some far away part of the call stack. In particular, suddenly "traiohttp" may start accidentally (!) be creating tasks in some far away nursery way up the stack, and violate causality (!).
Actually "traiohttp" is kind of an interesting case... you probably use it like:
# This is literally just aiohttp's API
async with traiohttp.ClientSession() as session:
async with session.get(url) as response:
print(response.text())
The interesting thing here is that we're doing multiple operations, on multiple different objects, that all have to run on the same asyncio loop. But traiohttp wants to masquerade as a plain ordinary trio library that only happens to use trio-asyncio as an invisible internal implementation detail. To do this it will actually have to create a loop object at the same time the ClientSession
is created, and then hold onto that loop object and pass it around and use it explicitly for later operations. If it relies on any kind of implicit loop passing, then it might get confused if the calling app happens to also be using trio-asyncio for some other purpose.
In fact, it doesn't even need a "loop" object in the sense of asyncio – the only operation it needs is run_in_asyncio
. So maybe a better way to think of this is as a trio→asyncio portal. And that suggests an API like:
# Explicitly opening a portal and using it:
async with trio_asyncio.open_asyncio_portal() as asyncio_portal:
...
# Convenience function, similar to asyncio.run
async def run_in_asyncio(fn, *args):
async with trio_asyncio.open_asyncio_portal() as asyncio_portal:
return await asyncio_portal.run(fn, *args)
# And there's a global trio_asyncio.run_in_trio for getting back into trio
# Because there's only one Trio state, we don't need to keep track of multiple loop objects
# And trio's API is already composable, with no globals or causality violation, so basically
# introducing a loop object wouldn't do anything.
await asyncio_trio.run_in_trio(fn, *args)
Does that logic make sense? And, @smurfix , @miracle2k , anyone else who's tried using trio_asyncio for solving actual problems... how convenient or burdensome would this API be for your code?
In most use cases you're not simply calling one asyncio
function that does something complex and when that's done, that's it. That's how you design Trio code. Asyncio code usually doesn't works that way.
Thus, creating a new loop in run_asyncio
seldom makes sense. Also, "inner" calls to asyncio may happen when asyncio calls your Trio callback; creating a new loop for that by default would break things.
As to passing the loop explicitly vs. storing it in the context: usually you don't get to decide what arguments your callback gets, so you need to store the loop somewhere. The context's advantage is that you can't access a dead loop – when the context exits, the loop's gone, no chance of re-using it by accident.
@smurfix Hmm, ok, I hear that. And what do you think about calling sync asyncio APIs from trio-mode code – is that something you end up doing in practice? Actually, I think I want to split that into two questions: (1) do you call loop methods directly? (2) do you call functions that implicitly do get_event_loop
, like Future.__init__
?
Can you point me to any repos with ordinary everyday trio-asyncio code in them, like your homeassistant plugins or whatever?
(Sorry for all the questions :-))
Well, as soon as you're in sync code there's no longer any visible distinction between trio and asyncio modes, so why bother?
Switching into asyncio is somewhat expensive (queue the call, wait for the asyncio mainloop task to process it), you wouldn't want to do that for sync calls if it can be avoided.
@smurfix But does it come up that you're in trio mode and then are like "oh, I need to call this asyncio API that happens to be synchronous", or does that mostly only happen when you're already in asyncio mode?
And the reason to bother is, well, this issue: currently on 3.7 we do not have any way to support calling synchronous asyncio APIs when we're in trio mode, and we need to decide what to do about that. Restoring that functionality might only be possible with heroic efforts. So I'm trying to figure out how important it is.
@njsmith Suppose you need to talk to an asyncio protocol. So you override the dataReveived callback to queue the data to to your Trio task, that's easy and reasonably cheap. You then need to send something back, so you call yourProtocol..transport.send() … oops.
In actual usage, this is a not that big an issue because self-respecting protocols save their loop when they're set up and pass it to anything that might need it. At setup time the loop variable should be easily accessible.
in other words, just pass the loop into the future-or-whatever and things work again:
import trio
import trio_asyncio
from asyncio import Future, get_event_loop
@trio_asyncio.trio2aio
async def x(loop):
f = Future(loop=loop)
get_event_loop().call_later(2, f.set_result, 1)
await f
async def main():
async with trio_asyncio.open_loop() as loop:
await x(loop)
trio.run(main)
IMHO the way forward is to remove compatibility mode (or at least the guarantee thereof), i.e. force people to us trio_asyncio.run
instead of asyncio.run
. At that point I can revert the patch that replaced the installation of trio_asyncio
's loop policy with monkeypatching because importing trio_asyncio
can never happen too late, which was the reason for removing it IIRC (nut I will check).
If that still causes problems, well, teaching people to pass the loop around in Trio mode is not too much of a burden – you only need it when setting things up from Trio code (always assuming that the asyncio
code in question passes the loop
around correctly). It's still a departure from "trio_asyncio knows which loop you're talking about, so you don't need to bother with that", but it's non-intrusive enough, so I could live with that.
You then need to send something back, so you call
yourProtocol.transport.send()
… oops.
If we made it so that either this worked (if transport.send
doesn't call get_event_loop
), or else raised an error ("event loop not found"), and directed people to do portal.run_sync(yourProtocol.transport.send, ...)
, then... it wouldn't be too bad. I see what you're saying about it being nicer to not have to do that at all though.
self-respecting protocols save their loop
That was the original way asyncio was supposed to work. Then experience (and critiques from curio and trio!) convinced them that it was a bad approach, and now they've moved much more strongly towards implicit loop access... plus, well, not every library is well-written, but people still want to use them. So I think we shouldn't count on asyncio libraries tracking their loops manually, and I definitely don't want to end up telling people "you can call most sync-colored APIs, except the ones you can't, but probably you won't run into those very often, so don't worry about it".
IMHO the way forward is to remove compatibility mode (or at least the guarantee thereof), i.e. force people to us
trio_asyncio.run
instead ofasyncio.run
.
That's fine with me – as far as I'm concerned the only reason we ever supported working loop.run_until_complete
, asyncio.run
, etc. is as an internal hack to let us access more pre-existing test suites.
But I don't think this next bit is right :-(
At that point I can revert the patch that replaced the installation of trio_asyncio's loop policy with monkeypatching because importing trio_asyncio can never happen too late, which was the reason for removing it IIRC (nut I will check).
When you call asyncio.get_event_loop
, it checks:
The problem is that neither of this is actually what we want. get_running_loop
obviously doesn't work because we'd have to call it on every task switch. (I guess we could do this, by hacking some special-case code into the trio scheduler, but I'd really really like to avoid that.) And the event loop policy allows us to run arbitrary code, so we can tell it to check a contextvar or raise an error... but it's process-global, so our policy will apply to all threads, not just the one that trio is running in.
And sure enough, when we tried that, someone filed a bug (#13) saying that they were trying to run vanilla-asyncio or uvloop-asyncio in one thread and trio-asyncio in a second thread and the event loop policies were colliding and breaking stuff.
We could reduce these issues by being cleverer. For example, we could wait until the first time someone creates a trio-asyncio loop to install our policy, and we could make sure that when our policy detects that it's being called outside of the trio-asyncio thread, then it defers to whatever policy the user set before that. That might even be good enough to get us through the 3.7 cycle, until we could get something better? But it's still going to create really arcane frustrating bugs.
I haven't followed the discussion in detail, sorry, but I just wanted to say that there's zero chance of us changing how asyncio.get_event_loop()
and related functions work in 3.7 (including using contextvars in any way). So I suggest to explore how asyncio policies can be used to fix this.
@1st1 That kindof goes without saying.
I do wonder, though, whether there's a reason the loop policy isn't thread-specific, besides the obvious answer of "we didn't think there'd be a usecase for that"?
@1st1 That kindof goes without saying.
I do wonder, though, whether there's a reason the loop policy isn't thread-specific, besides the obvious answer of "we didn't think there'd be a usecase for that"?
I suggest you to soften your communication style. It really comes out as unnecessarily snarky and makes me unmotivated to continue the discussion here.
besides the obvious answer of "we didn't think there'd be a usecase for that"?
The obvious answer is that asyncio wasn't designed to work in multithreaded environments the way Trio wants it to. And IIRC Guido didn't like the idea of using many event loops in multiple OS threads in one process. The API was explicitly and consciously designed to use a single global asyncio policy for the entire process, as managing many of them in multiple threads manually would be too cumbersome for pure asyncio programs. It's also the first time I hear this feature to be proposed (which by itself doesn't mean that the idea is bad, it only means that it's not that common). The ship for making policies thread-specific has sailed though, so we'll have to work around the existing API and its limitations.
I understand that Trio wants to have a very robust and unbreakable mechanism, but, unfortunately, I don't see how we can do that besides monkey-patching asyncio's set_event_loop_policy
(to issue warnings when something doesn't work as expected).
On 30.07.2018 17:50, Yury Selivanov wrote:
@1st1 <https://github.com/1st1> That kindof goes without saying. I do wonder, though, whether there's a reason the loop policy isn't thread-specific, besides the obvious answer of "we didn't think there'd be a usecase for that"?
I suggest you to soften your communication style. It really comes out as unnecessarily snarky and makes me unmotivated to continue the discussion here.
I am sorry. That wasn't intentional – in fact I have been on the receiving end of that sort of comment often enough to take this sort of thing with a large grain of levity – and my awareness of the fact that other people might perceive differently is, unfortunately, somewhat limited.
-- -- Matthias Urlichs
I am sorry. That wasn't intentional – in fact I have been on the
NP, I think I actually overreacted a bit. FWIW I'm super interested in Trio and pretty open to fixing asyncio when there's a clear win for both projects. So please continue the discussion, I'm trying to follow it.
monkey-patching asyncio's
set_event_loop_policy
That's actually an interesting idea I hadn't considered. We could make our own "smart" loop policy that checks whether it's in a trio thread, and if not then passes on the request to whatever policy the user had registered. And then we could monkeypatch set_event_loop_policy
so that whenever someone tries to set a new policy, we redirect that to changing our fallback policy. That could actually work well, so long as the policy APIs remain in Python and monkeypatchable (which appears to be the case for 3.7 at least).
It'd still probably be good to get out of this monkeypatching business eventually, but that could restore current functionality on 3.7, at least.
0.8.0 implements a per-thread loop policy. This should allow trio-asyncio (in thread A) to interoperate seamlessly with pure asyncio (thread B) and/or uvloop (thread C).
Getting out of the monkeypatching business requires asyncio support: we'd need a per-thread policy.
0.8.0 implements a per-thread loop policy. This should allow trio-asyncio (in thread A) to interoperate seamlessly with pure asyncio (thread B) and/or uvloop (thread C).
...Maybe asyncio should work this way, but it still seems like a surprise that trio-asyncio would make asyncio's loop policy thread-local? And unnecessary for our purposes...?
done @ v0.8.2
says:
This seems to be related to stuff now written in C. A naked
Future()
call does not pick up the trio loop.If in asyncio/futures.py I go to the end and change this code:
to make sure
_asyncio
is not imported, my program works.