python / cpython

The Python programming language
https://www.python.org
Other
63.36k stars 30.34k forks source link

Revert the new asyncio Streams API #82423

Closed 1st1 closed 4 years ago

1st1 commented 5 years ago
BPO 38242
Nosy @rhettinger, @njsmith, @asvetlov, @cjrh, @ambv, @1st1, @icgood, @tirkarthi, @bmerry, @aeros
PRs
  • python/cpython#16445
  • python/cpython#16455
  • python/cpython#16482
  • python/cpython#16485
  • python/cpython#13143
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-bug', '3.8', 'expert-asyncio'] title = 'Revert the new asyncio Streams API' updated_at = user = 'https://github.com/1st1' ``` bugs.python.org fields: ```python activity = actor = 'icgood' assignee = 'none' closed = True closed_date = closer = 'asvetlov' components = ['asyncio'] creation = creator = 'yselivanov' dependencies = [] files = [] hgrepos = [] issue_num = 38242 keywords = ['patch'] message_count = 25.0 messages = ['352908', '352911', '352931', '352934', '352935', '352936', '352942', '352943', '352947', '352948', '353044', '353045', '353046', '353052', '353057', '353058', '353103', '353401', '353405', '353417', '353420', '353536', '353539', '353540', '353541'] nosy_count = 10.0 nosy_names = ['rhettinger', 'njs', 'asvetlov', 'cjrh', 'lukasz.langa', 'yselivanov', 'icgood', 'xtreak', 'bmerry', 'aeros'] pr_nums = ['16445', '16455', '16482', '16485', '13143'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue38242' versions = ['Python 3.8'] ```

    1st1 commented 5 years ago

    == Context

    1. Andrew and I implemented a new streaming API in asyncio 3.8. The key idea is that there's a single Stream object, combining old StreamReader & StreamWriter APIs. On top of that, Stream.write() and Stream.close() became coroutines. The new API is significantly easier to use, but it required us to:

    (a) Make backwards compatible changes to subprocess APIs; (b) Add two new APIs: asyncio.connect() -> Stream and asyncio.StreamServer (c) Soft-deprecate asyncio.open_connection() and asyncio.start_serving().

    1. The Trio project considers implementing new Streams API (see [1]). The key idea is to make the core Stream object very simple and then enable building complex protocol pipelines using composition. Want SSL? Add an SSL layer. Have a framed binary protocol? Push a reusable framing layer to the stack and parse the protocol. On top of that, Trio wants us to standardize Streams, so that it's possible to write framework agnostic protocol code using async/await and to even reuse things like SSL implementation.

    == Problem

    I really like how Trio approaches this.

    The asyncio.Stream class we have now is overloaded with functionality. It's not composable. It's internal buffer and APIs are designed to parsing text protocols (i.e. parsing a complex binary protocol requires an entirely different buffer implementation).

    Long story short, I think we should revert the new streaming API from the 3.8 branch and see if Trio & asyncio can design a better Streaming API. Otherwise we end up in a weird situation where we added a bunch of new APIs to 3.8 which can be deprecated in 3.9.

    Worst case scenario we'll just ship our current versions of Streams in 3.9 (not in 3.8).

    Thoughts?

    [1] https://github.com/python-trio/trio/issues/1208

    njsmith commented 5 years ago

    Yury asked me to weigh in here, since I guess between him and Andrew there's some uncertainty about whether reverting is the right choice or not. I can't answer that, but I can share some thoughts.

    Unfortunately, I wasn't aware of the Stream PR when it was first being written and reviewed; I only discovered the change a month or two ago, by accident. If I had, I'd have said something like:

    This is definitely a major improvement over the old API, that fixes some important issues, which is awesome. But, it only fixes some of the issues, not all of them, and it's really difficult to change things in asyncio after they ship. So there's a tricky question: do you want to ship this now so users can start taking advantage of its improvements immediately, or do you want to wait to make sure you don't have to do multiple rounds of changes?

    Of course, now we're in a situation where it's already merged, which makes things more awkward. But maybe it will clarify things a bit to do a thought experiment: if the asyncio.Stream API was a PR that hadn't been merged yet and we were considering merging it – would you hit the green merge button, given what we know now, or would you hold off for 3.9?

    asvetlov commented 5 years ago

    There is an alternative proposal: let's deprecate and eventually remove streams API entirely (for sockets, pipes, and subprocesses). By this, we will never have a chance to conflict with trio.

    Another option is removing asyncio at all (again to never conflict with trio and possible future awesome libraries) but the ship has sailed maybe.

    rhettinger commented 5 years ago

    Reversion is cheap. Published APIs are hard to take back.

    If the plan is to deprecate, then anything added for 3.8 should be reverted. There's no point in publishing a new API only to take it away.

    njsmith commented 5 years ago

    That seems a bit extreme, and I don't think conflicts with Trio are the most important motivation here. (And they shouldn't be.) But, we should probably write down what the motivations actually are.

    Yury might have a different perspective, but to me the challenge of the asyncio stream API is that it's trying to do three things at once:

    1. Provide a generic interface for representing byte streams
    2. Provide useful higher-level tools for working with byte streams, like readuntil and TLS
    3. Adapt the protocol/transports world into the async/await world

    These are all important things that should exist. The problem is that asyncio.Stream tries to do all three at once in a single class, which makes them tightly coupled.

    *If* we're going to have a single class that does all those things, then I think the new asyncio.Stream code does that about as well as it can be done.

    The alternative is to split up those responsibilities: solve (1) by defining some simple ABCs to represent a generic stream, that are optimized to be easy to implement for lots of different stream types, solve (2) by building higher-level tools that work against the ABC interface so they work on any stream implementation, and then once that's done it should be pretty easy to solve (3) by implementing the new ABC for protocol/transports.

    By splitting things up this way, I think we can do a better job at solving all the problems with fewer compromises. For example, there are lots of third-party libraries that use asyncio and want to expose some kind of stream object, like HTTP libraries, SSH libraries, SOCKS libraries, etc., but the current design makes this difficult. Also, building streams on top of protocols/transports adds unnecessary runtime overhead.

    A further bonus is that (1) and (2) aren't tied to any async library, so it's possible to borrow the work that Trio's been doing on them, and to potentially share this part of the code between Trio and asyncio.

    So that all sounds pretty cool, but what does it have to do with shipping asyncio.Stream in 3.8? The concern is that right now asyncio has two APIs for working with byte streams (protocols/transports + StreamReader/StreamWriter); then 3.8 adds asyncio.Stream; and then the design I'm describing will make a *fourth, which is a lot. I think we need something *like asyncio.Stream in the mix, for compatibility and bridging between the two worlds, but right now it's not quite clear how that should look, and pushing it off to 3.9 would give us time to figure out the details for how these things can all fit together best.

    njsmith commented 5 years ago

    I think we need something *like* asyncio.Stream in the mix, for compatibility and bridging between the two worlds, but right now it's not quite clear how that should look, and pushing it off to 3.9 would give us time to figure out the details for how these things can all fit together best.

    Sorry, this was unclear. The "two worlds" I mean here are the current asyncio landscape and a future where we have standardized composable ABCs.

    gvanrossum commented 5 years ago

    I am going to recommend sticking to the status quo, i.e. Andrew's improvements to asyncio.Stream should stay. The rest of this message is an elaboration.

    The new perfect composable Streams design is just that -- a design. Many things could go wrong in the implementation. Once it exists as a 3rd party add-on and users agree it's better we *may* decide to deprecate asyncio.Stream. Or not -- there are many examples in the stdlib of modules for which a better 3rd party alternative exists but which are nevertheless useful and not deprecated.

    Much of the asyncio universe already exists outside the stdlib -- the perfect composable Streams API should probably never be moved into the stdlib (unless it's so perfect that it never needs to change :-).

    Andrew's improvements help current users of asyncio.Stream. If those users eventually want to migrate to the perfect composable Streams API, once it's available, fine. But I don't think we're helping them by not giving them improvements that have already been implemented (and which everyone here agrees are improvements!) right now.

    Users of the current asyncio.Stream have a choice -- they can adopt the improvements in 3.8, or they can wait for the perfect design to be implemented. Everybody's constraints are different. Let's not pretend we already know what they should choose.

    If in the future we end up changing our mind, that's okay. It's happened before, and we've lived.

    asvetlov commented 5 years ago

    I would say that we can reimplement the Stream class on top of new composable streams in the future. I'd love to do it right now but these streams don't exist yet.

    I totally support the idea of new streams design, it looks very attractive. The implementation may take a while and require many labor-hours though.

    asyncio is driven by volunteers, mostly by Yuri and I. We constantly don't implement all our *wishes* at the time on next Python feature-freezing but only a subset of them.

    Maybe the situation will change in the future but I have no strong confidence that we finish the design of the new streams in 3.9.

    Task group is another very required thing. If I choose between groups and steams I probably invest my time into the former.

    1st1 commented 5 years ago

    Task group is another very required thing. If I choose between groups and steams I probably invest my time into the former.

    We should get them in 3.9. I'm going to be working on the ExceptionGroup PEP today.

    asvetlov commented 5 years ago

    Awesome!

    njsmith commented 5 years ago

    I saw Yury on Saturday, and we spent some time working through the implications of different options here.

    For context: the stream ABCs will be a bit different from most third-party code, because their value is proportional to how many projects adopt them. So some kind of central standardization is especially valuable. And, they'll have lots of implementors + lots of users, so they'll be hard to change regardless of whether we standardize them (e.g. even adding new features will be a breaking change). Part of the reason we've been aggressively iterating on them in Trio is because I know that eventually we need to end up with a minimal core that can never change again, so I want to make sure we find the right one :-).

    So Yury and I are tentatively thinking we'll make some kind of PEP for the 3.9 timescale, probably just for the core ABCs, maybe in the stdlib or maybe just as an informational PEP that we can use to convince people that this is "the python way" (like the WSGI PEP).

    Now, the implications for the asyncio.Stream API in 3.8. This is tricky because we aren't sure of how everything will shake out, so we considered multiple scenarios.

    First decision point: will asyncio.Stream implement the ABCs directly, or will you need some kind of adapter? If we go with an adapter, then there's no conflict between the ABC approach and whatever we do in 3.8, because the adapter can always fix things up later. But, it might be nicer to eventually have asyncio.Stream implement the ABC directly, so users can use the recommended API directly without extra fuss, so let's consider that too.

    Next decision point: will the byte stream ABC have an __aiter that yields chunks? We're pretty sure this is the only place where the ABC *might* conflict with the asyncio.Stream interface. And as Josh Oreman pointed out in the Trio issue thread, we could even avoid this by making the ABC's chunk-iterator be a method like .chunks() instead of naming it __aiter.

    So even with the current asyncio.Stream, there are two potentially workable options. But, they do involve *some compromises, so what would happen if we wanted asyncio.Stream to implement the ABC directly without and adapter, *and the ABC uses __aiter__? We can't do that with the current asyncio.Stream. Are there any tweaks we'd want to make to 3.8 to keep our options open here?

    The obvious change would be to leave out __aiter__ from asyncio.Stream in 3.8. That would leave all our options open. For sockets this would be easy, because the old functions are still there and still returning StreamReader/StreamWriter objects. For 3.8, we're adding a bunch of new Stream-based APIs, and users won't encounter a Stream until they switch to those. (The new APIs are: connect, connect_read_pipe, connect_write_pipe, connect_unix, StreamServer, UnixStreamServer). The problem is the subprocess functions (create_subprocess_exec, create_subprocess_shell), since they've been changed *in place* to return asyncio.Stream instead of StreamReader/StreamWriter.

    We considered the possibility of migrating the existing subprocess functions to a different __aiter__ implementation via a deprecation period, but concluded that this wasn't really workable. So if we want to go down this branch of the decision tree, then 3.8 would have to leave createsubprocess{exec,shell} as using StreamReader/StreamWriter, and in either 3.8 or 3.9 we'd have to add new subprocess functions that use Stream, like we did for sockets.

    Doing this for subprocesses is a bit more complicated than for sockets, because subprocesses have a Process object that holds the stream objects and interacts with them. But looking at the code, I don't see any real blockers.

    If we completely separated the old StreamReader/StreamWriter functions from the new Stream functions, then it would also have another advantage: we could clean up several issues with Stream that are only needed for compatibility with the old APIs. In particular, we could get rid of the optional-await hacks on 'write' and 'close', and turn them into regular coroutines.

    So I guess this is the real question we have to answer now. Which of these do we pick?

    Option 1: Keep the Stream code as-is, and accept that using asyncio.Stream with future ABC-based code will require some compromises

    Option 2: Create new functions for spawning subprocesses; revert createsubprocess{exec,shell} to use StreamReader/StreamWriter; take advantage of the break between old and new APIs to clean up Stream in general; take advantage of that cleanup to remove __aiter__ so the the future ABC-based code won't have to compromise.

    I think this is one of those good problems to have, where really we could live with either option :-). Still, we should pick one on purpose instead of by accident.

    Yury's preference was "option 1", because he feels compromises for the ABC design aren't that bad, and adding and documenting new subprocess APIs is expensive.

    I'm leaning towards "option 2", because we're already paying the cost of adding and documenting a ton of new APIs for sockets. Given that, adding a few more to that list doesn't add that much cost, and it lets us get more value from *all* of the new APIs, by letting us shed more of the legacy backcompat constraints. Right now we don't have any path to fixing 'write'/'close' at all; this would give us one. I'd also be willing to put in some time to actually implement this if that turns out to be the limiting factor.

    njsmith commented 5 years ago

    BTW Andrew, while writing that I realized that there's also overlap between your new Server classes and Trio's ABC for servers, and I think it'd be interesting to chat about how they compare maybe? But it's not relevant to this issue, so maybe gitter or another issue or something?

    1st1 commented 5 years ago

    Nathaniel, thanks for the summary!

    A few comments to address some points:

    So Yury and I are tentatively thinking we'll make some kind of PEP for the 3.9 timescale, probably just for the core ABCs

    Yes! At the very least for things like asynchronous version of "closable", e.g. an object with an "aclose()" coroutine method. I'm sure there are some other straightforward design patterns that we can codify. Maybe we can do that for streams too -- see some thoughts below.

    First decision point: will asyncio.Stream implement the ABCs directly, or will you need some kind of adapter?

    I'd love asyncio.Stream to implement the ABCs directly. The only problem is that Trio isn't yet settled on the design of those ABCs and we need to make some decisions for asyncio *now*.

    I hope that the Trio project can minimize the number of methods they want to add to those ABCs so that we don't need to duplicate a lot of functionality in asyncio.Stream. E.g. in the new asyncio.Stream there's a Stream.write() coroutine; in Trio it's Stream.send_all(). I'm not entirely convinced that "send_all()" is a good name, for example, even though I now understand the motivation. We can discuss that later in a relevant issue though.

    Another important point to consider: if the new Trio Stream ABCs are *significantly different* from asyncio.Stream and would require us to alias too many methods or to do heavy refactoring and deprecations, then Trio will have to show some real world usage and traction of its APIs first.

    If we completely separated the old StreamReader/StreamWriter functions from the new Stream functions, then it would also have another advantage: we could clean up several issues with Stream that are only needed for compatibility with the old APIs. In particular, we could get rid of the optional-await hacks on 'write' and 'close', and turn them into regular coroutines.

    We'd like to avoid that and have one asyncio.Stream class in asyncio. Using legacy StreamReader/StreamWriter functions for subprocesses alone (long term) isn't a solution for us, since there're real problems with .write() and .close() not being awaitables. Sticking to having a new asyncio.Stream API and old StreamReader/StreamWriter for subprocesses isn't an acceptable solution either. We'd like to minimize the API surface that asyncio users have to deal with.

    The obvious change would be to leave out __aiter__ from asyncio.Stream in 3.8.

    Option 1: Keep the Stream code as-is, and accept that using asyncio.Stream with future ABC-based code will require some compromises

    Nathaniel, I think it's important to emphasize that those compromises should be mutual. I'm willing to support changing "Stream.close()" to "Stream.aclose()" and to perhaps alias some methods. We can also implement "Stream.chunks()". But changing the semantics of "__aiter__" is, unfortunately, not on the table, at least for me.

    If Trio doesn't want to change the __aiter__ semantics of its Stream ABC (which is only a *proposal* right now!), then:

    Adding new APIs and deprecating old ones is a huge burden on asyncio maintainers and users. So the "obvious change" for *me* would be using "Stream.chunks()" iterator in Trio. For Trio it's a question of whether the new API is pretty; for asyncio it's a question of how many APIs we need to deprecate/change. I hope you understand my position and why I am strong -1 on the "Option 2".

    Right now we don't have any path to fixing 'write'/'close' at all;

    Andrew and I discussed that this morning. Here's the plan:

    1. We add "aclose()" and "write()" coroutines to the new "asyncio.Stream()". It won't have "wait_closed()" or "drain()" or "close()".

    2. We add a _LegacyStream class derived from the new asyncio.Stream. We will use it for subprocesses. Its "write()" method will return an "OptionallyAwaitable" wrapper that will nudge users to add an await in front of "stdin.write()". _LegacyStream will be completely backwards compatible.

    This path enables us to add a decent new streaming API while retaining consistency and backwards compatibility.

    BTW Andrew, while writing that I realized that there's also overlap between your new Server classes and Trio's ABC for servers, and I think it'd be interesting to chat about how they compare maybe? But it's not relevant to this issue, so maybe gitter or another issue or something?

    If possible please use email/bpo/github. I don't use gitter and I'd like to be part of that discussion (or at least be able to follow it).

    njsmith commented 5 years ago

    I hope that the Trio project can minimize the number of methods they want to add to those ABCs so that we don't need to duplicate a lot of functionality in asyncio.Stream. E.g. in the new asyncio.Stream there's a Stream.write() coroutine; in Trio it's Stream.send_all(). I'm not entirely convinced that "send_all()" is a good name, for example, even though I now understand the motivation. We can discuss that later in a relevant issue though.

    Yeah, we definitely need to bikeshed the method names but we should do that separately. The actual methods are mostly settled though, and designed to be as minimal as possible. The complete set is:

    And we'll provide standard implementations for iteration and __aenter/aexit__.

    Another important point to consider: if the new Trio Stream ABCs are *significantly different* from asyncio.Stream and would require us to alias too many methods or to do heavy refactoring and deprecations, then Trio will have to show some real world usage and traction of its APIs first.

    It sounds like the only aliasing will be for the send data method; everything else either has different semantics, or has both the same semantics and the same name.* And there won't be any refactoring needed; the whole goal of this design is to make sure that any reasonable stream implementation can easily provide these methods :-).

    *I was going to say the "send EOF" method would also be potentially aliased, but send EOF should be async, because e.g. on a TLS stream it has to send data, and Stream.write_eof is sync. Or maybe we should migrate Stream.write_eof to become async too?

    Nathaniel, I think it's important to emphasize that those compromises should be mutual. I'm willing to support changing "Stream.close()" to "Stream.aclose()" and to perhaps alias some methods. We can also implement "Stream.chunks()". But changing the semantics of "__aiter__" is, unfortunately, not on the table, at least for me.

    Let's put __aiter aside for a moment, and just think about what's best for asyncio itself. And if that turns out to include breaking compatibility between Stream and StreamReader/StreamWriter, then we can go back to the discussion about __aiter :-)

    Here's the plan:

    1. We add "aclose()" and "write()" coroutines to the new "asyncio.Stream()". It won't have "wait_closed()" or "drain()" or "close()".

    2. We add a _LegacyStream class derived from the new asyncio.Stream. We will use it for subprocesses. Its "write()" method will return an "OptionallyAwaitable" wrapper that will nudge users to add an await in front of "stdin.write()". _LegacyStream will be completely backwards compatible.

    This path enables us to add a decent new streaming API while retaining consistency and backwards compatibility.

    I don't think this is a terrible plan or anything like that. But I'm still confused about why you think it's better than adding new subprocess spawn functions. IIUC, your goal is to make the documentation and deprecations as simple as possible.

    If we add two new subprocess functions, then the documentation/deprecations look like this:

    OTOH, with your proposal, we also have a set of deprecations and migrations to do:

    In addition, deprecating a function is straightforward and well understood: you just issue a deprecation warning, and tools can automatically show the source of the problem and suppress duplicates, or convert it into an error. But deprecating call-without-await is messy – you have to issue the warning from inside __del__, so you can't easily find the source of the problem, suppress duplicates, or convert it into an error.

    So like... both of these approaches are definitely possible, but to me it seems like if you look at it holistically, your approach is actually making the documentation and deprecations *more* complicated, not less.

    1st1 commented 5 years ago

    So like... both of these approaches are definitely possible, but to me it seems like if you look at it holistically, your approach is actually making the documentation and deprecations *more* complicated, not less.

    I think Nathaniel might have a point here. Andrew, Guido, what do you think?

    aeros commented 5 years ago

    We have to document that Process objects use a third kind of stream object that doesn't match either the old or new APIs, and how this one works

    From my perspective, this point would have the largest user learning cost due to the stream object having a completely different API. As a result, I'm significantly more in favor of adding the two new subprocess functions.

    1st1 commented 5 years ago

    I slept on this and discussed this issue privately with a few non-involved people who use asyncio daily.

    My final opinion on this issue: we must revert the new streaming API from asyncio and take our time to design it properly. I don't like what we have in the 3.8 branch right now. I don't feel comfortable rushing decisions and doing last minute API changes.

    Andrew, do you want me to submit a PR or you can do it?

    aeros commented 5 years ago

    Andrew, do you want me to submit a PR or you can do it?

    Since this has been elevated to a release blocker, I wouldn't mind helping to revert this ASAP. I can open a PR to fix it today.

    1st1 commented 5 years ago

    Since this has been elevated to a release blocker, I wouldn't mind helping to revert this ASAP. I can open a PR to fix it today.

    Sure, by all means, any help would be hugely appreciated. Thank you, Kyle.

    You'll need to be careful to only revert the new functions & the asyncio.Stream class. Also the new docs. Due to proximity to the deadline, please be prepared that we might need to abandon your pull request if it's not ready by Sunday morning. In which case Andrew or I will do this ourselves.

    aeros commented 5 years ago

    You'll need to be careful to only revert the new functions & the asyncio.Stream class.

    So far the trickiest part has proven to be the tests (specifically test_streams.py) and keeping the deprecation warning for passing explicit loop arguments. I've had to be careful to be certain that no tests were unintentionally removed.

    Due to proximity to the deadline, please be prepared that we might need to abandon your pull request if it's not ready by Sunday morning

    No problem, I'll make sure to allow anyone with write access (yourself and Andrew) to edit the PR I open directly to make any needed changes. That way, at least any progress I make on it can help reduce the amount of work for you two.

    aeros commented 5 years ago

    Currently focusing on the Lib/asyncio/ and Lib/test/ changes. Working on doc changes next, but that should be significantly easier.

    In addition to https://github.com/python/cpython/commit/23b4b697e5b6cc897696f9c0288c187d2d24bff2 (main commit from Andrew that added asyncio.Stream and new functions), I've also had to remove https://github.com/python/cpython/commit/4cdbc452ce3 (minor asyncio test change from Pablo) due to it causing issues with the other tests from deleting asyncio.StreamReader and asyncio.StreamWriter.

    1st1 commented 5 years ago

    New changeset 6758e6e12a71ef5530146161881f88df1fa43382 by Yury Selivanov in branch 'master': bpo-38242: Revert "bpo-36889: Merge asyncio streams (GH-13251)" (bpo-16482) https://github.com/python/cpython/commit/6758e6e12a71ef5530146161881f88df1fa43382

    1st1 commented 5 years ago

    New changeset 1c19d656a79a00f58361ceb61c0a6d1faf90c686 by Yury Selivanov in branch '3.8': bpo-38242: Revert "bpo-36889: Merge asyncio streams (GH-13251)" (bpo-16482) (bpo-16485) https://github.com/python/cpython/commit/1c19d656a79a00f58361ceb61c0a6d1faf90c686

    1st1 commented 5 years ago

    I've reverted the code. Andrew, would really appreciate if you could quickly do a post commit review.

    aeros commented 5 years ago

    I've reverted the code. Andrew, would really appreciate if you could quickly do a post commit review.

    Oops, I'll reopen it.

    aiven-anton commented 1 year ago

    Has there been any follow-up work on the discussions here, around implementing a layered byte streaming API? Are there any public discussions one can follow?

    gvanrossum commented 1 year ago

    Sorry, I think it just died after the revert.

    aiven-anton commented 1 year ago

    @gvanrossum Gotcha, thanks for the update. Not a problem really, just needed to make sure I'm taking an informed design decision in integrating against StreamReader/StreamWriter for now.