malinoff / amqproto


Can we avoid using Futures in the core? #3

Closed: malinoff closed this issue 7 years ago

malinoff commented 7 years ago

Currently Future objects are in the core to provide a number of very convenient behaviors:

  1. It is a synchronization mechanism: AMQP methods that have replies simply own a Future instance that is set in the reply handlers when the replies arrive.
  2. It is a replacement for a callback-based API: the caller is interested in responding to certain events, like handling an incoming message after channel.basic_consume, and instead of providing a callback to basic_consume the caller simply awaits the returned Future (the exact method of awaiting depends on the actual I/O implementation).
  3. It is a way to keep the I/O adapter implementations very concise and free of boilerplate: for instance, the amqproto.Channel.basic_consume method cannot be defined as an async method (for obvious reasons). If basic_consume did not return something that can be awaited, then we would basically need to re-define all (34) AMQP methods in the asyncio adapter and either mark them as async methods or return awaitables from them. Which is not appropriate, in my opinion.

Is there a way to keep all three conveniences and remove the use of Future objects in the core?
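For reference, the current pattern looks roughly like this (a simplified sketch with illustrative names, not the exact amqproto code):

import concurrent.futures


class Channel:
    def __init__(self, future_factory=concurrent.futures.Future):
        # Each I/O adapter can inject its own Future-like class here.
        self._future_factory = future_factory
        self._reply_future = None

    def basic_consume(self, queue):
        # Returns something the caller can wait on (how exactly depends on
        # the injected Future type) without this being an async def (point 3).
        self._reply_future = self._future_factory()
        # ... serialize the Basic.Consume frame into the outgoing buffer ...
        return self._reply_future

    def handle_consume_ok(self, reply):
        # The reply handler resolves the Future, waking up whoever is
        # waiting on it (points 1 and 2).
        self._reply_future.set_result(reply)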

Lukasa commented 7 years ago

So I think all of these behaviours represent convenience wrappers around the core basic idea of the protocol: certainly 1 and 2 do. In both cases what you're trying to do is provide a data structure that can be used to signal a state transition to the user of the library, much like you would do with a callback. If we map back to HTTP/1.1 for a moment (just because it's simpler), you could imagine a sans-io style API that looks like this:

def send_request(self, method: str, headers: Dict[str, str], body: Optional[bytes]) -> Future[Response]:

This is certainly a convenient API, but it has some complexities. Firstly, you don't want a generic Future, you need some kind of multi-Future that knows how to interact with the concurrency primitives of the framework in which it runs. In a threaded context you want a condition-variable based Future, in a Twisted context you want a Deferred-alike, in an asyncio context you want an asyncio.Future-alike, and so on. This distinction is important even in similar cases like Twisted vs asyncio (whose ideas of what a Future looks like differ quite profoundly in their internals), and gets much more so when you consider substantively different concurrency models (e.g. Twisted vs curio vs threading). Such a Future is a complex beast, and so if you can avoid writing one you probably should.

Secondly, it requires a great deal of additional state tracking. You will essentially need to keep either weak or strong references to all Futures that your library emits. These references incur memory overhead and garbage collection pressure. They also require your user to persist references to them, either on stacks or in some collection data structure.

Thirdly, it requires both you and your user to think much more carefully about re-entrancy in the code. Written this way, whenever data is passed to a data_received type method some number of callbacks will be invoked. These callbacks will re-enter user code, and may further re-entrantly enter your own sans-io code if the user's callback invokes your library. That means you must very carefully police when you invoke the Future with a result, because at that point you are delegating authority to the user. At this point a number of things may happen, including exceptions bubbling up or the user calling back into your library. You need to ensure that it is safe for this code to re-enter. Speaking as someone who has had to carefully debug unexpectedly re-entrant code in Twisted in the past, it is unlikely that both you and your users will correctly expect this and ensure that your code is in an appropriate state to handle the re-entrancy. The safest way to do this is to have your data_received method ingest the data, do all the appropriate data processing, and only right before you're about to process the next chunk of data should you invoke any appropriate Future objects. However, once you follow that model you get into the most important objection, which is...

Callbacks and Futures are a higher-level abstraction. The lowest-level, least-convenient API for this is the one used by h11 and h2, which is to do this:

def data_received(self, data: bytes) -> None:
    pass

def next_event(self) -> Optional[Event]:
    pass

(As a side note, h2 actually merges these two into one function, but there is no reason to and h11's separation of these concerns is probably a bit nicer.)

In this model, the sans-io library processes data when next_event is called, and it processes data only up until something happens. This something is chosen by the library: here are the lists of somethings for h11 and for h2. Once this something happens it is synchronously returned from this function. No awaiting, no futures, no callbacks, just the simplest possible control flow for imperative code: call a function, get a return value.
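Driving such an API from blocking socket code is then just a loop (a rough sketch; conn stands for the sans-io connection object and handle is whatever the caller wants to do with each event):

def run(sock, conn, handle):
    while True:
        data = sock.recv(65536)
        if not data:
            break
        conn.data_received(data)
        while True:
            event = conn.next_event()
            if event is None:
                break  # nothing more happens until more bytes arrive
            handle(event)  # a plain function call: no futures, no callbacks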

What happens with this return value is then entirely up to the user. Low-level users may choose to handle all of these things inline, but users who want to integrate this with their concurrency framework may want to map from these events to Futures or Protocols or whatever abstraction their framework likes best. The key here is that this is not the responsibility of your sans-io library: it's simply not your job to decide how best to signal the completion of some event. That's a matter for the adapter.

Here is an example of doing this with HTTP/2 in Twisted. Notice that we translate these events immediately into callbacks of various kinds, some of which end up firing off Deferreds registered by users. What's important here is that I did not have to shim anything into a Twisted-specific model, because the lowest common denominator of computation is calling a function and synchronously getting a response.

The key lesson here is that you cannot write a Future that will satisfy all possible use-cases, and so I recommend not bothering at all. Instead, write something that can easily be translated into whatever use-cases you have. By all means, your sans-io library can also ship helpers for concurrency frameworks you like that return things more useful to those frameworks (e.g. Future objects), but that should not be your primary API: your primary API should be having functions that synchronously return data, and your helpers should be written in terms of that API.
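As a rough illustration of such a helper (asyncio-flavoured, with made-up event and request attributes; this is not code from any real adapter):

import asyncio


class AsyncioAdapter:
    def __init__(self, conn, transport):
        self._conn = conn            # the sans-io protocol object
        self._transport = transport  # asyncio transport doing the real I/O
        self._waiters = {}           # request id -> asyncio.Future

    def send_request(self, request):
        # The Future lives in the adapter, not in the sans-io core.
        fut = asyncio.get_event_loop().create_future()
        self._waiters[request.id] = fut
        self._transport.write(self._conn.send_request(request))
        return fut

    def data_received(self, data):
        self._conn.data_received(data)
        while True:
            event = self._conn.next_event()
            if event is None:
                break
            # Translate the plain event into the framework's abstraction.
            fut = self._waiters.pop(event.request_id, None)
            if fut is not None and not fut.done():
                fut.set_result(event)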

malinoff commented 7 years ago

@Lukasa thanks for your input. First of all, I'd like to clarify my definition of a "user" so we're on the same page. There are two types of users: the end-users, who want to publish a message into rabbitmq, and the developers, who may want to implement an I/O adapter or to hack on the internals of the library. When I say "user" I mean an "end-user", not a developer of an I/O adapter. And when there's a trade-off between the happiness of an end-user and that of an I/O adapter developer, I choose the former. Also, please don't be offended - I'm not here to offend you - if my replies look aggressive or as if the decision has already been made, that's not the intent. I'm willing to explore ways to improve the implementation. Now, let's get back to your input:

you don't want a generic Future, you need some kind of multi-Future that knows how to interact with the concurrency primitives of the framework in which it runs

Indeed. That's why each adapter has a way to specify its own Future-like implementation, which knows how to deal with a specific I/O framework. The core itself does not expect any specific behavior from a future object implementation except for the public api as defined in concurrent.futures.Future. Having said that, the sans-io api example you showed should've looked more like this:

def send_request(self, method: str, headers: Dict[str, str], body: Optional[bytes]) -> AbstractFuture[Response]:

(if only we had an AbstractFuture in the standard library :) )
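Concretely, the contract the core relies on is tiny and could be spelled out like this (a sketch, not the actual amqproto code):

import abc
import typing

T = typing.TypeVar("T")


class AbstractFuture(abc.ABC, typing.Generic[T]):
    # The minimal surface the core needs, mirroring the public API of
    # concurrent.futures.Future; each I/O adapter plugs in its own class.

    @abc.abstractmethod
    def set_result(self, result: T) -> None:
        ...

    @abc.abstractmethod
    def set_exception(self, exception: BaseException) -> None:
        ...

    @abc.abstractmethod
    def result(self, timeout: typing.Optional[float] = None) -> T:
        ...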

Such a Future is a complex beast, and so if you can avoid writing one you probably should.

Totally agree. That's why my first intent was to re-use existing implementations (asyncio.Future, twisted.internet.defer.Deferred, curio.Task, etc) in a compatible way.

You will essentially need to keep either weak or strong references to all Futures that your library emits.

Could you please elaborate on that? Currently I have a single future object at any given time. As I described, this future is used to synchronize server replies with the caller's expectations and it's being re-instantiated every time a response is received. Due to how AMQP works, there's nothing that can happen in between and there's nothing that may need that future (to call .set_result on it or to await it).

it requires both you and your user to think much more carefully about re-entrancy in the code.

I also don't understand this. The library itself neither adds nor invokes any callbacks attached to the future object. Callback processing is entirely up to the user. Yes, of course, when the callbacks are processed lots of things can happen, including exceptions bubbling up. But how to handle such situations is also the responsibility of the user. For example, users can ignore them completely and let the adapter's __exit__ and __aexit__ context manager methods deal with such situations appropriately (for example, by properly closing the connection).

The key here is that this is not the responsibility of your sans-io library: it's simply not your job to decide how best to signal the completion of some event. That's a matter for the adapter.

It may be, of course. But I think it's my job to give an easy and convenient way to actually implement I/O adapters for the library, while also keeping the api concise. That's more about the 3rd point (which, as far as I can see, you haven't addressed; please point me to it if I'm wrong).

I would really like to see what you're describing in action, but googling only turns up examples of not-so-convenient apis. For example, the aioh2 example client. There's just too much internal machinery exposed for a GET request. Or the Twisted client example from the hyper docs. I don't want my users to implement a Twisted Protocol every time they need to send a HEAD request.

Lukasa commented 7 years ago

There are two types of users: the end-users, who want to publish a message into rabbitmq, and the developers, who may want to implement an I/O adapter or to hack on the internals of the library.

Fab, so when I said "user" I consistently meant it in the second case. It should be noted that I believe very strongly in having a hierarchy of interfaces. Users in the sense in which you mean the word should not be writing to low-level protocol interfaces like the ones we're discussing.

That's why my first intent was to re-use existing implementations (asyncio.Future, twisted.internet.defer.Deferred, curio.Task, etc) in a compatible way.

So my proposal is that this should be a higher-level concern: that you should build the synchronous, non-Future-using API first, and then write translations from that to code that uses Future objects if you believe that's the higher-level API you want to expose. That way, developers that want to integrate your AMQP library without using Future objects (e.g. to write it as a Twisted Protocol) can do so, while those that want a more await-focused API can have it.

Due to how AMQP works, there's nothing that can happen in between and there's nothing that may need that future

In that case, why should the Future be the low-level interface? My assumption is that you return the Future, then as the result of some bytes being fed in by your user you fire the Future. It seems to me the lowest level abstraction is just to say "a response arrived". Firing a callback or actualizing a Future seems like a higher-level action that can be taken in response to such a thing. No need to add the burden of a managed callback for users that don't want it.

I also don't understand this. The library itself neither adds nor invokes any callbacks attached to the future object. Callback processing is entirely up to the user.

Sure, but the developer can call your library from within the Future processing. As an example, consider a Twisted Deferred. A Deferred fires its callbacks synchronously, which means that when Deferred.callback is called the callbacks registered by the developer will fire immediately. Those callbacks may in turn invoke your library again synchronously inside the call to callback. This is a design problem that proliferates in asynchronous code: any time you invoke code that is provided by a third-party you must be prepared for that code to invoke your library re-entrantly.

This is why I recommend against using Future or Future-like objects at the lowest level: you need to be very cautious about when you fire those callbacks to ensure that your library can tolerate a re-entrant call at that time. The advantage of returning from functions is that generally you have completed all the state changes you need to perform by the time the function returns.
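To make the hazard concrete, here is a contrived sketch (not amqproto's code; a Twisted Deferred is just the clearest example because it fires callbacks synchronously):

from twisted.internet.defer import Deferred


class Channel:
    def __init__(self):
        self._reply = Deferred()
        self._pending_frames = []

    def consume(self):
        return self._reply

    def publish(self, body):
        # Re-entrant calls from user callbacks land here mid-parse.
        self._pending_frames.append(body)

    def data_received(self, data):
        self._reply.callback(data)    # user callbacks run *right here*
        self._pending_frames.clear()  # oops: wipes out anything queued by a
                                      # re-entrant publish() above


channel = Channel()
channel.consume().addCallback(lambda reply: channel.publish(b"re-entrant"))
channel.data_received(b"reply")
assert channel._pending_frames == []  # the user's publish was silently lost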

It may be, of course. But I think it's my job to give an easy and convenient way to actually implement I/O adapters for the library, while also keeping the api concise.

My suggestion is to do both. Write the low-level API that avoids Future objects, then wrap it yourself. That way, developers who want to use Future-based APIs still have access to that concise API, but are not restricted to it.

I don't want my users to implement a Twisted Protocol every time they need to send a HEAD request.

Sure, you don't. But h2 is much lower-level than that. If you're at the stage of saying "I need to send an HTTP/2 HEAD request" then don't reach for h2, reach for requests or treq. The purpose of h2 is to be the tool that the developers of requests and treq reach for when they say "I need to support the HTTP/2 protocol within our higher-level API".

Twisted's server-side HTTP/2 support did not change the interfaces to Twisted's existing HTTP APIs at all: zero changes. One day users were writing code that could only work with HTTP/1.1, the next day it worked with HTTP/1.1 and HTTP/2. But under the covers, Twisted needed the flexibility to work with the protocol at as low a level as possible such that they could fit it into those API constraints.

There are two problems to solve. The first is to provide convenience to your users. That will usually bundle the I/O in with the protocol solution, because I/O is tricky and most users don't want to do it. The second problem is to provide power to developers. That will want to keep the I/O out, such that developers can bring their own I/O approaches to the table. I'm recommending you avoid Future objects in the second case, but feel free to use them in the first.

njsmith commented 7 years ago

That way, developers that want to integrate your AMQP library without using Future objects (e.g. to write it as a Twisted Protocol) can do so, while those that want a more await-focused API can have it.

It's not just Twisted who wants to avoid Futures, btw. My library trio goes the other way: it explicitly throws out all forms of Futures and callbacks in favor of pure async/await calls. The motivation here comes from the same blog post where I give the asyncio program with the 3 bugs: later on I diagnose the root cause as asyncio having all kinds of "implicit concurrency". This is basically the same as what Cory is saying about unexpected reentrance. Setting a callback, returning a Future, and calling transport.write are all operations that schedule something to happen at some indefinite point in the future, i.e. they're semantically equivalent to spawning a new concurrent task. Concurrent tasks are really useful, supporting them is the whole point of a library like this, but they're also extremely tricky: now you need to worry about how they can be interleaved with your other work, you need to synchronize with them, you need to propagate cancellation, etc. So trio requires some minimal but explicit ritual to spawn a Task, there are constraints on how you can do it, and there aren't any implicit task-spawning operations.
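For example, in trio the only way to get something running concurrently is to ask a nursery for it explicitly, roughly like this:

import trio


async def consumer(name):
    # This task exists only because something explicitly asked the nursery
    # for it, and it cannot outlive the nursery block below.
    await trio.sleep(1)
    print(name, "done")


async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(consumer, "a")
        nursery.start_soon(consumer, "b")
    # execution does not reach this line until both tasks have finished


trio.run(main)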

for instance, the amqproto.Channel.basic_consume method cannot be defined as an async method (for obvious reasons). If basic_consume did not return something that can be awaited, then we would basically need to re-define all (34) AMQP methods in the asyncio adapter and either mark them as async methods or return awaitables from them.

So this is a common observation in the debate about composition vs inheritance: composition requires more typing up front, and seems to violate DRY. But this isn't really true, because if you have two different classes, then they might share implementation, but by definition they don't share public interfaces, so describing their interfaces separately isn't really repeating yourself. With inheritance, you end up with complex long-distance coupling where modifying one class implicitly modifies the public API of other classes (including ones written by third parties you don't even know about), you have to jump through hoops with things like self.FutureType to encode all the API differences in a single implementation, and you're pressured to try and make the APIs as similar as possible, even if it doesn't really fit. (My experience is that interfaces are always less similar than they first look; there are always niggling details that you don't think of when looking from 10,000 feet.)

With composition, you have some extra typing up front to list all the methods, but python makes it easy to minimize the repetition with some metaprogramming, and then when you change things in the future it's much more straightforward.
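Something along these lines, say (illustrative names only, not amqproto's actual classes; the reply-waiting helper is left out of the sketch):

class AsyncioChannel:
    # The adapter wraps the sans-io channel (composition) and owns all the
    # asyncio-specific machinery; the wrapped channel only produces bytes.
    def __init__(self, sansio_channel, transport):
        self._channel = sansio_channel
        self._transport = transport

    async def _call(self, method_name, *args, **kwargs):
        self._transport.write(getattr(self._channel, method_name)(*args, **kwargs))
        return await self._wait_for_reply()

    async def _wait_for_reply(self):
        raise NotImplementedError  # adapter-owned waiting, omitted here


def _make_async_method(name):
    async def method(self, *args, **kwargs):
        return await self._call(name, *args, **kwargs)
    method.__name__ = name
    return method


# One loop instead of 34 hand-written async defs:
for _name in ("basic_consume", "basic_publish", "queue_declare"):
    setattr(AsyncioChannel, _name, _make_async_method(_name))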

malinoff commented 7 years ago

@Lukasa, @njsmith thank you for your explanations and examples. I've managed to avoid using Future objects in the core. Now it's possible to implement other I/O adapters without Future-like objects or wrappers.