Actor-system related questions

uazu commented 3 years ago

Brief summary

Some stuff can't be written in a blocking model, even a non-blocking blocking model like async/await. By blocking I mean that when you do an async call, your coroutine blocks, and the called object is also blocked on that one piece of work until it returns asynchronously. An obvious example that doesn't fit this is a network stack layer where events come in from both below and above and also from timers. All these events have to be responded to immediately. Blocking (or logically blocking) just won't work.
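As a rough illustration of the network-layer case, here is a minimal sketch of a layer written as a never-blocking event handler. The `Event`, `Action` and `Layer` names, and the "ACK" convention, are made up for this example and do not come from any real stack. Each event is handled to completion straight away, whichever direction it arrives from.

```rust
use std::collections::VecDeque;

/// Events that can reach the layer at any moment, from any direction.
/// (Hypothetical types, for illustration only.)
#[derive(Debug)]
enum Event {
    FromBelow(Vec<u8>), // packet arriving from the lower layer
    FromAbove(Vec<u8>), // data the upper layer wants sent
    TimerExpired,       // retransmit timer fired
}

/// What the layer asks its neighbours to do in response.
#[derive(Debug, PartialEq)]
enum Action {
    SendDown(Vec<u8>),
    DeliverUp(Vec<u8>),
}

/// Layer that keeps unacknowledged packets so it can retransmit them.
#[derive(Default)]
struct Layer {
    unacked: VecDeque<Vec<u8>>,
}

impl Layer {
    /// Handle one event to completion and return immediately.
    /// Nothing here waits on another layer or on a timer.
    fn handle(&mut self, ev: Event) -> Vec<Action> {
        match ev {
            Event::FromAbove(data) => {
                self.unacked.push_back(data.clone());
                vec![Action::SendDown(data)]
            }
            // Assumed convention: a packet containing "ACK" acknowledges
            // the oldest unacknowledged packet.
            Event::FromBelow(data) if data == b"ACK" => {
                self.unacked.pop_front();
                vec![]
            }
            Event::FromBelow(data) => vec![Action::DeliverUp(data)],
            // Retransmit everything that is still unacknowledged.
            Event::TimerExpired => self
                .unacked
                .iter()
                .cloned()
                .map(Action::SendDown)
                .collect(),
        }
    }
}
```

Because `handle` never waits, a timer event can be processed while a send is still unacknowledged, which is the situation a single logically blocking call can't express directly.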

Doing some kind of a "select" on the calling side solves the "only one outgoing call" problem, and the "being called blocks an object" (i.e. "only one incoming call") problem can probably be solved by having multiple proxy objects for your main object, so you don't block the main object. But this is all a very round-about way of getting the required behaviour.

So this is where the actor model comes in. I don't know whether you want to discuss the actor model in this review, but the subject keeps on coming back. As the author of the Stakker crate, I am very happy to contribute to the discussion if it is of interest. Here are some subjects you might wish to cover in your review:

Different models of actor system in relation to async/await:

- Very high-level actor system, i.e. used for cross-machine communication. Sits way above async/await.
- Medium-level actor system, i.e. actors implemented immediately above an async/await runtime.
- Low-level actor system, i.e. a close-to-the-metal actor system that sits below async/await (i.e. the low-level actor system acts as an executor).

Impedance mismatch between async/await and actor model:

- An actor can have many calls outstanding on it, and also have many calls outstanding on other actors.
- Async/await only supports one call each way without bringing in extra features.
- This means that actor systems interfacing to async/await have to deal with this impedance mismatch, i.e. either compromising the actor model (e.g. blocking the whole incoming actor queue whilst a single outgoing async/await call blocks) or adding intermediate actors that wrap an async/await object and queue the calls to that object so that other actors don't have to block.

So I guess these are the questions this raises:

- How best to handle people who come to async/await trying to solve a problem which really needs a non-blocking actor system?
- How best to support people implementing new actor runtimes either above or below async/await?
- How best to support interfacing between actor systems and async/await, i.e. dealing with the impedance mismatch?

For example, could we make async/await suitable for actor-like tasks? The fundamental problem is that the state self is locked during the .await. If more than one coroutine could access self at the same time (i.e. interleaved at yield points) then the problem of blocking the actor queue would be solved. (If this could be done with only static checks, i.e. no runtime RefCells or whatever, so much the better.) However maybe this is just completely incompatible with the async/await model, so it is just not possible. So an external actor system is the only way to handle these kinds of problems.

For example, stuff of interest related to async/await for my own low-level actor system (Stakker):

- Since this plans to act as an executor to interface to the async/await ecosystem, the executor-independent interface is of great interest, e.g. common traits and other means for executor-independent async/await code to talk to executors.
- To implement actor coroutines with low overhead, it needs an 'until_next_yield lifetime in async/await in order to safely switch in and out references to self and the context. Or alternatively completion of the existing plans for Rust generators. This would allow several actor coroutines to efficiently interleave access to the same shared actor state.

Optional details

Which character(s) would be the best fit and why? Niklaus: new programmer from an unconventional background
What are the key points or morals to emphasize?
- Need to guide people who have a problem to solve that isn't easily solvable with async/await.
- Need to focus on executor-independent interop between executors and async/await to grow the executor ecosystem, e.g. to allow actor system-based executors.
- Need to consider whether it's possible to smooth the interop between actor model and async/await model

Tell me if you want me to write this up, i.e. whether this (or any parts of it) are subject areas of interest, and where in your framework for this review it should fit.

uazu commented 3 years ago

I think the best way I can contribute right now is try to add async/await support to Stakker, with Stakker working as an executor. (This was already planned.) Then write it up as a status quo, I guess for some fictional runtime, assuming I get enough of it done within the time limits for this review. Since Stakker was implemented before async/await stabilised, it wasn't designed around the same assumptions, so it should be a reasonable test-case.

One question I have is: How big is the executor-independent async/await ecosystem that I can expect to be able to interface to? If there was (in future) a way for crates to advertise that they support running (partially or fully) across runtimes, e.g. some tag or fixed phrase, then that would be useful.

Stuff I should look at supporting:

- std::future::Future
- futures crate
- futures-lite crate
- Tokio AsyncRead, AsyncWrite?
- async_executors crate interface
- agnostik crate interface

Anything else?

nikomatsakis commented 3 years ago

I'm trying to think how to turn this into a story -- I'd love to read more about it.

uazu commented 3 years ago

I have a couple of suggestions for stories:

"Alan creates a deadlock". Because there's effectively a lock whilst an async call is made (i.e. whilst awaiting), there is also a possibility of a deadlock, maybe in combination with channels or something. I'm not familiar enough with coding for async/await, but maybe an example can be constructed.
"Alan makes a reentrant call". Are these possible in general under async/await? Reading some of the docs I hear of people making a call to tokio and it panicking because they are already in a call within tokio, e.g. block_on or something. Again I'm not familiar enough to think of a good example.

I don't want to sound critical, since I've been coding with the never-blocking actor model for 10 years and it's natural to me. Sequential coding is much more familiar: just look at the relative popularity of Go vs Pony. So I can totally understand the motivation for Rust's async/await. But maybe these are some of the trade-offs for that familiarity. In the first 3-4 pages of my Stakker design notes I briefly go over how I ended up back with the actor model again, even though I was trying something different in Rust. The main thing is never-blocking, so there can't ever be a deadlock, and the borrow checker forcing shallow stacks, which means it's impossible to construct code with a reentrant call. So I guess whatever I do, I keep coming "home" to the actor model, although that isn't planned.

I don't think shallow stacks are the only way, though, but to avoid reentrant calls you need a way to defer a call to a queue. The actor model gives you the wiggle-room to do that, because all the inter-actor calls are defined as async.
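To make "defer a call to a queue" concrete, here is a minimal sketch using only the standard library. `Queue`, `Deferred` and `Counter` are hypothetical names for this example, not Stakker's API. Instead of calling back into the object mid-method, `outer` pushes a closure onto the queue, and the closure only runs once `outer` has returned.

```rust
use std::collections::VecDeque;

/// A deferred call: runs later, with access to the state and to the queue
/// so that it can defer further calls itself.
type Deferred = Box<dyn FnOnce(&mut Counter, &mut Queue)>;

#[derive(Default)]
struct Queue {
    calls: VecDeque<Deferred>,
}

impl Queue {
    /// Queue a call instead of making it directly.
    fn defer(&mut self, f: impl FnOnce(&mut Counter, &mut Queue) + 'static) {
        self.calls.push_back(Box::new(f));
    }

    /// Run queued calls in order until none remain. Each call starts from
    /// the top of the stack, so the stack stays shallow.
    fn run(&mut self, state: &mut Counter) {
        while let Some(call) = self.calls.pop_front() {
            call(state, self);
        }
    }
}

#[derive(Default)]
struct Counter {
    log: Vec<String>,
}

impl Counter {
    fn outer(&mut self, q: &mut Queue) {
        self.log.push("outer start".into());
        // Calling `self.inner()` directly here would re-enter the object
        // mid-method. Deferring it means it runs after `outer` returns.
        q.defer(|this, _q| this.inner());
        self.log.push("outer end".into());
    }

    fn inner(&mut self) {
        self.log.push("inner".into());
    }
}
```

After `outer` has been called and the queue run, the log reads "outer start", "outer end", "inner", so the deferred call never overlaps with the method that requested it.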

Regarding having N calls outstanding at the same time from an actor (or to an actor), I'm not sure how to make that into a story. I know that it really suits some areas of application, although perhaps it's not necessary for most. So you can fire off N calls (or equivalently send N messages) and trust that you'll get the responses back at some point. (Stakker guarantees that you'll get a response even if the called actor fails and the Ret is dropped.)
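A rough sketch of the "N calls outstanding, and you always get a response" idea, using only the standard library. The `Ret` type below is a hypothetical stand-in written for this example; it mimics the behaviour described above and is not Stakker's actual `Ret` implementation. The assumption is that a dropped return path reports `None` to the caller.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical one-shot return path (not Stakker's real `Ret`).
struct Ret<T> {
    callback: Option<Box<dyn FnOnce(Option<T>)>>,
}

impl<T> Ret<T> {
    fn new(f: impl FnOnce(Option<T>) + 'static) -> Self {
        Ret { callback: Some(Box::new(f)) }
    }

    /// Deliver a real response.
    fn ret(mut self, value: T) {
        if let Some(cb) = self.callback.take() {
            cb(Some(value));
        }
    }
}

impl<T> Drop for Ret<T> {
    /// If the return path is dropped unused, the caller still hears back.
    fn drop(&mut self) {
        if let Some(cb) = self.callback.take() {
            cb(None);
        }
    }
}

/// Fire off `n` calls without waiting, then let the callee answer them.
fn fire_calls(n: u32) -> Vec<Option<u32>> {
    let responses: Rc<RefCell<Vec<Option<u32>>>> = Rc::new(RefCell::new(Vec::new()));

    // All calls are outstanding at once; the caller does not block on any.
    let mut outstanding = Vec::new();
    for i in 0..n {
        let responses = Rc::clone(&responses);
        let ret = Ret::new(move |r: Option<u32>| {
            responses.borrow_mut().push(r);
        });
        outstanding.push((i, ret));
    }

    // The callee answers later. Here it "fails" on odd requests and simply
    // drops the return path.
    for (i, ret) in outstanding {
        if i % 2 == 0 {
            ret.ret(i * 10);
        } else {
            drop(ret);
        }
    }

    let out = responses.borrow().clone();
    out
}
```

In this sketch every one of the N calls produces exactly one response: either `Some(value)`, or `None` when the callee dropped the return path.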

The alternative coroutine model which I hope to implement for Stakker, which I've called actor coroutines, also seems hard to turn into a story. I've been coding a long while using this model in Lua. This gives a sequential coding model, but for actors. The coroutine has direct access to the actor state (i.e. Self) when it runs, and can only live as long as the actor. Logically the 'resumes' are driven (behind the scenes) by "messages" received by the actor (although in reality these are just FnOnces on a queue like everything else). One significant difference to async/await is that since this is a never-blocking system, when the coroutine yields, normal actor behaviours and other actor coroutines for the same actor may run. So the coroutine has to give up its &mut Self reference on each yield. Some more notes here. (This is giving me some difficulties to implement on top of async/await, since I want to avoid dynamic checks.)
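The borrowing pattern can be sketched with a hand-written state machine standing in for a real coroutine. This is only an illustration of the idea, under the assumption that the actor state is passed in afresh on each resume. `Actor`, `AddSteps` and `Resume` are invented for this example. The `&mut Actor` borrow lasts only for one `resume` call, i.e. until the next yield, so several coroutines can interleave on the same state with static checks only.

```rust
/// Shared actor state (hypothetical).
#[derive(Default)]
struct Actor {
    total: u32,
    log: Vec<String>,
}

/// Result of resuming a coroutine once.
enum Resume {
    Yielded,
    Done,
}

/// Hand-written "coroutine": adds `step` to the actor's total once per
/// resume, for `remaining` resumes, yielding in between.
struct AddSteps {
    name: &'static str,
    step: u32,
    remaining: u32,
}

impl AddSteps {
    /// The `&mut Actor` borrow is held only for the duration of this call.
    /// Returning gives the reference back, which models the yield.
    fn resume(&mut self, actor: &mut Actor) -> Resume {
        if self.remaining == 0 {
            return Resume::Done;
        }
        actor.total += self.step;
        actor.log.push(format!("{} -> {}", self.name, actor.total));
        self.remaining -= 1;
        if self.remaining == 0 {
            Resume::Done
        } else {
            Resume::Yielded
        }
    }
}

/// Resume the coroutines round-robin until all have finished. No RefCell
/// or lock is needed, because only one coroutine holds `&mut Actor` at a time.
fn run(actor: &mut Actor, mut coroutines: Vec<AddSteps>) {
    while !coroutines.is_empty() {
        coroutines.retain_mut(|co| matches!(co.resume(actor), Resume::Yielded));
    }
}
```

What this sketch cannot show is the hard part mentioned above: expressing the same "borrow only until the next yield" rule inside an `async` block, where the compiler generates the state machine.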

The other story which I hope to contribute to is:

"Grace tries to interface her low-level runtime to the async/await ecosystem".

This is pretty well-defined already. Someone at some point has to decide that some of these interfaces are mature enough and well-enough tested to pull into the standard library. I don't have a high-enough perspective to judge that, but perhaps I can give some more data. I'm looking forward to getting into the detail of this.

nikomatsakis commented 3 years ago

Something I've been thinking over -- that seems to be a latent theme in a few stories -- is "environmental state". That is, having access to some shared resources which are "released to the wild" during an await.

If we make this an &mut parameter in async/await today, those resources get captured by the future, which isn't really what people want. You can do an Arc but then you need mutexes or ref-cells and that's not especially ergonomic. You really don't want the ref-cell to be locked over an await, either.
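The "ref-cell locked over an await" hazard can be shown with a small sketch that polls a future by hand, using only the standard library. `YieldOnce` and `NoopWake` are helpers written for this example. The first task takes a `RefCell` borrow and keeps it across a yield point; while that task is suspended, nothing else can borrow the shared state.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` once, standing in for any real `.await`.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns true if other code was locked out of the shared state while the
/// first task was suspended, and the first task then completed its update.
fn others_locked_out_during_await() -> bool {
    let shared = Rc::new(RefCell::new(0u32));

    let s1 = Rc::clone(&shared);
    let mut first = Box::pin(async move {
        let mut guard = s1.borrow_mut(); // borrow taken here...
        YieldOnce(false).await; // ...and held across the yield point
        *guard += 1;
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // The first task runs up to its yield point and suspends, still
    // holding the mutable borrow.
    assert!(first.as_mut().poll(&mut cx).is_pending());

    // Anything else that wants the state now fails (or would panic if it
    // used `borrow_mut` instead of `try_borrow_mut`).
    let locked_out = shared.try_borrow_mut().is_err();

    // Finishing the first task releases the borrow.
    assert!(first.as_mut().poll(&mut cx).is_ready());
    let value = *shared.borrow();
    locked_out && value == 1
}
```

The actor's own state has the same shape as `shared` here: it needs to be available to other handlers while one of them is suspended.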

I have to go digging, I feel like I've seen echoes of this theme in a few places.

It seems relevant to actors because I imagine the actor's state itself kind of fits this.

uazu commented 3 years ago

As I understand it, the requirement has been captured for generators in this issue.

uazu commented 3 years ago

I've documented the first part of implementing an executor on top of my actor runtime: https://uazu.github.io/blog/20210406.html

Maybe the different perspective might be interesting. I consider using GhostCell with futures. I don't have any conclusions yet as I've only done the basic stuff so far.
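For readers who haven't seen one, here is the general shape of an executor driven by a run queue, which is the kind of queue an actor runtime already has. This is a generic standard-library sketch, not the approach taken in the blog post and not Stakker code. `Task`, `spawn`, `run` and `YieldNow` are names made up for this example. Waking a task just pushes it back onto the queue.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type Queue = Arc<Mutex<VecDeque<Arc<Task>>>>;

/// A spawned future plus a handle to the queue it gets rescheduled on.
struct Task {
    future: Mutex<Option<Pin<Box<dyn Future<Output = ()> + Send>>>>,
    queue: Queue,
}

impl Wake for Task {
    /// Waking means "put me back on the run queue".
    fn wake(self: Arc<Self>) {
        let queue = Arc::clone(&self.queue);
        queue.lock().unwrap().push_back(self);
    }
}

fn spawn(queue: &Queue, fut: impl Future<Output = ()> + Send + 'static) {
    let task = Arc::new(Task {
        future: Mutex::new(Some(Box::pin(fut))),
        queue: Arc::clone(queue),
    });
    queue.lock().unwrap().push_back(task);
}

/// Poll queued tasks until the queue is empty.
fn run(queue: &Queue) {
    loop {
        // Pop in its own statement so the queue lock is released before
        // polling; a task may wake itself and push onto the queue.
        let next = queue.lock().unwrap().pop_front();
        let Some(task) = next else { break };

        let waker = Waker::from(Arc::clone(&task));
        let mut cx = Context::from_waker(&waker);
        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            if fut.as_mut().poll(&mut cx).is_pending() {
                *slot = Some(fut); // not finished: keep it for the next wake
            }
        }
    }
}

/// Yields once, asking to be rescheduled straight away.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Spawn two tasks that each log, yield once, then log again.
fn run_two_tasks() -> Vec<String> {
    let queue: Queue = Arc::new(Mutex::new(VecDeque::new()));
    let log = Arc::new(Mutex::new(Vec::new()));
    for name in ["a", "b"] {
        let log = Arc::clone(&log);
        spawn(&queue, async move {
            log.lock().unwrap().push(format!("{name} start"));
            YieldNow(false).await;
            log.lock().unwrap().push(format!("{name} end"));
        });
    }
    run(&queue);
    let out = log.lock().unwrap().clone();
    out
}
```

The two tasks interleave at their yield points. A single-threaded runtime would presumably avoid the `Arc`/`Mutex` overhead used here, which is only present because `Waker` must be `Send + Sync`.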

matklad commented 3 years ago

I want to add that “some stuff can't be written in a blocking model, even a non-blocking blocking model like async/await.” is a somewhat profound fact, which isn’t really widely known. For example, only this year I was able to put my finger on a specific problem:

https://matklad.github.io/2021/04/26/concurrent-expression-problem.html

(Stakker design notes were instrumental to my understanding, thanks!)

uazu commented 3 years ago

For some reason people want to write Go in Rust, so they're just going to have to learn the hard way all the places where that's not a good idea. Yes, it is convenient for a certain class of problem. For other problems they are going to find it very hard to find a clean solution, and perhaps wonder why. Layering improvised actors on top of channels and an async/await runtime seems a very poor workaround to me, with its own unique problems that a low-level actor model system doesn't have. I can't be blogging all the time to communicate this, although I might try again at some point.

I haven't taken the async/await work for Stakker any further yet because I have another unrelated open-source crate that I'm trying to complete and get out the door in spare moments.

rust-lang / wg-async

Actor-system related questions #90
