I have a partial answer from @dwrensha to: does `eventual_io` pay any costs beyond `gj` for its greater abstraction via eventual? It'd be interesting if `gj` and `eventual_io` could be merged by making the Cap'n Proto RPC use `eventual_io` or something.
It appears `gj` wanted single-threaded performance with `Rc`s while `eventual` wants multi-thread safety with `Arc`s and `Send` bounds. At least this particular difference could be resolved with higher-kinded polymorphism.
@burdges: it's not just performance. Sharing `Rc` data among tasks on a single thread is less error-prone than sharing `Arc` data among multiple threads, because the yield points are explicit.
Interesting. I briefly searched for `Rc` vs `Arc`, but found little and didn't try that hard. I probably should've searched the RFCs specifically. Thanks!
I created the crate await as a placeholder for what we would like to have in the core. I also included a partial survey of what's currently out there in Rust, and what is built on top of what.
After looking at the abstractions the crates provided and the dependencies between them, it seems we have a crate for each kind of abstraction @chpio enumerated (and there seem to be no arguments against that list).
The first thing that stands out is that mio, or rather, event loops, are definitely an important building block throughout the other crates.
There also seem to be a few "shiny" + "shiny-io" crate pairs going around. Ideally all those "*-io" crates would become unnecessary with the "proper" core abstractions in place.
Contributions are welcome, encouraged, and very much needed!
@icorderi: I too believe that an executor (what you call event loop) is an important first step.
Recently my employer/client wanted to build a library that doesn't spawn hundreds of threads to maintain a few connections and asked me to design a new solution.
I started taking a look at current Rust abstractions. Pure mio is horrible because it doesn't compose. You cannot create different crates abstracting different protocols because they need to collaborate with each other (e.g. token clashing, message types to mio...). If you create crates low-level enough, you need too much boilerplate to integrate them.
Abstractions wrapping mio were too incomplete, too intrusive, or mixed up too many concepts within a single library (which breaks the single-responsibility principle and makes it unclear to reason about what happens, as there are no clear boundaries).
Then I started researching executor design (an executor is to function execution as an allocator is to allocation) and I liked the design of this particular proposal:
An executor that allows you to `post` functions is important because it lets you guarantee that your asynchronous operations' initiating functions always return immediately. This is a base guarantee for avoiding problems like unfairness, jitter, and starvation. An example of such unfairness is given in the paper http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0114r0.pdf (4.5, "When scheduling logic is embedded in the language").
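To make that guarantee concrete, here is a minimal single-threaded sketch of the idea - not Kohlhoff's actual design, and all names are illustrative. `post` only queues the closure, so the initiating call always returns immediately; the queued work runs later, inside `run`:

```rust
use std::collections::VecDeque;

struct Executor {
    queue: VecDeque<Box<dyn FnOnce()>>,
}

impl Executor {
    fn new() -> Self {
        Executor { queue: VecDeque::new() }
    }

    // `post` never runs `f` inline, so the initiating call returns at once;
    // this is the property that avoids the unfairness/starvation problems
    // cited above.
    fn post<F: FnOnce() + 'static>(&mut self, f: F) {
        self.queue.push_back(Box::new(f));
    }

    fn run(&mut self) {
        while let Some(task) = self.queue.pop_front() {
            task(); // run the queued work in FIFO order
        }
    }
}

fn main() {
    let mut ex = Executor::new();
    ex.post(|| println!("first"));
    ex.post(|| println!("second"));
    ex.run(); // prints "first" then "second"
}
```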
I started designing/experimenting with a mio wrapper that provides an executor, a timer queue, and a socket reactor: https://github.com/vinipsmaker/asio-rs/tree/master/examples
My colleague @canndrew was working on a design based on http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4045.pdf to allow a model that can be adapted to any async paradigm (callbacks, futures, coroutines...), but he eventually said he'd need higher-kinded types.
@vinipsmaker (shameless plug) I'm currently working on an async rewrite of tarpc built on top of mio. It allows registering multiple different tarpc services on a single event loop, and multiple clients on a single event loop. Here's a contrived example of a client calling a server which subsequently makes 2 calls to another server and waits until both calls are available before sending a reply to the client.
I'd be interested to hear if that suits your needs. It currently only works on nightly due to some feature flags being flipped on. Hopefully that won't be the case for long.
@tikue: The model you propose isn't a large improvement over mio's. I can already use pure mio (if I build a monolithic crate and forget about mio's problem of composing interfaces). There are other mio wrappers, and I don't need to build my own. My comment was more of a contribution toward a future standardized abstraction within the Rust core library. Within this discussion, I cannot see a hacky way around mio's current API as a good solution.
The first thing about mio is that it uses a reactive/passive model rather than a proactor/active model.
An active model makes the user the active party. This is an elegant way to answer questions like "how can I defer acceptance of new connections during server high-load scenarios?"
I also must say from my experience that reactive APIs are harder to design tests for (you need workarounds to identify which reaction maps to which action from the test).
There are also concerns about coroutine integration. Sure, coroutines are not the solution to everything, but many times blocking code is more readable just because you can keep using the language constructs (`if`, `while`...) for control flow. Coroutines are the only solution to asynchronous programming that keeps this property, as far as I know. More than one developer has reported to me that Boost.Asio's ability to start an operation within a certain async paradigm (e.g. coroutines) and then move to another model (e.g. callbacks) for the same I/O objects was quite useful.
EDIT:
There's a little inaccuracy around the example of deferring acceptance. mio is capable of that. However, I don't see the same property in your abstraction.
EDIT2:
A little documentation around your design would be good too. And I know it's unfair on my part to ask for documentation of your design when I haven't done the same for mine.
@vinipsmaker there's no fundamental reason tarpc can't support deferring acceptance. It's just not a high priority at the moment, and there's a lot of design space to consider. Regarding documentation, feel free to open issues where you believe it's lacking. As the async branch is still undergoing a lot of churn (it hasn't even merged to master yet), writing documentation isn't super-high priority at the moment.
@tikue: you're right. There is no fundamental reason why tarpc can't support deferring acceptance. My previous argument wasn't good and didn't explain the problem.
The problem is how much detail and how many customization points you need to put in your abstractions. If it's a reactive interface, then you start with a small abstraction and build complexity on top over time. "_I need a pool to recycle objects, let me add a `PoolFactory` and a `set_factory`... now I want to use objects from this array allocated on the stack and no more than X requests being handled at once, let me add... now I want deferred acceptance... and now..._". You'll add an interface for everything you think of, and the interface will never be enough. With an active style, the problem is solved and you don't need to do anything. You don't need to justify with "it's not a priority at the moment" because it's done from the moment the abstraction is usable.
The active style also maps very naturally to some other paradigms like futures and coroutines (an `accept` which returns a future or suspends the current coroutine).
cc @carllerche
It's dubious that anyone understands this design space well enough right now. Are there any proofs of deadlock freedom for a subset of coroutine flexibility? Just because I need complex shutdown logic do I need to forgo the advantages of restricting to futures? etc.
It sounds like the first question should be: what language features do projects like eventual, gj, rotor, and mioco need to better integrate with one another? At minimum, the lack of higher-kinded polymorphism contributed to the fork between eventual and gj, and appears to be limiting others as well.

> It sounds like the first question should be: what language features do projects like eventual, gj, rotor, and mioco need to better integrate with one another?

This would be the executor - exactly what async IO needs: one thread being shared by multiple tasks. The executor does that; it controls execution within this thread.

> At minimum, the lack of higher-kinded polymorphism contributed to the fork between eventual and gj, and appears to be limiting others as well.

This is my understanding as well.
@burdges I'm trying to answer exactly that.
The gist of the current state is:

```rust
/// Abstracting the ability to be resumed when something completes
trait Await<T> {
    fn await(ctx: &mut Context) -> T;
}
```
The `Context` is an abstraction related to what @vinipsmaker referred to as the "executor". The "executor" is essentially a `Context` factory that knows how to create `Thread`s or mioco coroutines or something else.
```rust
/// An execution context
///
/// This is a `Thread` on a threaded model, or a **mioco** `Coroutine`
trait Context {
    // Currently playing with what should go in here...
    // The primitives to do signaling and cross-context communication should be here
}
```
Ideally this would be backed by some sugar and allow us to write code like this:
```rust
async fn foo_async() -> impl Await<String> {
    let s = await TcpStream::bind_async("example.com:8080");
    let mut buff = [0u8; 1024];
    let c = await s.read_async(&mut buff);
    String::from_utf8(buff[..c].to_vec()).unwrap()
}
```
The desugared form can be based on continuations or a state machine, with the `Context` flowing through.
```rust
/// Example of desugaring the **await** syntax using continuations
pub fn foo_async_desugared<C: Context>(ctx: &mut C) -> impl Await<(String, TcpStream)> {
    TcpStream::bind_async("localhost:8123", ctx)
        .and_then(move |mut s, ctx| {
            let mut buff = [0u8; 1024];
            s.read_async(&mut buff, ctx)
                .map(move |c| (String::from_utf8(buff[..c].to_vec()).unwrap(), s), ctx)
        }, ctx)
}
```
@icorderi Is there a good reason that the syntax sugar for making async convenient should be separate from the syntax for something like a Python generator? As I understand it, generators are something at least some people want (including me, kind of).
I know Python ultimately decided that the answer is yes, but I'm not sure what their justifications were or if they apply to Rust. I knew at one point--I think it was mainly that Python is dynamic and you could accidentally return stuff you shouldn't--but haven't looked at the PEP in forever.
Perhaps they need to desugar to different things for zero cost abstraction.
@camlorn, yes, the sugar should not be the same; the two are semantically different.
Think of it without the "asynchronous" bit. [exaggerating] You are essentially saying that because a `Vec` can hold a single value, we should make all functions just return `Vec`.
An `Await<T>` is parallel to returning a `T`, while an `Iterator<T>` is parallel to `Vec<T>`.
On the other hand, the desugaring of `yield x` is quite "similar" to `let x = await foo()`, in the sense that both would likely come down to continuations or state machines.
The compiler's ability to break down a function into continuations / states can be used to support both features.
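As a rough illustration of the state-machine option (my sketch, not a committed design), here is a two-`await` function hand-compiled into an enum of its suspension points; `try_connect` and `try_read` are hypothetical nonblocking primitives that return `None` when not ready:

```rust
fn try_connect() -> Option<()> { Some(()) }
fn try_read() -> Option<String> { Some(String::from("response")) }

enum Foo {
    Connecting, // suspended at the first await
    Reading,    // suspended at the second await
    Done,
}

impl Foo {
    /// Advance the machine; `None` means "suspended, poll me again later".
    fn step(&mut self) -> Option<String> {
        loop {
            match *self {
                Foo::Connecting => match try_connect() {
                    Some(()) => *self = Foo::Reading, // first await resolved
                    None => return None,
                },
                Foo::Reading => match try_read() {
                    Some(s) => {
                        *self = Foo::Done;
                        return Some(s); // second await resolved; function returns
                    }
                    None => return None,
                },
                Foo::Done => return None, // already finished
            }
        }
    }
}
```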
@icorderi:

> /// Abstracting the ability to be resumed when something completes

It's pretty obvious that this is VERY tightly coupled to the concept of coroutines. Therefore, you're not answering the question "what are the core features that we need to better integrate all these async crates?", because you assume coroutine concepts right from the beginning.
While you advocate "awaitable concepts", I advocate "sharing the thread among multiple jobs/tasks/closures/..." (i.e. the executor).
And if you want to propose a coroutine implementation, you do not even need to integrate it with a thread-sharing/executor/... mechanism. The integration should be done by a third piece.
"What are the core features that we need to better integrate all this async crates?"
This is something that Christopher Kohlhoff (Boost.Asio author) has done for C++. He not only provided the foundation, but also provided all these "crates" (callbacks, coroutines, futures and the integration among them). And his model is not too tightly coupled like most would do, but rather extensible and open to new paradigms (you could implement blocking-style if you want or something new you think of). His library is really flexible and you could adopt several models for a project (single-threaded, one executor per thread each processing multiple clients, one thread per client...).
He is now proposing papers to have these abstractions integrated in the core C++ itself and the one thing to notice is that his abstractions are separate abstractions and not a big blob of something that can only be used together. executors, network interfaces, foundation for extensible (i.e. it supports callbacks, futures, coroutines...) async abstractions and stackless coroutines.
If we gonna discuss coroutines, I wish we would discuss coroutines only as these features are building blocks for other abstractions and not a suportive tool you'll only use glued with network code. For instance, in Python you have generators. Some folk from the C++ land are implementing even channels inspired by Go channels on top of coroutine interfaces.
@vinipsmaker,

> It's pretty obvious that this is VERY tightly coupled to the concept of coroutines.

I'll have to strongly disagree with that.
Being able to reason abstractly about a function call that will not return immediately/synchronously does not imply or require coroutines.
We do not need a coroutine to accomplish the following:
```rust
async fn foo_async() -> impl Await<String> {
    let s = await TcpStream::bind_async("example.com:8080");
    let mut buff = [0u8; 1024];
    let c = await s.read_async(&mut buff);
    String::from_utf8(buff[..c].to_vec()).unwrap()
}
```
`bind_async` and `read_async` operate on top of select/poll constructs. No coroutines involved.
Now, you are obviously not wrong either. If we had the ability to speak abstractly about asynchronous computations, we might as well give them the possibility of being scheduled on an execution context that is not just pure threads. Otherwise, we would be wasting a lot of potential. This is where coroutines come into play.
But I don't expect the `std` to have coroutines, just an abstraction for an execution context (thread, coroutine) so that external libraries can implement concrete models. The `std` would also `impl Context for Thread`.
For this to work, we need to make sure that pretty much everything under `std::sync` can be implemented on top of this `Context` abstraction (or has `*_async()` versions of the blocking calls), so that RwLocks, Mutexes, Barriers, and channels don't break other implementations of `Context` by blocking an entire `Thread`.
With this in place, `asyncio` can be an independent crate in the nursery, implemented on top of an `epoll/kqueue` crate (quite probably also in the nursery).
`mioco` can then be written on top of that to provide a `CoroutineContext` that allows us to run `Await`s on it.
`futures`/`eventual`/`gj`/etc. can all be written on top of `Await` and used on either `mioco` or `Thread`s; it all depends on whether you schedule the initial async call on a `Thread` or on a `Coroutine`.
@icorderi:
I quoted "abstracting the ability to be resumed when something completes" when I stated "it's very tightly coupled to the concept of coroutines". I restate the same phrase again, as you haven't shown me how "resumable functions" aren't closely related to the concept of coroutines.
And in your example:

```rust
async fn foo_async() -> impl Await<String> {
    let s = await TcpStream::bind_async("example.com:8080");
    let mut buff = [0u8; 1024];
    let c = await s.read_async(&mut buff);
    String::from_utf8(buff[..c].to_vec()).unwrap()
}
```
You mentioned `bind_async` and `read_async`, but what is interesting is the `await` construct. What does it do? Suspend the function to be resumed later? If that's what it does, then this is a coroutine. So yes, coroutines are involved.
Also, I'm not sure I understand your `Context` proposal. Could you explain it more, please?
@vinipsmaker
Take the following two snippets:

```rust
std::thread::Thread::spawn_async(foo_async)
mioco::Coroutine::spawn_async(foo_async)
```
In the `Thread` snippet, a new `Thread` gets created to execute `foo_async`; the `await`s inside `foo_async` will `park` the `Thread`, and it will stay there until the IO completes. In this case, the code should behave the same as the blocking version:
```rust
fn foo_async() -> String {
    let s = TcpStream::bind("example.com:8080").unwrap();
    let mut buff = [0u8; 1024];
    let c = s.read(&mut buff).unwrap();
    String::from_utf8(buff[..c].to_vec()).unwrap()
}
```
In the `Coroutine` snippet, the `await`s will `park` the `Coroutine`, allowing other coroutines to use the thread that freed up. Whenever the `await` inside `foo_async` completes, it will `unpark` the coroutine. Now a few things can happen here, and they are all design choices in how `mioco` wakes up a coroutine. If we assume a cooperative scheduler, then whenever the currently running code in the `Thread` that was backing the `Coroutine` yields/pauses/completes, `foo_async` will resume (assuming no other work was pending scheduling on that coroutine).
Maybe it's subtle, but abstracting a computation that will return at some later point in time (`Await`) is not the same as user threads (`Coroutine`s).
Although in reality, a world with just `Await` and no `Coroutine`s is kinda boring. On the other hand, a world with `Coroutine`s and no `Await` is the mess we live in right now, where we have no common abstraction to describe something that will return at some point in time.
Yes, `Await` and `Coroutine` are related in the sense that having one without the other is boring or an API mess. But they are not the same concept, nor does one require the other.
We already have `Coroutine`s. I'm not proposing a coroutine library, nor am I saying that the `std` should have a default coroutine implementation.
What I'm proposing is to abstract a computation that will return at some later point in time (`Await`) and to abstract an execution `Context`. The only execution `Context` in the std is a `Thread`: the same, very real 1:1 thread we have today.
We can have a zoo of `Coroutine` implementations with cooperative scheduling, preemptive scheduling, or whatever other flavor people want to publish on crates.io.
The important thing is that any code written to be async using `Await` will run on all the async libraries, and any `Await + Send` created inside some `Coroutine-X` crate can be tossed over to some `Coroutine-Y` crate and it will just work.
This also allows us to write an async library backed by coroutines that will continue to make forward progress even if someone decides to take a lock on a mutex, because that lock will freeze the `Context`, which might be a thread or a coroutine. The same thing applies to channels.
This is precisely the goal: to be able to interoperate between all these async libraries.
@vinipsmaker, did it make sense?
@icorderi:
Thanks for the explanation. Your proposal is much clearer now.
> abstracting a computation that will return at some later point in time (`Await`) is not the same as user threads (`Coroutine`s).

This is what futures & promises are for. In this case, the `await` is just syntax sugar for that [possibly blocking code]. There is a crate in Rust just for that: https://en.wikipedia.org/wiki/Futures_and_promises
If your proposal is not about futures & promises, could you explain the differences (aside from naming and other irrelevant differences)?
> Yes, `Await` and `Coroutine` are related in the sense that having one without the other is boring or an API mess. But they are not the same concept, nor does one require the other.

Your `Await` is your `Await` (i.e. a new concept). You're the authority on what it is and what it is not. I'm trying to understand what it is, and I'm trying to relate your explanation to things I know, using more standardized terms than "new" concepts.
Your example:
```rust
async fn foo_async() -> impl Await<String> {
    let s = await TcpStream::bind_async("example.com:8080");
    let mut buff = [0u8; 1024];
    let c = await s.read_async(&mut buff);
    String::from_utf8(buff[..c].to_vec()).unwrap()
}
```
There are two possibilities here:

1. `await TcpStream::bind_async("example.com:8080")` blocks. Then we're discussing non-async abstractions and I'm on the wrong thread.
2. `await TcpStream::bind_async("example.com:8080")` suspends the function (i.e. local variables are saved, and this is a coroutine no matter whether you're spawning it from another thread or the same thread). Coroutines are subroutines with two more operations, suspend and resume (and this is what I see here).

Now, your code has `await` (the keyword) and `Await` (the "object"/class/...). `await` is very related to coroutines. Please spend a little more time explaining if I'm wrong. And I'm aware that this is just syntax sugar for `Await`, and that `Await` should be usable without `await`.
If you want to pursue this route, you'll need to provide a solution for "futures islands" (futures with multiple implementations cooperating together). I'm aware of no such solution, and I'm aware of endless discussions on the Boost mailing list trying to come up with one. And it seems to me that you're trying to unify async code so threads and coroutines can share the same algorithms; therefore, I'm inclined to believe you'll need to solve this problem (futures islands) if you want to follow this route.
All in all, this doesn't solve one single problem: sharing a single thread among multiple jobs/tasks/closures/... This is the executor. We have mio, but mio alone is a mess to integrate with different crates (token clashing, only one possible implementation of `Handler` running...).
> and to abstract an execution `Context`

I'm still unsure about what this "execution `Context`" is.
> The important thing is that any code written to be async using `Await`

Not all problems are solved. We still need to "spawn" a few things and know in which threads they'll run. I'm not convinced until these other pieces are described well enough too.
> This is precisely the goal: to be able to interoperate between all these async libraries.
Agreed.
A little text of mine covering the topic of asynchronous code (including Kohlhoff's solution for interoperating multiple async paradigms): https://vinipsmaker.wordpress.com/2015/11/26/multitasking-styles-event-loops-and-asynchronous-programming/
@vinipsmaker thanks for taking the time to go back and forth on this. I appreciate the discussion and you raise very valid questions. I hope the following long post will provide you with answers.
"abstracting a computation that will return at some later point in time (Await), is not the same as user threads (Coroutines)." - @icorderi
This is what futures & promises are for. In this case, the await is just syntax sugar for that [possibly blocking code]. There is a crate in Rust just for that: https://en.wikipedia.org/wiki/Futures_and_promises
I'll have to disagree with you again. Just because it quacks, and with a nudge in the right direction it swims, that doesn't make it a duck.
A (`Future`, `Promise`) pair is a different abstraction than an `Await` + `Context`.
You can hold a `Future`, check if it's completed, wait for it to complete. Once it is completed, you can access it all you want. A `Promise` is used to fulfill / set / send the value to the corresponding `Future` it was created alongside.
You can think of an `Await` as an asynchronous `FnOnce`: once it's consumed, that's the value it produced. You can't really call it again; it doesn't cache the result for you, it doesn't notify you of completion, and you can't ask it if it's done (it will just finish when it finishes).
A `Future`/`Promise` pair is closer to a `channel()` than to an `Await`.
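To illustrate that comparison (a sketch using only `std`; the thread's `Promise`/`Future` types themselves are hypothetical), a oneshot use of `std::sync::mpsc::channel` behaves much like a promise/future pair - the sender fulfills, the receiver waits:

```rust
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    // The sending half plays the role of the Promise,
    // the receiving half plays the role of the Future.
    let (promise, future) = channel::<i32>();

    thread::spawn(move || {
        promise.send(42).unwrap(); // fulfill the promise
    });

    let value = future.recv().unwrap(); // wait on the future
    println!("{}", value);
}
```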
Of course, if someone gives you a `Future` and you need the value now, you can `let x = await the_future`.
@vinipsmaker, when I say abstraction x is not y, I don't mean that you couldn't get to y from an x. I simply mean that semantically, they are not the same concept.
Let's look at some async options, from the user's perspective.

**Active**

- `Lazy::new<T, A: Await<T>>(a: A) -> Lazy<T>`.
- An `Await`? Then use something like `Future::from_async(..)` (this does not produce a `Promise`; the return of the `Await` is an implicit "set").
- `Future::new() -> Promise`.
- `let x = await foo()` will do the trick.
- `let (tx, rx) = channel()`

**Passive**

- `Event`
@vinipsmaker, I'll try to explain the `Context` based on the following quote from you:

> ...it seems to me that you're trying to **unify async code** so threads and coroutines could have the same algorithms.

That sentence, particularly the part I bolded, is precisely why the `Context` abstraction exists and what the `Context` abstraction is for.
Since we agree that a `Thread` and a `Coroutine` are not the same thing, a `Context` and a `Coroutine` cannot be one and the same, because a `Thread` is also a `Context`.
`Context` unifies `Thread`s, `Coroutine`s, etc. under a single abstraction.
By the way, what you state as the "futures islands" problem is exactly what gets solved with the `Await` and `Context` abstractions.
> All in all, this doesn't solve one single problem: sharing a single thread with multiple jobs/tasks/closures/...

Correct, this is not intended to solve that. This is about what basic abstractions should go into the `std` to support interoperability between different async crates.
I'll maintain my position: I don't believe we need `Coroutine`s in the `std`. We just need an abstraction that allows the community to create different jobs/tasks/closures/`Await` schedulers, schedulers that create `Context`s to execute these things on.
> Not all problems are solved. We still need to "spawn" a few things and know in which threads they'll run. I'm not convinced until these other pieces are described well enough too.
Fair enough.
As far as scheduler choices go, the choice should be made by the `main()` crate. That's the guy that is about to consume a big pile of `Await` stuff and can't really defer the decision any further.
```rust
fn main_async() -> impl Await<()> {
    // lots of awaits calling more awaits, spawning even more awaits
}

fn main() {
    ThreadPool::with_capacity(8).run_synchronously(main_async).unwrap();
    // or
    GreenPool::new(8).run_synchronously(main_async).unwrap();
}
```
In this example, the two schedulers, `ThreadPool` and `GreenPool`, are very different, and the `Context`s they produce are also different things.
A `ThreadPool` is created with 8 worker threads. A `Context` is still a very real `Thread` for that guy, so it could very much run out of `Context`s if all threads are waiting on something. If you queue 9 things, 1 will wait until another completes.
On the other hand, the `GreenPool` is also created with 8 backing threads, but its `Context`s would be `GreenThread`s. You can have a million `Await`s running in parallel (yes, only 8 will be making progress at any given time). With cooperative scheduling, every time one pauses waiting for something, another one takes a turn.
> Note: Libraries (as opposed to binaries) making a choice of a concrete scheduler should be something rare, and they should have a good reason for it (i.e. a library that will have thousands upon thousands of long-running `Await`s would not be happy with the `ThreadPool`).
@vinipsmaker, I hope the proposed abstractions - how they interact with each other, as well as how crates benefit from them to interoperate - start making more sense.
@vinipsmaker I think @icorderi's proposal could be implemented with a plain state machine, where each "function" maps to a state machine represented by a current-state variable and a struct/tuple of state-specific variables. In this case, something that appears syntactically as a call is actually the composition of nested state machines.
I think this is a generalization of a coroutine: in a coroutine, the state tuple is the current frame, the nesting is the call stack, and the current state is the program counter. But I wouldn't consider the general state-machine concept a "coroutine".
@icorderi I'm curious how this might work at the type-system level. E.g. (I see you already gave the definition of the `Await` trait above), what do you think the body of `trait Context` might look like, even if just as an example?
@glaebhoerl, I haven't been able to play with `Context` enough to answer that with certainty.
In order for this to work, `Context` should have the building blocks needed for coding the internals of the `std::sync` types (at least the `await`able versions).
The design of `Context` should allow a `Coroutine` implementation to never truly block, regardless of the code being executed by the coroutine (or stuff that code calls) using Locks, Barriers, Receivers, SyncSenders, etc.
For now, I mirrored most of the free functions in `std::thread` that operate on the current `Thread`, as well as `unpark()` from `Thread`.
```rust
/// Execution `Context` for `Await`'ables
pub trait Context {
    /// Atomically makes the handle's token available if it is not already.
    fn unpark(&self);
    /// When _awaited_, blocks unless or until the current context's token is made available.
    /// This method returns immediately.
    fn park(&mut self);
    /// When _awaited_, blocks unless or until the current context's token is made available
    /// or the specified duration has been reached (may wake spuriously).
    /// This method returns immediately.
    fn park_timeout(&mut self, dur: Duration);
    /// When _awaited_, puts the current context to sleep for the specified amount of time.
    /// This method returns immediately.
    fn sleep(&mut self, dur: Duration);
    /// When _awaited_, cooperatively gives up a timeslice to the scheduler.
    /// This method returns immediately.
    fn yield_now(&mut self);
}
```
I haven't gotten around to reading all the source code of `std::sync` to get a complete abstraction in my head. An initial dive into the `std::sync` code shows that it's built on top of `sys::common`, which contains thin wrappers over the OS synchronization structures.
Those are the base structures the `std::sync` module is built on, and they will block the thread - which means the `Context` abstraction needs to include them somehow.
It could be as trivial as:
```rust
pub trait Context {
    fn mutex(&mut self) -> Mutex;
    fn rwlock(&mut self) -> RwLock;
    fn condvar(&mut self) -> Condvar;
    // ...
}
```
Perhaps the `Mutex`/etc. constructors should be able to get the internal implementation from the `Context`:
```rust
impl<T> Mutex<T> {
    /// Creates a new mutex in an unlocked state ready for use.
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn new(t: T) -> Mutex<T> {
        Mutex {
            inner: box Context::current().mutex(), // was: StaticMutex::new()
            data: UnsafeCell::new(t),
        }
    }
}
```
A good exercise to get a tighter definition of `Context` is to start a `sync2` crate with `Mutex` and `Receiver` working with `Await`. The `Thread`ed version should have no extra overhead and end up doing the same thing the current implementations do. A `Coroutine` version should never block the underlying `Thread`.
@glaebhoerl, does this help you understand how it would work at the type system level?
@icorderi That's helpful, thanks.
One thing I suspect would be an issue with that approach is that it implies a closed universe of supported synchronization primitives -- the ones currently in `std`. Whereas I suspect the attitude of the people involved would be "those are just the ones we chose to include in `std`; anyone should be able to roll their own and put it up on crates.io", which no longer works so well if they all have to be centrally enumerated up-front in a trait. (This feels like the expression problem.)
But I'll leave that to people who are more knowledgeable about this stuff.
One thing I'd like to bring up is the C++ proposal N4453.
It defines an exceedingly low-level primitive, "resumable functions," which are sufficient to implement async/await, promises/futures, or cooperative green threading.
The proposal uses the "resumable" keyword in three places, which could be implemented independently if Rust were to take this approach:

- `resumable T` in function return types, like `constexpr T`. This declares that the function may suspend itself for later resumption.
- `break resumable;`, the actual suspension point.
- `resumable T` in variable declarations. This declares that the variable holds the result/state of a resumable expression.

However, all of the above are syntax sugar. What is actually required to implement the feature is an interface for resumable callables, much as `Fn`/`FnMut`/`FnOnce` are an interface for callables that do not suspend.
The interface would only require one method:
```rust
trait Resumable {
    type Output;
    fn resume(&mut self) -> Option<Self::Output>;
}
```
This `Resumable` trait describes the execution state of a resumable function. The `resume()` method produces `None` until the resumable computation is complete, at which point it produces `Some`. Interestingly enough, this is a near-perfect dual of `Iterator` - even up to the lack of guarantees should `resume()` be called again after it returns `Some`.
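For illustration, here is a hand-rolled instance of the `Resumable` trait above - a hypothetical toy computation that needs two resumptions before producing its result:

```rust
// A toy computation that suspends twice before completing.
struct TwoStep {
    step: u8,
}

impl Resumable for TwoStep {
    type Output = &'static str;

    fn resume(&mut self) -> Option<Self::Output> {
        self.step += 1;
        if self.step < 3 {
            None // still suspended; call resume() again later
        } else {
            Some("done") // computation complete
        }
    }
}
```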
A resumable function would then return some `R: Resumable`, `await` would be a trivial expansion using `match`, and suspension points would be simple syntax sugar in the same way closures are. The last could be left for later (pending compiler support), while the former parts require no special support whatsoever.
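As a sketch of that expansion (my reading, not the proposal's exact wording; it assumes `expr: impl Resumable<Output = T>` and elides the suspension mechanics):

```rust
// `let x = await expr;` could expand to roughly:
let x = {
    let mut r = expr;
    loop {
        match r.resume() {
            Some(v) => break v, // inner computation finished
            None => {
                // suspend the *enclosing* resumable function here;
                // this whole block is itself a suspension point
            }
        }
    }
};
```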
We had some discussions in Orlando last year, where we agreed on state-machine generators on top of something like mio for high-efficiency network servers.
Every time I see mentions of coroutines (as in "green threads"; Python muddled the distinction through what I think is terminology misuse) or boxed callbacks in async IO discussions, it makes me sad, because those actively impose limits on what you can actually do at any given time without running out of memory, not to mention slowdowns (although some virtual-call costs wouldn't be noticeable).
cc @pcwalton who knows more about the state machines + mio approach.
@eternaleye You're describing the other half of a generator, for which a minimal API would be:
```rust
pub enum Yield<T, R> {
    Value(T),
    Return(R),
}

pub trait Generator {
    type Value;
    type Return;
    fn next(&mut self) -> Yield<Self::Value, Self::Return>;
}
```
That is necessary to support both `yield expr` and `return expr`, although it's not enough to actually give input back to the generator, which introduces several complications.
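For concreteness, here is a hand-written instance of this `Generator` trait (a toy example, not from the proposal): it yields a countdown and then returns the running total, exercising both `Yield::Value` and `Yield::Return`:

```rust
// Counts down from `n`, yielding each value, then returns the total.
struct Countdown {
    n: u32,
    sum: u32,
}

impl Generator for Countdown {
    type Value = u32;
    type Return = u32;

    fn next(&mut self) -> Yield<Self::Value, Self::Return> {
        if self.n > 0 {
            let v = self.n;
            self.n -= 1;
            self.sum += v;
            Yield::Value(v) // corresponds to `yield v`
        } else {
            Yield::Return(self.sum) // corresponds to `return sum`
        }
    }
}
```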
@eddyb That's a very odd trait for `Generator`, IMO. I mean, your definition is just a `Result`-ified `Iterator`.
Meanwhile, I see real uses for `Iterator`s where `Item: Resumable` - in particular, if each value is expensive to compute and latency is a concern.
@eternaleye In a way, `Yield` itself is `Result`, but it makes more sense when your final return value is `io::Result<R>`, where `Iterator<Item = Result<(), io::Result<R>>>` seems very confusing and has three possible after-the-end behaviors (`None`, `Err(Ok(()))`, or panic).
Before hearing @pcwalton explain how simple mio can be with generators - where you never have to pass buffers in and out; you just try to read repeatedly and effectively `yield` on "would block" - I wrote this crazy contraption as a model for generators with multiple entry points, based on the types they expect to get back as input.
In a comment there is this playpen link for an example using the manually desugared implementation to show off "copying a file in chunks, with progress notification".
This model hits a wall when you have some construct that executes multiple `await` requests in parallel, because it would need a way to know which one succeeded, whereas in the mio model you just retry all of them and at least one will succeed.
Based on what I now know, I would probably implement the mio model with helpers for non-network IO to be fair to multiple network clients, in which case reading from a file could just sometimes result in an artificial "would block", causing different actions to also be attempted.
@eddyb: My point is that your definition of `Generator` is quite different from what I see almost anywhere else; the common definition seems to be admirably captured by `Iterator`.
Rather, I see `Iterator` and `Resumable` as duals in a very real, meaningful way:

- `Iterator` expands itself into multiple values by repeatedly invoking the same computation
- `Resumable` collects itself into a single computation by sequentially invoking potentially-distinct computations

A perfect example of something that maps well to `Resumable` is `Iterator::fold()` - each step of that maps perfectly to a call to `Resumable::resume()`, with the final step producing `Some`.
`Resumable` as described wouldn't work with generators, whereas @eddyb's `Yield` would.
@tikue: I'd suggest reading the linked C++ proposal more deeply - not only can resumables work with generators, the document contains an implementation of generators using resumables.
Finally gave the paper a read, and it's pretty much what I expected: a dirty value-stashing trick (which @eternaleye called "extending the iterator" - confusing out of context):
```rust
fn fib(mut n: usize, out: &mut Option<i32>) -> impl Iterator<Item = ()> {
    let mut a = 0;
    let mut b = 1;
    while n > 0 {
        n -= 1;
        *out = Some(a); yield (); // yield(a)
        let next = a + b;
        a = b;
        b = next;
    }
}

fn main() {
    let mut val = None;
    for _ in fib(5, &mut val) {
        println!("{}", val.take().unwrap());
    }
}
```
While this is uglier than actually having generators in the language (not to mention somewhat less efficient), it also poses borrowing problems: the generator object would borrow `val`, making it impossible under current borrow-checking to `.take()` from it in the loop.
Actually solving this problem requires knowledge of the generator object and/or its `Iterator::next` implementation, as in other cases mutating a mutably borrowed value can result in memory unsafety (iterator invalidation, really, if you replace the generator object with a `Vec` iterator).
@eternaleye I don't understand how `Iterator` and `Resumable` are supposed to be duals when their definitions are alpha-equivalent?
The actual dual to iterator is -- allegedly -- the observer pattern. (Linking the Google cache because the original seems to be broken.)
So @eddyb and I talked further on IRC, and I'm going to try and summarize (if I get anything wrong, just let me know).
IMO, the optimal form of asynchronous execution is one in which:

1. by default, calling an asynchronous function behaves just like calling a synchronous one;
2. the programmer can take explicit control of the asynchrony when needed;
3. synchronous code can call asynchronous code without viral keywords.

Implicit in this framing is that case 1 is more common than case 2 - most of the time, the programmer should not have to care whether something is synchronous.
Consider the following (strawman syntax):
```rust
async fn fib() -> Yield<i64> {
    let mut prev = 1i64;
    yield prev;
    let mut cur = 1i64;
    yield cur;
    while cur > 0 {
        let (old, next) = (cur, prev + cur);
        prev = old;
        cur = next;
        yield cur;
    }
}

async fn fib_prime() -> Yield<i64> {
    for i in fib().filter(is_prime) {
        yield i;
    }
}
```
Note that `fib()` is called as a perfectly normal function in `fib_prime()` - this is due to point (1); requiring a keyword here would clutter the majority of code.
`fib_prime()` may potentially run for quite some time before it yields something, which would cause responsiveness problems. This can be resolved by having `fib_prime()` suspend even when it would not yield - specifically, it suspends any time an `async fn` it calls would suspend. Here, that's `fib()` - this limits its worst-case execution between suspensions to one iteration of `fib()` and one call to `is_prime()`, rather than an unbounded number of the same.
Of course, there are times when one needs to explicitly drive a Generator. Here's such a case:
```rust
async fn foo() {
    let x = proc fib();
    for (i, v) in x.into_iter().enumerate() {
        println!("{}", i);
        if i % 2 == 0 {
            suspend;
        }
    }
}
```
This function has side effects and needs to ensure it's not interrupted between any pair of operations. To do this, it uses the `proc` keyword to denote that, rather than being interested in the value of the `fib()` function (regardless of execution semantics), it wishes `x` to hold "the `proc`ess by which `fib()` produces the values". It then converts the `Generator` into an `Iterator` and processes it accordingly - suspending itself at a safe time to do so.
This satisfies (2) - the ability to control asynchrony when needed.
Finally, for (3), the tools have already been introduced:
```rust
fn baz() {
    let x = fib_prime();
    for i in x {
        println!("{}", i);
    }
}
```
As a synchronous function has no suspension points, the behavior from (1) of suspending when `fib_prime()` suspends would be nonsensical - thus, calling an asynchronous function from a synchronous function returns the generator, in the same manner as using the `proc` keyword within an asynchronous function. This, together with the default behavior in (1), avoids the need for viral keywords.
In addition, as demonstrated in (2), generators convert cleanly to iterators. As a result, synchronous code that calls asynchronous code has essentially no syntactic overhead whatsoever.
However, in order to support this, the value returned from a Generator needs to support three possible cases, not just two (as @eddyb's suggestion had):
```rust
pub enum Yield<T> {
    Suspend,
    Value(T),
    Done,
}

pub trait Generator {
    type Value;
    fn next(&mut self) -> Yield<Self::Value>;
}
```
> Note: @eddyb's design has `Return` carry a value `R`, but it may not be worth it: with suspending, an async function is zero-or-more `Suspend`s followed by exactly one `Value`, while an async iterator is zero or more sequences of (zero-or-more `Suspend`s followed by zero-or-more `Yield`s) - the return value complicates conversion to an `Iterator` (if `R` is not `()`) or a `Fn{,Mut,Once}` (if `T` is not `()`).
In the case illustrated with (1), the "outer `Generator`" (`fib_prime()`) would drive the "inner `Generator`" (`fib()`) in lockstep - but when the inner `Generator` produces `Value`, the outer `Generator` will capture it and emit `Suspend` instead. On the next iteration, it will feed the captured value into its own update process.
In the case illustrated with (2), the outer `Generator` is wholly unaware of the inner `Generator` - the update process handles driving it entirely on its own.
In the case illustrated with (3), conversion to an `Iterator` simply wraps the `Generator` in a newtype, which is trivial and loops on receiving `Suspend`.
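A minimal sketch of that newtype, assuming the three-state `Yield`/`Generator` definitions above (note the `Suspend` arm simply keeps driving the generator, which is only sensible when re-polling can make progress):

```rust
struct GenIter<G: Generator>(G);

impl<G: Generator> Iterator for GenIter<G> {
    type Item = G::Value;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.0.next() {
                Yield::Suspend => continue, // keep driving until progress is made
                Yield::Value(v) => return Some(v),
                Yield::Done => return None,
            }
        }
    }
}
```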
May I mention -- coroutines/generators are closely related to the `IO` monad, and `await` syntax is just another form of do notation. See #243, in particular this comment by @Eilie.
@bugaevc I really wish someone had made a nice blog post about this, because rehashing the argument every time is tiresome and pointless: monads are a poor fit for Rust's low-level and imperative semantics.
Among other things:

- `Option` wants `FnOnce`; anything which calls the closure more than once wants `FnMut` or `Fn`, depending
- `Monad::bind` has to be generic over the closure type, but that means the implementer can't keep the closure around without boxing it (unlike, say, the `map` adapter on iterators, which gets static dispatch)

@dwrensha tried to explain how generators can be used in their blog post, but there's still lots of work to do to refine the details of the various approaches and then compare them.
Until someone can actually show how to implement useful monad-like abstractions in Rust that can compete with generators + `?`/`catch`, I remain unconvinced about the value of that front.
@eddyb Sure, I understand monads don't fit into Rust as easily as they do into Haskell.
I really like `?`/`catch` and can't wait for them to reach stable. And yeah, they were relatively easy and straightforward to implement without having to dive into this lambda / state-machine nightmare.
I have some experience with async stuff in Python, so I do understand both how generators work and how I/O coroutines can be implemented on top of them.
What I want to emphasize is that implementing await/yield syntax is essentially the same as implementing do notation (with `FnOnce`s).
`?` was a trivial special case that could possibly have been implemented as do notation, but it was easy enough to implement the straightforward way. This time it is the general case. There is no simple way to do it without plunging into monads and do notation, and it has exactly the same set of problems.
Not really, since it's limited to `FnOnce`s, and that gets away from most of the problems -- but does Rust really need the totally general case? It seems to me that `FnOnce`s would be enough.
Sure, the implementor can't keep the closure around without boxing it -- but neither can the I/O loop (or whatever machinery implements async I/O) keep around the implicit closure that implements `trait Generator` (the coroutine/generator continuation). It's a one-to-one mapping; that's what I'm trying to say.
@bugaevc But a generator has N states in a single type, whereas monads/CPS have one closure type per state.
> but neither can the I/O loop (or whatever machinery implements async I/O)

But I can allocate an array of a million instances of a single generator ahead of time, with no indirection. This is the future of massively scalable services: minimal computing overhead, maximal throughput.
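To illustrate the point (my sketch; `Conn` is a hypothetical stand-in for a compiler-generated generator state machine), a million instances can sit in one flat allocation and be driven in place:

```rust
// Stand-in for a compiler-generated state machine: one concrete type,
// a few bytes per instance, no boxing.
#[derive(Clone, Copy)]
enum Conn {
    Idle,
    Reading { bytes_read: usize },
    Done,
}

fn main() {
    // One contiguous allocation holding a million state machines,
    // with no per-instance indirection or pointer chasing.
    let mut conns = vec![Conn::Idle; 1_000_000];

    // Drive every machine in place.
    for c in conns.iter_mut() {
        *c = match *c {
            Conn::Idle => Conn::Reading { bytes_read: 0 },
            Conn::Reading { bytes_read } if bytes_read >= 1024 => Conn::Done,
            other => other,
        };
    }
}
```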
@eddyb can you elaborate? I can't really picture when you'd want to run a million copies of the same service. I actually think running many different services on a single event loop is going to be the more common use case. Any service that delegates to another service for part of its work will need to do this. This is actually something I'm struggling with in tarpc, where I box services on the event loop to enable heterogeneity.
@eddyb: I don't want to sidetrack the discussion at hand, but someone did write an excellent blog post about it (monads and Rust) some time ago.
@krstoff That addresses only my first point and pretends it can eschew the second (which is not true, not while keeping the monad consistently typed). And it's already pretty complex, sporting a huge feature-requirement list.
First, this thread has been very, very interesting to follow, and I really hope Rust can bring a standard solution to async (or at least something that lets the many crates out there play nicely together). I can't help but take a stab at this stuff myself (maybe yet another crate will show up from me?).
This thread seems to have wandered around for a while, I think because everyone already has in their own mind how they want the abstraction of async to look - whether it's callbacks, coroutines, or what have you. None of those are strictly required for async; they are good abstractions over something that can be confusing to work with at times, but they are not required for low-level interoperability.
The issue is this: we want some operation to begin, and we don't want to "block" the code we're in. There are lots of ways to solve this. Here's (another) summary:
Coroutines: These require compiler support, and/or really, really fancy macros, to break the linear flow of execution. When we either `yield` or `await` on something, we want to suspend the current coroutine and jump to code elsewhere. When the value is ready, we ought to jump back to where we were and resume execution.
Question for those already thinking about these: for an expression `let x = await y;`, what is a valid type for `y` such that it can be "awaited" on? We can do some trickery to optimize coroutines calling other coroutines (to "flatten" them), but otherwise we need some sort of representation of a future value... which leads to...
Futures/promises/awaitables (there are many names for similar concepts): Basically like a `Result<T, E>`, but with three states:
```rust
pub enum Awaitable<T, E> {
    Pending,
    Ok(T),
    Err(E),
}
```
Of course this implementation won't work as-is, because the awaitable must be mutated from `Pending` to `Ok` or `Err` once the value becomes available, but this definition shows the core idea. This concept isn't completely at odds with other paradigms, unless you need to bypass an actual `Awaitable` for the sake of performance (as in `yield from`).
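One minimal way to make that mutation work (a sketch, not a proposed API; all names here are hypothetical) is to share the state between the consumer's handle and whatever completes the operation:

```rust
use std::cell::RefCell;
use std::rc::Rc;

enum State<T, E> {
    Pending,
    Ok(T),
    Err(E),
}

// Consumer-facing handle; the event loop holds a clone of `state`.
struct Awaitable<T, E> {
    state: Rc<RefCell<State<T, E>>>,
}

impl<T, E> Awaitable<T, E> {
    fn new() -> Self {
        Awaitable { state: Rc::new(RefCell::new(State::Pending)) }
    }

    // Called by the event loop when the operation finishes.
    fn complete(&self, value: T) {
        *self.state.borrow_mut() = State::Ok(value);
    }

    fn is_pending(&self) -> bool {
        matches!(*self.state.borrow(), State::Pending)
    }
}
```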
Callbacks: Everyone seems to agree that callbacks are ugly; I do as well. They quickly become a hot mess to work with and cause code to become really messy. However, callbacks have the advantage of being approach-agnostic (with a few exceptions). For example, `Awaitable`s could use callbacks to handle the completion of the value (adapting a previous example):

```rust
let mut buff = [0u8; 1024];
TcpStream::bind_async("example.com:8080").result(|s| {
    s.read_async(&mut buff)
}).map(|c| {
    String::from_utf8(buff[..c].to_vec()).unwrap()
})
```
Not the cleanest, but not all that different from working with a bunch of `Result`s. Not being able to use pattern matching might be annoying, but we could minimize the callbacks with a signature like `fn result(then: FnOnce(Result<T, E>) -> Awaitable<T, E>) -> Awaitable<T, E>;` and work with the `Result` directly.
As noted by those knowledgeable in the Python world, `async`/`await` was introduced because of the ever-so-slightly different semantics from generators, and to avoid muddying coroutines and generators together. We may want to keep them separate right off the bat to prevent this problem before it starts.
I think `await` would be the best solution, but I'd rather take advantage of Rust's honest-typing precedent and avoid the `async` keyword. Instead of
```rust
async fn do_something<R: Read>(r: R) -> Result<String, ?> {
    (await r.read_async(1024)).map(|buf| String::from(buf))
}
```

why not this?

```rust
fn do_something<R: Read>(r: R) -> Awaitable<String, ?> {
    await r.read_async(1024).map(|buf| String::from(buf))
}
```
Food for thought.
@icorderi's proposal seems almost useless to me, unless I am missing something?
```rust
trait Await<T> {
    fn await(ctx: &mut Context) -> T;
}
```
For this to be useful at all requires some sort of hidden semantics capable of suspending the execution context, in a rather preemptive manner. This signature promises no such suspension and is misleading. The only way for this to work is to:

- block inside the `await()` function call until the value arrives. Doesn't this defeat the point of trying to introduce async? Just do a regular blocking call and save the overhead.

@eternaleye's trait definitions for generators seem the most realistic to me, but again, do we want to muddy generators and coroutines together? If so, that's an acceptable stance, just not one I particularly agree with.
Sources: My own experience in implementing async over the years and working hard to bring it to PHP. Sorry for the wall of text.
> do we want to muddy generators and coroutines together? If so, that's an acceptable stance, just not one I particularly agree with.

It's all just control flow... what's the rationale for segregating them? (Honest question for my own edification.)
I much prefer the terminology I've seen used during ES Harmony, before ES6 chose a generator design:

- expression `yield` (i.e. passing a value into the generator step function) is an optional feature
- a coroutine is a `yield`-using function in a resumable call stack, which can also contain regular functions

This makes "stackless coroutine" either an oxymoron or an interprocedural function -> generator pass.
What Python did, IIUC, was call generators "coroutines" when they also had expression `yield`.
However, expression `yield` has no measurable implementation cost in the stackful cases, and none in stackless state-machine transforms performed on CFGs (which is what MIR is), only in AST transforms.
We can waste our time either by using misguided terminology variants or by being very explicit every single time (e.g. "stackless resumable functions with inputs/outputs"), agree on less-bad terms, or invent completely new ones. But, you know, naming things is a hard problem and too much of a bikeshed.
I was talking to eddyb about this last night. A few things that occurred to me:

- `trait Cycle3<T, U, V>: FnOnce(T) -> (U, impl Cycle3<V, T, U>) {}` should work.
should work.@Ericson2314 I'm not entirely sure how cycling works, I haven't yet looked into that. However: I should point out that the complexity would be mostly at the type-level, not the state-machine transform. We can provide both session types and an iterator interface at the same time, using the same MIR-based transformation, but different type-level handling.
@eddyb you said you don't like https://gist.github.com/eddyb/822c658190ccf18058db so much anymore? What is your latest MIR-transformation idea?
Rust currently only includes synchronous IO in the standard library. Should we have async IO as well, or leave that to an external crate? What would the design look like?
Moved from https://github.com/rust-lang/rust/issues/6842
Related: https://github.com/rust-lang/rfcs/issues/388