rust-lang / rust


Tracking issue for RFC 2033: Experimentally add coroutines to Rust #43122

Open aturon opened 7 years ago

aturon commented 7 years ago

RFC.

This is an experimental RFC, which means that we have enough confidence in the overall direction that we're willing to land an early implementation to gain experience. However, a complete RFC will be required before any stabilization.

This issue tracks the initial implementation.


newpavlov commented 5 years ago

So why use this approach instead of the UnsafeGenerator which I've proposed earlier? IIUC both are essentially equivalent to each other, but with yours, users have to learn about Pin semantics even if they do not work with self-referential structs.

clarfonthey commented 5 years ago

I do agree that the burden of figuring out how to work with self-referential structs should be put on the creators of the self-referential structs, rather than in the API for something like Generator.

For example, why is it that we can't just use &mut self as usual and pass in an &mut PinMut<'a, T> instead? It seems silly, but it does get around having to put this weird API in the Generator trait.

yuriks commented 5 years ago

If your generator is Unpin, then all someone has to do to use it is call Pin::new(generator). In return, people who have generators that are not Unpin can also use your code and abstractions on their generators. The Pin API was designed so that pointers to structs which are not self-referential can easily be put in and out of a Pin. Having a separate UnsafeGenerator trait would force everyone to implement things twice, once for UnsafeGenerator and once for Generator.
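For illustration, here is a minimal sketch of both cases, written against the unstable, pre-resume-argument API used elsewhere in this thread (the exact feature gates and trait shape may differ on later nightlies):

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {
    // Movable generator: nothing is borrowed across the yield, so the
    // compiler treats it as Unpin and the safe Pin::new constructor works.
    let mut movable = || {
        yield 1;
        2
    };
    assert_eq!(Pin::new(&mut movable).resume(), GeneratorState::Yielded(1));

    // Immovable (`static`) generator: `items` is borrowed across the
    // yields, so it must stay pinned, e.g. on the heap via Box::pin.
    let mut immovable = Box::pin(static || {
        let items = [1, 2, 3];
        for &i in &items {
            yield i;
        }
    });
    assert_eq!(immovable.as_mut().resume(), GeneratorState::Yielded(1));
}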

yuriks commented 5 years ago

Also, you keep implying that somehow self-referential generators are only relevant for async I/O use cases, but that's not at all true. Any generator that borrows local variables across a yield (like this example in this thread) will need to be self-referential.

newpavlov commented 5 years ago

Having a separate UnsafeGenerator trait would force everyone to implement things twice, once for UnsafeGenerator and once for Generator.

Why is that? I don't see why impl<T: UnsafeGenerator> Generator for Pin<T> { .. } wouldn't work.

Pauan commented 5 years ago

@newpavlov Just to make it clear: Pin has absolutely nothing whatsoever to do with asynchronous vs synchronous.

The reason for Pin is to allow for borrowing across yield points, which is necessary/useful for both asynchronous and synchronous code.

Pin just means "you cannot move this value", which allows for self-referential references.

If your struct does not need to pin anything, then a Pin<&mut Self> is the same as &mut Self (it uses DerefMut), so it is just as convenient to use.

It is only when your struct needs to pin something that you need to deal with the complexity of Pin.

In practice that means the only time you need to deal with Pin is if you are creating abstractions which wrap other Generators (like map, filter, etc.)

But if you're creating standalone Generators then you don't need to deal with Pin (because it derefs to &mut Self).

Why is that? I don't see why impl<T: UnsafeGenerator> Generator for Pin<T> { .. } wouldn't work.

Let's suppose we did that. That means that now this code won't work:

let unsafe_generator = || {
    let items = [1, 2, 3];
    for &i in &items {
        yield i;
    }
};

unsafe_generator.map(|x| ...)

It doesn't work because the map method requires self to be a Generator, but unsafe_generator is an UnsafeGenerator.

And your Generator impl requires UnsafeGenerator to be wrapped in Pin, so you would need to use this instead:

let unsafe_generator = || {
    let items = [1, 2, 3];
    for &i in &items {
        yield i;
    }
};

let unsafe_generator = unsafe { Pin::new_unchecked(&mut unsafe_generator) };
unsafe_generator.map(|x| ...)

And now you must carefully ensure that unsafe_generator remains pinned, and is never moved. Hopefully you agree that this is much worse than the previous code.


If instead we require Pin<&mut Self> for the resume method, that means we don't need to do any of that funky stuff, we can just pass unsafe_generator directly to map, and everything works smoothly. Unsafe generators can be treated exactly the same as safe generators!

The difference between your system and the Pin<&mut Self> system is: where is the Pin created?

With your system, you must manually create the Pin (such as when passing unsafe_generator to another API which expects a Generator).

But with Pin<&mut Self> you don't need to create the Pin: it's created automatically for you.

ishitatsuyuki commented 5 years ago

Have there been any discussions about avoiding more than one Box allocation when embedding an immovable generator inside another? That's basically what C++ does as an optimization, but in Rust we probably need to express it in the type system.

Another optimization question is about safe-to-move references: in particular, references into a Vec's heap buffer will not change even if the generator moves, but due to how the types work this still requires an immovable generator for now.

Nemo157 commented 5 years ago

No boxes are needed, see https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/macro.pin_mut.html for stack pinning that works even for generators in generators.
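A minimal sketch of that stack-pinning pattern, assuming the pin-utils crate linked above and the same pre-resume-argument nightly API as the rest of this thread:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};

fn main() {
    let counter = static || {
        // `items` is borrowed across the yield, so this generator is
        // immovable and must be pinned before it can be resumed.
        let items = [10, 20];
        for &i in &items {
            yield i;
        }
    };

    // Shadows `counter` with a Pin<&mut _> pointing at the stack slot;
    // no Box allocation is involved.
    pin_utils::pin_mut!(counter);

    assert_eq!(counter.as_mut().resume(), GeneratorState::Yielded(10));
    assert_eq!(counter.as_mut().resume(), GeneratorState::Yielded(20));
}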

quantverse commented 5 years ago

Hi to everybody,

I'm experimenting with generators now and need to store multiple generators in a Vec. This code works:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {

    let gen = Box::new(|| {
        yield 1;
        return "foo"
    });

    let mut vec = Vec::new();

    vec.push(gen);

    match Pin::new(vec[0].as_mut()).resume() {
        GeneratorState::Yielded(1) => {}
        _ => panic!("unexpected return from resume"),
    }

    match Pin::new(vec[0].as_mut()).resume() {
        GeneratorState::Complete("foo") => {}
        _ => panic!("unexpected return from resume"),
    }

}

However, I also need to be able to explicitly specify the Vec's type without relying on inference, and this is where I fail. This code does not work:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {

    let gen = Box::new(|| {
        yield 1;
        return "foo"
    });

    //let mut vec = Vec::new();
    let mut vec:Vec<Box<Generator <Yield=i32, Return=&str>>> = Vec::new(); // what is the correct type?

    vec.push(gen);

    match Pin::new(vec[0].as_mut()).resume() {
        GeneratorState::Yielded(1) => {}
        _ => panic!("unexpected return from resume"),
    }

    match Pin::new(vec[0].as_mut()).resume() {
        GeneratorState::Complete("foo") => {}
        _ => panic!("unexpected return from resume"),
    }

}

I get:

error[E0277]: the trait bound `dyn std::ops::Generator<Return = &str, Yield = i32>: std::marker::Unpin` is not satisfied
  --> src/main.rs:19:11
   |
19 |     match Pin::new(vec[0].as_mut()).resume() {
   |           ^^^^^^^^ the trait `std::marker::Unpin` is not implemented for `dyn std::ops::Generator<Return = &str, Yield = i32>`
   |
   = note: required by `std::pin::Pin::<P>::new`

Any idea what is the correct way to explicitly declare the vector?

Nemo157 commented 5 years ago

Use Vec<Box<dyn Generator<Yield = i32, Return = &'static str> + Unpin>>, Unpin is a marker trait you can add on to other trait bounds.

Or use Vec<Pin<Box<dyn Generator<Yield = i32, Return = &'static str>>>> if you want to support self-referential generators as well.
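For instance, the second (pinned-box) variant applied to the code above could look like this, still using the argument-less resume() from this era:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {
    // Pinned boxes: this also admits self-referential generators.
    let mut vec: Vec<Pin<Box<dyn Generator<Yield = i32, Return = &'static str>>>> = Vec::new();

    vec.push(Box::pin(|| {
        yield 1;
        return "foo";
    }));

    match vec[0].as_mut().resume() {
        GeneratorState::Yielded(1) => {}
        _ => panic!("unexpected return from resume"),
    }

    match vec[0].as_mut().resume() {
        GeneratorState::Complete("foo") => {}
        _ => panic!("unexpected return from resume"),
    }
}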

quantverse commented 5 years ago

Wow, thanks a lot!

lachlansneff commented 5 years ago

Has anyone done any work with generator resume arguments? Is there an RFC for them?

ishitatsuyuki commented 5 years ago

Not that I know of. The original RFC mentioned that resume arguments could be added, but that's all.

Zoxc commented 5 years ago

@lachlansneff My initial PR adding generators included resume arguments, and that implementation is still in one of my branches. Adding generator resume arguments would be covered by the existing eRFC for generators.

quantverse commented 5 years ago

Found this: https://internals.rust-lang.org/t/pre-rfc-generator-resume-args/10011

npmccallum commented 5 years ago

I have read through the RFC and all the comments here, and I may have missed it, but it doesn't seem that the case of cancellation has been considered.

Let's say that I want to collect the first million numbers in the Fibonacci sequence. I can write a generator that never returns and yields the next number in the sequence. Then I can call resume() until I have one million results. So far so good. However, this needs to use some kind of BigNum because the numbers are large. The BigNum implements Drop to free its memory buffer.

After I've collected my first million BigNums, I stop resuming. What happens to the two BigNums that I'm internally keeping as state? Are they dropped? Do we get a memory leak?

It seems like some sort of cancellation routine is needed here to explicitly end execution early. It probably also needs to be defined in the Generator trait; otherwise, independent types implementing Generator will not be well-defined.

Nemo157 commented 5 years ago

After I've collected my first million BigNums, I stop resuming. What happens to the two BigNums that I'm internally keeping as state? Are they dropped? Do we get a memory leak?

They are stored as part of the generated impl Generator type. The transform also produces a Drop impl for that type, so when you drop the generator all of the current state is correctly dropped (simple demonstration playground).

Pauan commented 5 years ago

After I've collected my first million BigNums, I stop resuming. What happens to the two BigNums that I'm internally keeping as state? Are they dropped? Do we get a memory leak?

Generators work similarly to closures: they are converted into a struct which contains all of the state for the generator. So the generator struct will contain the BigNums as fields.

When the generator struct is dropped, it will automatically drop all of its fields, so there is no memory leak. Here is an example:

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=74878c5f222d4329fdf654f5d873d44d
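Since playground links can rot, here is a similar self-contained sketch; Noisy is a hypothetical stand-in for the BigNum above, and the argument-less resume() matches the nightly API of this era:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

// Hypothetical stand-in for BigNum: it reports when its Drop runs.
struct Noisy(u32);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping Noisy({})", self.0);
    }
}

fn main() {
    let mut g = || {
        let a = Noisy(1);
        let b = Noisy(2);
        yield a.0 + b.0;
        yield a.0 * b.0;
        // The caller below stops resuming before this point is reached.
    };

    assert_eq!(Pin::new(&mut g).resume(), GeneratorState::Yielded(3));

    // `g` is dropped here without ever completing; the generated drop glue
    // still runs, so both Noisy values print their drop messages.
}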

npmccallum commented 5 years ago

Should Generator depend on Drop then? Third-party implementors of Generator need either clear documentation on the need to handle cancellation semantics or a strict dependency on Drop to force them to do so. Possibly both.

Pauan commented 5 years ago

@npmccallum I'm not sure what you mean. Things will automatically be dropped even if you don't implement Drop.

So unless you're intentionally trying to leak memory, it's very hard to leak memory by accident.

npmccallum commented 5 years ago

@Pauan So if you create a generator using the closure-like syntax, the Rust compiler will create a new ephemeral type, implement Generator and Drop on it and create an instance of that type.

The Rust compiler, however, is not the only party that can implement Generator and Drop on a type. For example, a number of stackful coroutine crates do this. We might want to document that if you implement Generator you also need to implement Drop and handle cancellation.

We probably shouldn't make Generator depend on Drop since this would make it awkward to implement Generator on types which don't need to free resources.

rpjohnst commented 5 years ago

Unless you're actually managing memory yourself, you don't need to implement Drop at all: when you just have a BigNum field, the compiler will generate "drop glue" to call BigNum's drop for you. This is exactly the same as a case like this:

struct SomeType {
    some_data: Vec<i32>,
}

// SomeType doesn't implement Drop, but you don't see it leaking anything
npmccallum commented 5 years ago

@rpjohnst Agreed.

Pauan commented 5 years ago

So if you create a generator using the closure-like syntax, the Rust compiler will create a new ephemeral type, implement Generator and Drop on it and create an instance of that type.

That is correct, yes.

We might want to document that if you implement Generator you also need to implement Drop and handle cancellation.

No, that is not necessary. What I said is not specific to generators, it applies to all Rust types:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f6d244f8fc13f887a99d1f3f425ddaec

As you can see, even though Bar does not implement Drop, it still dropped Foo anyway. And even if Bar does implement Drop, it still drops Foo anyway:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1b401bfa0b63d97bd81a35f6a5529f84

This is a fundamental property of Rust: everything is automatically dropped, even if you do not implement Drop. That applies to all Rust types (with a couple exceptions like ManuallyDrop).

The only time you need to implement Drop is if you're managing some external resource (like the heap, or a file descriptor, or a network socket, or something like that). But that's not specific to generators.

It's very hard to accidentally leak memory in Rust. The only way I know of to leak memory accidentally is with Rc / Arc cycles.

Nemo157 commented 5 years ago

To be fair, if you are implementing something acting like the coroutines created by the generator transform, then you are likely to be manually managing your own internal memory to deal with the states efficiently. For example, translating my earlier example into a manually implemented Generator uses MaybeUninit everywhere and needs a custom Drop impl to take care of it (and is highly likely not to be panic safe).

But still that is not specific to generators/coroutines, that's just to do with manually managed internal memory, and doesn't require any changes to this feature to deal with.
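To make "manually implemented Generator" concrete, here is a much simpler hand-written state machine; it deliberately skips the MaybeUninit and custom Drop machinery described above, because its states own nothing that needs manual management, and it assumes the pre-resume-argument trait shape used elsewhere in this thread:

#![feature(generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

// Hand-written counterpart of `|| { yield 1; yield 2; "done" }`,
// expressed as an explicit state machine.
enum TwoStep {
    Start,
    YieldedOne,
    YieldedTwo,
    Done,
}

impl Generator for TwoStep {
    type Yield = i32;
    type Return = &'static str;

    fn resume(self: Pin<&mut Self>) -> GeneratorState<i32, &'static str> {
        let this = self.get_mut(); // fine: TwoStep is Unpin
        match *this {
            TwoStep::Start => {
                *this = TwoStep::YieldedOne;
                GeneratorState::Yielded(1)
            }
            TwoStep::YieldedOne => {
                *this = TwoStep::YieldedTwo;
                GeneratorState::Yielded(2)
            }
            TwoStep::YieldedTwo => {
                *this = TwoStep::Done;
                GeneratorState::Complete("done")
            }
            TwoStep::Done => panic!("TwoStep resumed after completion"),
        }
    }
}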

da-x commented 4 years ago

While this is still unstable, can async/await be used on stable Rust in place of generators, in a way that does not pull in the full-fledged futures and task-scheduler runtime, but instead offers a more simply typed, localized, iterator-like interface?

HadrienG2 commented 4 years ago

Sure, there are even crates for that. See https://github.com/whatisaphone/genawaiter for example.
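For anyone curious what that looks like in practice, here is a rough sketch using genawaiter's rc::Gen as described in its documentation (API names should be double-checked against the current crate docs):

use genawaiter::rc::Gen;

fn main() {
    // The async block plays the role of the generator body, and
    // co.yield_(..).await is the suspension point.
    let odd_numbers = Gen::new(|co| async move {
        let mut n = 1;
        while n < 10 {
            co.yield_(n).await;
            n += 2;
        }
    });

    // On stable Rust this behaves like a plain iterator.
    for n in odd_numbers {
        println!("{}", n);
    }
}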

da-x commented 4 years ago

I compared a genawaiter Iterator with the async_std way of accomplishing the same task, i.e. an async task sending via an async_std::sync::Sender plus a receiving Iterator that does async_std::task::block_on(receiver.recv()). The latter makes needless calls to the futex syscall on Linux, and genawaiter of course does not, which is cool. Thanks @whatisaphone, @HadrienG2

whatisaphone commented 4 years ago

Thanks for calling me out, I meant to leave a comment here. If anyone wants to experiment with generator control flow, using genawaiter as a starting point might be quicker than hacking the compiler itself.

I think resume arguments are an important feature, and genawaiter supports them via the resume_with method. I couldn't come up with a good answer for where the first resume argument should end up (before the first yield statement). See the note in this section. I'd love to see a better way to handle that.

HadrienG2 commented 4 years ago

@whatisaphone One design which was proposed before in order to resolve this problem is to unify generator call arguments and generator resume arguments. Basically, your generator acts like an FnMut(ResumeArgs).

There is no consensus on it yet, though. Critics argue that it is surprising to have the arguments of a generator fn magically change every time you reach a yield statement, and that it's not nice to have to explicitly save them if you don't want to lose them.

programmerjake commented 4 years ago

@whatisaphone how about something like:

// either of:
// fn generator(initial_arg1: i32, initial_arg2: i64) -> String
// yield(initial_resume_arg_pat: (i32, i32)) -> i32
// {

// or:
fn generator(initial_arg1: i32, initial_arg2: i64) -> impl Generator<Return=String, Yield=i32, Resume=(i32, i32)>
yield(initial_resume_arg_pat)
{
    println!("{:?}", (initial_arg1, initial_arg2, initial_resume_arg_pat));
    let yield_value = 23i32;
    let next_resume_arg = yield yield_value;
    println!("{:?}", next_resume_arg);
    "return value".to_string()
}

fn main() {
    let mut g = generator(1i32, 2i64);
    let mut g = unsafe { Pin::new_unchecked(&mut g) };
    assert_eq!(g.resume((5i32, 6i32)), Yielded(23i32));
    assert_eq!(g.resume((7i32, 8i32)), Complete("return value".to_string()));
}

It prints:

(1, 2, (5, 6))
(7, 8)

Generator lambda function:

let lambda: impl FnOnce(i32, i64) -> impl Generator<Return=String, Yield=i32, Resume=R> =
|initial_arg1: i32, initial_arg2: i64| -> i32 /* or impl Generator */
yield(initial_resume_arg_pat: R) -> i32
{
    todo!()
};

(edit: fix generator lambda type)

programmerjake commented 4 years ago

@HadrienG2 My idea avoids losing initial resume arguments and doesn't have magically changing variables.

jonas-schievink commented 4 years ago

I've implemented resume arguments in https://github.com/rust-lang/rust/pull/68524. They go for the "minimal language changes" solution of passing the first resume argument as a "normal" argument and subsequent ones as the result of yield expressions.

I think we should land this implementation first, since it unlocks a lot of interesting patterns (including async/await support on #[no_std], which is what I'm interested in). Generators are experimental, so that shouldn't be a problem. Once we've collected some experience with this we can work on writing a fully-fledged RFC for generators.
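Under that design (as I understand the PR; a sketch, not necessarily the final surface syntax), the closure's single parameter receives the first resume argument, and each later resume argument becomes the value of the corresponding yield expression:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {
    let mut adder = |first: u32| {
        let second = yield first + 1;
        let third = yield second + 1;
        third + 1
    };

    // No borrows live across the yields, so the generator is Unpin.
    let mut adder = Pin::new(&mut adder);
    assert_eq!(adder.as_mut().resume(10), GeneratorState::Yielded(11));
    assert_eq!(adder.as_mut().resume(20), GeneratorState::Yielded(21));
    assert_eq!(adder.as_mut().resume(30), GeneratorState::Complete(31));
}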

bstrie commented 4 years ago

@jonas-schievink does your implementation of generator resume arguments relate to the RFC under discussion at https://github.com/rust-lang/rfcs/pull/2781 ?

jonas-schievink commented 4 years ago

@bstrie No, I've found that proposal to be extremely confusing with how it changes the value of supposedly immutable variables after a yield. My implementation just makes yield evaluate to the resume argument, and passes the first one as an argument.

alexeymuranov commented 4 years ago

I avoid using generators in Python because of their "obscure" syntax: to know whether you are looking at a function definition or at a generator definition, you need to search for the yield keyword in the body. To me this is as if function definitions and class definitions in Python both used def (instead of def and class), and to recognise a function, you had to search for the return keyword in the body.

When I choose to define a generator in Python, I add a comment before the definition to mark it as a generator.

I am surprised that Rust is following Python here. IMO, if both

let mut one = || { if false {}; 1 };

and

let mut one = || { if false {yield 42}; 1 };

are allowed, they should mean the same thing.

I would expect the generator definition syntax to be remarkably different from the closure definition syntax.

newpavlov commented 4 years ago

Yeah, I also quite dislike that the generator syntax mimics closures. I have proposed an alternative syntax here, although it would need a new edition, and we should probably use something other than generator as the keyword, e.g. something like cort as a shorthand for coroutine (see this post for motivation).

programmerjake commented 4 years ago

@newpavlov @alexeymuranov I proposed a syntax that doesn't require a new edition but is unambiguous here. It also doesn't have magically changing function arguments.

It's unambiguous because of the yield clause declaring the yield and resume types, which sits between the return type and the function body.

A generator lambda function would look like:

|initial_arg1: i32, initial_arg2: i64| -> i16 /* or impl Generator */
yield(initial_resume_arg_pat: R) -> i32
{
    let resume_arg2: R = yield 23i32;
    0i16
}
LionsAd commented 4 years ago

I like how libfringe does it, where the initial input (resume arguments) to the generator is specified as part of the closure:

|input, yielder| {
  loop {
    input = yielder.yield(input + 1);
  }
}
tommythorn commented 3 years ago

I raised this question on discord and was asked to add it here: Where is the high-performance cooperative multitasking facility?

Clarifications: I'm writing a very high performance simulation system that is most comfortably modeled with actors (or as I knew it, communicating processes, CSP). You can easily do it with (OS) threads, but that's 3-4 orders of magnitude slower than doing it with coroutines from a single pinned thread.

Benchmarking modern C++ stackless coroutines, I can get 600 M/s context switches (which is insanely good), roughly 12 x86 instructions per iteration. Closer to the model I want is cooperative multitasking, which Boost.Context provides. This "merely" manages ~100 M/s context switches.

Alas, doing this in Rust has been a disaster; I've tried some six different crates, but everything is tuned for multithreading and I/O, not for compute-bound single-threaded work. Performance is typically ~300 k/s (e.g. tokio). The best I have gotten so far is with futures_lite and spin_on, which can eke out 50 M/s (and a ~64-instruction inner loop with lots of indirect branches). Looking at the binary trace, it seems trying to shoehorn cooperative multitasking over async incurs a fair bit of overhead.

tommythorn commented 3 years ago

I forgot to include my best case example that manages 50 M/s switches (1/6th of what I can do with C++):

use futures_lite::future;

pub fn main() {
    let my_future = async {
        for _step in 0..100_000_000 {
            future::yield_now().await;
        }
    };

    println!("size {}", std::mem::size_of_val(&my_future));

    spin_on::spin_on(my_future);
}
Diggsey commented 3 years ago

@tommythorn the perf difference is likely because the C++ version does not have/need the equivalent of a "waker" to notify a future when to resume.

Since you are using spin_on, and your task is CPU bound and not I/O bound, it may be that you do not need this waker system at all.

However, future::yield_now() will always notify the executor via the task's waker. You could probably improve performance by writing a custom future that returns Pending the first time it is polled, but does not notify the waker. Awaiting this custom future would be more efficient (although strictly speaking you would be violating the contract expected of futures).
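A minimal sketch of such a future (my own illustration, not code from this thread): it returns Pending exactly once without waking the waker, so it only makes progress under a busy-polling executor like spin_on.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct SilentYield {
    yielded: bool,
}

impl Future for SilentYield {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Deliberately skip `cx.waker().wake_by_ref()`: this is only
            // correct for executors that poll again unconditionally.
            Poll::Pending
        }
    }
}

fn silent_yield_now() -> SilentYield {
    SilentYield { yielded: false }
}

In the loop above, future::yield_now().await would then become silent_yield_now().await.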

In the long run, I think Rust will get generalized coroutines, which would allow you to use the state-machine transform performed by the compiler, without being tied to the async/await model. I wrote down my thoughts on how this should look here.

tommythorn commented 3 years ago

Exactly what it looked like from reading the dynamic instruction trace (a dummy waker was consulted etc). I only have a few months of Rust experience, but I'll try. I'll not pollute this thread any longer, but I must plead that "generator/coroutines" is not what I need and implementing cooperative multitasking on top of generators is critical overhead. What I would like is actual coroutines as defined by Knuth (https://en.wikipedia.org/wiki/Coroutine) with explicit transfer of control, not Python style generators. Thanks.

alexeymuranov commented 3 years ago

I once posed a question on the Rust reddit: How about state machines instead of generators?. There may be some relevant comments there.

Diggsey commented 3 years ago

I must plead that "generator/coroutines" is not what I need and implementing cooperative multitasking on top of generators is critical overhead. What I would like is actual coroutines as defined by Knuth (https://en.wikipedia.org/wiki/Coroutine) with explicit transfer of control, not Python style generators. Thanks.

Did you read https://en.wikipedia.org/wiki/Coroutine#Comparison_with_generators ?

There should be zero overhead to implementing coroutines on top of generators like this: in both cases you would need an indirect call as soon as more than one coroutine is involved.

tommythorn commented 3 years ago

@Diggsey, I hadn't read that, but it's a well-known fact; it's still not free: you are implementing a trampoline and thus you transfer control twice, at least one of which is an indirect branch.

In detail: in the classic coroutine implementation, yield_to(B) while running A will save the callee-saved registers, store the stack pointer in A, load the stack pointer from B, restore B's callee-saved registers, and return. That's it. You have one call to yield_to(B) and one return. With a generator, you have three contexts to switch between: A -> C -> B (looking at your link, they correspond to A-produce, B-consume, and C-dispatcher). Twice the cost.

ADD: Note that with a small change to yield_to() to allow it to pass a value to the next thread, e.g. yield_to(B, value), which arrives as the return value in B (value = yield_to(...)), you can implement generators directly on top of this with zero cost. (I used exactly this primitive in high-performance firmware for a NAND flash controller. It was almost as fast as the convoluted state machine it replaced, but vastly more readable and less error-prone.)

Diggsey commented 3 years ago

With a generator, you have three contexts to switch between: A -> C -> B

I'm not sure what definition of "context switch" you are using, but this is definitely not true.

With a generator (based on the state-machine transform), the call stack initially looks like this:

A <-- top
C

When A yields, it corresponds to a ret:

C <-- top

And then C calls B:

B <-- top
C

In other words, switching coroutines in this model involves a return and an indirect call. That's it. There's no stack swapping or other context switching happening at all.

What you described here as a "classic coroutine implementation" is often slower, as it requires swapping out the whole stack, i.e. there's a real context switch happening. Furthermore, the compiler cannot optimize across the context switch. Using a state-machine transform is both more cache-friendly and more compiler-friendly.

stuhood commented 3 years ago

@tommythorn: Some of the confusion might be that while async/await uses a trampoline, the coroutines/generators described by this issue are a lower-level building block (for which there is no stable syntax; you can see examples of the unstable API in https://github.com/rust-lang/rust/pull/68524/files?file-filters%5B%5D=.md&file-filters%5B%5D=.stderr) that works the way @Diggsey describes. The async/await transform consumes that low-level building block to add a high-level interface that is amenable to concurrent/multi-core schedulers.

vultix commented 3 years ago

I have a use case for generators that isn't currently supported: I need a Generator that has access to a reference only for the short duration between yields.

Here's an example:

#![feature(generators)]

use std::ops::Generator;

fn test() -> impl Generator<&mut String> {
// What lifetime goes here? ^

    |resume: &mut String| {
        resume.push_str("hello");
        let resume: &mut String = yield ();
        resume.push_str("world");
    }
}

This could be done using GATs if the Generator trait looked more like this:

trait Generator {
    type Resume<'a>;
    type Yield;
    type Return;

    fn resume<'a>(self: Pin<&mut Self>, arg: Self::Resume<'a>) -> GeneratorState<Self::Yield, Self::Return>;
}

fn test() -> impl Generator<Resume<'a> = &'a mut String> {
Nemo157 commented 3 years ago

See https://github.com/rust-lang/rust/issues/68923; the function signature there doesn't need GATs. If transient references actually worked, then it could be:

fn test() -> impl for<'a> Generator<&'a mut String>