hnes / libaco

A blazing fast and lightweight C asymmetric coroutine library 💎 ⛅🚀⛅🌞
https://libaco.org
Apache License 2.0

improvements to documentation #25

Open ioquatix opened 5 years ago

ioquatix commented 5 years ago

Hello, I'm sorry, I'm really interested in this library, but I don't quite understand from the diagrams what is going on.

Do you think we can work together to improve the documentation? My naivety might help to improve the documentation as I can point out what I don't understand and perhaps make a PR with improved documentation.

In the first instance, I don't understand how this differs from normal stackful coroutines (fibres). To me, it looks like you are storing the registers in a private area, but then sharing a single stack between multiple coroutines. Does this mean the stack is trashed when switching between coroutines, i.e. local variables can't be used? Is this an implementation of stackless coroutines?

hnes commented 5 years ago

Hi ioquatix, sorry for the late reply. Over the past few weeks I have been focused on developing the next release of libaco, i.e. issue #22. I hope this reply is still helpful to you.

> Hello, I'm sorry, I'm really interested in this library, but I don't quite understand from the diagrams what is going on.

I'm very glad that libaco is useful to you, cheers!

> Do you think we can work together to improve the documentation? My naivety might help to improve the documentation as I can point out what I don't understand and perhaps make a PR with improved documentation.

Yes, of course I'm willing to do so. Feel free to ask any question you like. Any issue or PR that aims to make libaco better is welcome and highly appreciated. libaco is an open source project and it belongs to all of us.

> In the first instance, I don't understand how this differs from normal stackful coroutines (fibres). To me, it looks like you are storing the registers in a private area, but then sharing a single stack between multiple coroutines. Does this mean the stack is trashed when switching between coroutines, i.e. local variables can't be used? Is this an implementation of stackless coroutines?

First of all, libaco is a stackful coroutine implementation. Keeping each coroutine's stack intact during its whole lifecycle is therefore essential to a correct implementation.

So, when a coroutine shares an execution stack with other coroutines in libaco, we have to ensure that it feels like it is "monopolizing" the stack, even though the stack is in fact being shared by many coroutines.

So we have the solution below (it is just a piece of very rough pseudocode for aco_resume; the actual implementation can be found in the libaco source):

void aco_resume(co){
    main_co = co->main_co
    // resume `co`'s execution stack:
    // if the stack's owner is `co`
    //    done
    // else
    //    if the stack's owner is not null
    //        save the owner's stack to its private save stack
    //          and release the stack
    //    end
    //    let `co` be the owner of the stack and restore its stack
    // end
    if(co->share_stack->owner != co){
        if(co->share_stack->owner != NULL){
            owner_co = co->share_stack->owner
            save owner_co's stack to owner_co->save_stack
            co->share_stack->owner = NULL
        }
        restore co's save stack to share_stack
        co->share_stack->owner = co
    }
    // do the context switch
    acosw(main_co, co)
}
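
The ownership and copy logic in the pseudocode above can be tried out without any real context switching. What follows is only a minimal sketch in plain C, under a simplifying assumption: the shared stack is modelled as a byte buffer, and each coroutine records how many bytes of it were in use when it last yielded. All names here (`sim_share_stack`, `sim_co`, `sim_resume`, `sim_demo`) are hypothetical and are not libaco's types or API; the real `aco_resume` works on the actual machine stack and finishes with `acosw`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model types for illustration; NOT libaco's real structs. */
#define SIM_STACK_SIZE 4096

struct sim_co;

typedef struct {
    unsigned char  buf[SIM_STACK_SIZE]; /* the shared execution stack      */
    struct sim_co *owner;               /* co whose data is in buf now     */
} sim_share_stack;

typedef struct sim_co {
    sim_share_stack *share_stack; /* the one shared stack this co is linked to */
    unsigned char   *save_buf;    /* private save stack, grown on demand       */
    size_t           save_cap;    /* capacity of save_buf in bytes             */
    size_t           live_len;    /* stack bytes in use at the last yield      */
} sim_co;

/* Make `co` the owner of its shared stack. Returns the number of bytes
 * copied, or (size_t)-1 if growing a save stack failed. */
size_t sim_resume(sim_co *co)
{
    sim_share_stack *ss = co->share_stack;
    size_t copied = 0;

    if (ss->owner == co)
        return 0; /* fast path: the stack is already ours, no memcpy */

    /* A non-owner with live data must have been saved earlier. */
    assert(co->live_len == 0 || co->save_buf != NULL);

    if (ss->owner != NULL) {
        sim_co *old = ss->owner;
        if (old->live_len > old->save_cap) {
            unsigned char *p = realloc(old->save_buf, old->live_len);
            if (p == NULL)
                return (size_t)-1;
            old->save_buf = p;
            old->save_cap = old->live_len;
        }
        if (old->live_len > 0)
            memcpy(old->save_buf, ss->buf, old->live_len); /* save owner */
        copied += old->live_len;
        ss->owner = NULL;                                  /* release    */
    }
    if (co->live_len > 0)
        memcpy(ss->buf, co->save_buf, co->live_len);       /* restore co */
    copied += co->live_len;
    ss->owner = co;
    return copied;
}

/* Two coroutines take turns on one shared stack. Returns 0 if every
 * step behaved as described, 1 otherwise. */
int sim_demo(void)
{
    sim_share_stack ss = { {0}, NULL };
    sim_co a = { &ss, NULL, 0, 0 };
    sim_co b = { &ss, NULL, 0, 0 };
    int ok = 1;

    ok &= (sim_resume(&a) == 0);   /* stack was free: nothing to copy  */
    memcpy(ss.buf, "AAAA", 4);     /* a "runs" and uses 4 bytes        */
    a.live_len = 4;                /* a yields                         */

    ok &= (sim_resume(&b) == 4);   /* a's 4 bytes are saved            */
    memcpy(ss.buf, "BB", 2);       /* b overwrites the shared stack    */
    b.live_len = 2;                /* b yields                         */

    ok &= (sim_resume(&a) == 6);   /* save b (2) + restore a (4)       */
    ok &= (memcmp(ss.buf, "AAAA", 4) == 0); /* a sees its own data     */
    ok &= (sim_resume(&a) == 0);   /* a already owns the stack         */

    free(a.save_buf);
    free(b.save_buf);
    return ok ? 0 : 1;
}
```

Calling `sim_demo()` from a `main` should return 0: coroutine `a` finds its four bytes intact even though `b` overwrote the shared buffer in between, which is the "monopolizing" illusion described above.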

If you want to go further, you may also read the "Best Practice" part.

ioquatix commented 5 years ago

Thanks for this. Sorry for the late reply.

So for my understanding:

A. Each coroutine has its own private stack area.
B. There is one shared stack (or more?).

When switching coroutines, if the stack was being used by someone else, you memcpy the data from the shared stack to their private area, then copy your own private stack to the shared stack, and resume operation?

hnes commented 5 years ago

> Thanks for this. Sorry for the late reply.

Never mind :-)

> So for my understanding: A. Each coroutine has its own private stack area. B. There is one shared stack (or more?).

A is right.

There are two constraints when you are using libaco:

  1. Each coroutine must have exactly one shared stack (here the phrase "be linked to" would be more precise than the word "have").

  2. One shared stack can be shared among many coroutines.

You can create as many shared stacks as you want, as long as these two constraints are satisfied.

> When switching coroutines, if the stack was being used by someone else, you memcpy the data from the shared stack to their private area, then copy your own private stack to the shared stack, and resume operation?

Yes, and there is also an optimization: if the shared stack is already occupied by the co you want to switch to, no memcpy is performed at all.

ioquatix commented 5 years ago

Ah, I see, so you can have M-N mapping of coroutines to stacks.

That means you could have 10 coroutines sharing one stack area (but 10x private stack areas). Or you could have 10 coroutines sharing two stack areas.

What is the cost of shared stack vs private stack:

hnes commented 5 years ago

> Ah, I see, so you can have M-N mapping of coroutines to stacks.
>
> That means you could have 10 coroutines sharing one stack area (but 10x private stack areas). Or you could have 10 coroutines sharing two stack areas.

Sorry, I don't seem to have made myself clear. Yes, you could have 10 coroutines sharing one stack area. But if you want to use two shared stacks, say A and B, then you can let 3 coroutines share stack A and the other 7 coroutines share stack B. Each coroutine is still linked to exactly one shared stack (N:1), but you can have as many such "N:1" groups as you like.
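
To make the "many N:1 groups" idea concrete, here is a small model in plain C. It is only a sketch under the assumption that ownership is all that matters: every coroutine holds exactly one pointer to a shared stack, and a resume counts as an eviction (a save and restore in libaco terms) only when a different coroutine linked to the same stack currently owns it. The names `map_share_stack`, `map_co`, `map_link`, `map_resume`, and `map_demo` are hypothetical, not libaco's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model for illustration; NOT libaco's real structs or API. */
struct map_co;

typedef struct {
    struct map_co *owner;  /* which linked co currently occupies the stack */
    int            linked; /* how many coroutines are linked to this stack */
} map_share_stack;

typedef struct map_co {
    map_share_stack *share_stack; /* exactly one per coroutine (constraint 1) */
} map_co;

/* Link `co` to `ss`; many coroutines may link to one stack (constraint 2). */
void map_link(map_co *co, map_share_stack *ss)
{
    co->share_stack = ss;
    ss->linked++;
}

/* Resume `co`. Returns 1 if another coroutine had to be evicted from the
 * shared stack first, 0 if the stack was free or already owned by `co`. */
int map_resume(map_co *co)
{
    map_share_stack *ss = co->share_stack;
    int evicted = (ss->owner != NULL && ss->owner != co);
    ss->owner = co;
    return evicted;
}

/* 10 coroutines: 3 linked to stack A, 7 linked to stack B.
 * Returns the number of evictions over a fixed resume sequence. */
int map_demo(void)
{
    map_share_stack a = { NULL, 0 };
    map_share_stack b = { NULL, 0 };
    map_co co[10];
    int i, evictions = 0;

    for (i = 0; i < 10; i++)
        map_link(&co[i], i < 3 ? &a : &b);
    assert(a.linked == 3);
    assert(b.linked == 7);

    evictions += map_resume(&co[0]); /* A was free               */
    evictions += map_resume(&co[3]); /* B was free               */
    evictions += map_resume(&co[0]); /* co[0] still owns A       */
    evictions += map_resume(&co[1]); /* co[1] evicts co[0] on A  */
    evictions += map_resume(&co[3]); /* co[3] still owns B       */
    return evictions;
}
```

In this sequence only the fourth resume evicts anyone, because `co[0]` and `co[1]` are linked to the same stack. Coroutines linked to different shared stacks never displace each other, which is one reason to create more than one shared stack.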

ioquatix commented 5 years ago

Do you mind explaining the benefits of this design?

i.e. What is the cost of shared stack vs private stack:

ioquatix commented 5 years ago

Is it just a way to have stacks smaller than one page of memory?

hnes commented 5 years ago

> Memcpy can be expensive if stack is big, but cheap if stack is small.

Yes, as described in the "Best Practice" part of the README.md:

> if you want to gain the ultra performance of libaco, just keep the stack usage of the non-standalone non-main co at the point of calling aco_yield as small as possible.

> private per-coroutine stack can be smaller than one memory page (4kb) so you can have smaller per-coroutine stack.

Yes. It is very easy to keep the size of the private stack below 1KB; it is usually about 100 to 500 bytes.
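
The trade-off discussed here is simple arithmetic, and a few helper functions make it easy to plug in your own numbers. This is only an illustrative sketch: the function names are hypothetical, and the figures in the note below (10,000 coroutines, a 128 KiB stack) are assumptions chosen for the example. Only the 500-byte private save stack comes from the range mentioned above.

```c
#include <stddef.h>

/* Memory reserved if every coroutine owns a full-size stack of its own. */
size_t mem_private_stacks(size_t n_co, size_t stack_sz)
{
    return n_co * stack_sz;
}

/* Memory reserved if all coroutines share one stack and each keeps only
 * a small private save stack of `save_sz` bytes. */
size_t mem_shared_stack(size_t n_co, size_t stack_sz, size_t save_sz)
{
    return stack_sz + n_co * save_sz;
}

/* Upper bound on bytes memcpy'd by one switch between two coroutines on
 * the same shared stack: save the old owner, then restore the new one. */
size_t copy_per_switch(size_t old_live, size_t new_live)
{
    return old_live + new_live;
}
```

With those assumed numbers, `mem_private_stacks(10000, 131072)` is 1,310,720,000 bytes (about 1.3 GB), while `mem_shared_stack(10000, 131072, 500)` is 5,131,072 bytes (about 5 MB). The price is `copy_per_switch(500, 500)`, i.e. up to 1000 bytes of memcpy per switch, and nothing at all when the target coroutine already owns the stack.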

> I assume shared stack must be at least 4kb or 8kb with guard page?

If the shared stack you want to create has the guard page enabled, then the answer is:

Yes, but only when the OS page size is 4KB (as it is on most platforms). We have to use mmap and mprotect to create the guard page, so the minimum size depends on the page size, and the page size is OS dependent.
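
For readers who have not set up a guard page before, the following is a rough sketch of the general technique with POSIX `mmap` and `mprotect`. It is not libaco's code: the names `guarded_stack`, `guarded_stack_new`, `guarded_stack_free`, and `guarded_stack_min_total` are hypothetical, and two choices are assumptions of this example, namely placing the guard page at the low end of the mapping (stacks grow downward on common platforms) and protecting it with `PROT_NONE`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type, NOT libaco's API. */
typedef struct {
    void  *base;      /* start of the mapping, i.e. the guard page      */
    size_t total_sz;  /* guard page + usable area, in bytes             */
    void  *usable;    /* first usable byte, just above the guard page   */
    size_t usable_sz; /* usable bytes, a whole number of pages          */
} guarded_stack;

/* Map at least `want` usable bytes plus one inaccessible guard page at
 * the low end. Returns 0 on success, -1 on failure. */
int guarded_stack_new(guarded_stack *gs, size_t want)
{
    long pg = sysconf(_SC_PAGESIZE);
    size_t pgsz, pages;
    void *p;

    if (pg <= 0 || want == 0)
        return -1;
    pgsz  = (size_t)pg;
    pages = (want + pgsz - 1) / pgsz;   /* round up to whole pages   */
    gs->total_sz = (pages + 1) * pgsz;  /* one extra page for guard  */

    p = mmap(NULL, gs->total_sz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect(p, pgsz, PROT_NONE) != 0) { /* any access now faults */
        munmap(p, gs->total_sz);
        return -1;
    }
    gs->base      = p;
    gs->usable    = (char *)p + pgsz;
    gs->usable_sz = pages * pgsz;
    return 0;
}

void guarded_stack_free(guarded_stack *gs)
{
    munmap(gs->base, gs->total_sz);
}

/* Total size of the smallest guarded stack: one usable page plus one
 * guard page. Returns 0 if the mapping could not be created. */
size_t guarded_stack_min_total(void)
{
    guarded_stack gs;
    size_t total;

    if (guarded_stack_new(&gs, 1) != 0)
        return 0;
    ((char *)gs.usable)[0] = 1; /* the usable area is writable */
    total = gs.total_sz;
    guarded_stack_free(&gs);
    return total;
}
```

Because protection is applied per page, the smallest guarded stack in this sketch is two pages, which is 8KB when the page size is 4KB and larger on systems with bigger pages.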

hnes commented 5 years ago

> Is it just a way to have stacks smaller than one page of memory?

If the word "stack" means the shared stack rather than the private stack, then you could do it by creating a shared stack without the guard page. But such usage is very dangerous (a stack overflow would no longer be caught by a guard page), so it is not recommended.