treeform / greenlet

Greenlet - Coroutines library for nim similar to python's greenlet.

Benching against Vyukov coroutines #1

Open mratsim opened 4 years ago

mratsim commented 4 years ago

The library looks very interesting, especially given that fast resumable fibers/continuations could serve to unify IO and CPU tasks.

One thing I never got to measure is the improvement that Vyukov coroutines bring over plain ucontext or plain setjmp.

The approach basically combines setjmp and ucontext to get platform-independent coroutines and avoid expensive syscalls (AFAIK cgreenlet doesn't seem to support ARM). From http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers:

struct fiber_t
{
    ucontext_t  fib;
    jmp_buf     jmp;
};

struct fiber_ctx_t
{
    void(*      fnc)(void*);
    void*       ctx;
    jmp_buf*    cur;
    ucontext_t* prv;
};

static void fiber_start_fnc(void* p)
{
    fiber_ctx_t* ctx = (fiber_ctx_t*)p;
    void (*ufnc)(void*) = ctx->fnc;
    void* uctx = ctx->ctx;
    if (_setjmp(*ctx->cur) == 0)
    {
        ucontext_t tmp;
        swapcontext(&tmp, ctx->prv);
    }
    ufnc(uctx);
}

inline void create_fiber(fiber_t& fib, void(*ufnc)(void*), void* uctx)
{
    getcontext(&fib.fib);
    size_t const stack_size = 64*1024;
    fib.fib.uc_stack.ss_sp = (::malloc)(stack_size);
    fib.fib.uc_stack.ss_size = stack_size;
    fib.fib.uc_link = 0;
    ucontext_t tmp;
    fiber_ctx_t ctx = {ufnc, uctx, &fib.jmp, &tmp};
    makecontext(&fib.fib, (void(*)())fiber_start_fnc, 1, &ctx);
    swapcontext(&tmp, &fib.fib);
}

inline void switch_to_fiber(fiber_t& fib, fiber_t& prv)
{
    if (_setjmp(prv.jmp) == 0)
        _longjmp(fib.jmp, 1);
}

Actual code in a library: https://github.com/dvyukov/relacy/blob/dc6be4854d82491483b1781254753706df68a8b3/relacy/platform.hpp

Dmitry Vyukov is behind a lot of projects related to optimizing concurrency and multithreading at Google, including the Go scheduler and goroutines, the TensorFlow and Eigen scheduler for neural networks, and LLVM ThreadSanitizer, so I expect this to be fast as well.

treeform commented 4 years ago

That lib uses the setcontext family of functions, which POSIX deprecated in 2004, 16 years ago: https://en.wikipedia.org/wiki/Setcontext

setcontext is already part of the benchmark; all that lib does is wrap it a little bit.

The cgevent switcher appears to be 220 times faster than setcontext.

mratsim commented 4 years ago

From IRC

As Vyukov analyzes at http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers, setcontext requires 2 syscalls (about 150 cycles each). AFAIK his code combines setjmp with setcontext to avoid those costs, so pure setcontext or pure setjmp benchmarks are probably not applicable.
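To make the comparison concrete, here is a minimal sketch of how the two pure baselines could be timed separately. The function names (`swapcontext_round_trips`, `setjmp_round_trips`, `ns_per_round_trip`) are hypothetical and come from neither greenlet nor relacy. The sketch does not implement the combined trick; the `_setjmp`/`_longjmp` loop stays on a single stack, so it only shows the cost of saving and restoring registers, not a real fiber switch.

```cpp
#include <chrono>
#include <setjmp.h>
#include <ucontext.h>

namespace {

ucontext_t g_main_ctx;           // context of the benchmark driver
ucontext_t g_fiber_ctx;          // context of the fiber
volatile long g_fiber_runs = 0;  // how many times the fiber was resumed

// Fiber body: count one resume, then hand control back, forever.
void fiber_body() {
    for (;;) {
        g_fiber_runs = g_fiber_runs + 1;
        swapcontext(&g_fiber_ctx, &g_main_ctx);
    }
}

}  // namespace

// Switches into a ucontext fiber and back `iters` times.
// Returns how many times the fiber actually ran.
long swapcontext_round_trips(long iters) {
    static char stack[64 * 1024];  // same stack size as the quoted snippet
    g_fiber_runs = 0;
    getcontext(&g_fiber_ctx);
    g_fiber_ctx.uc_stack.ss_sp = stack;
    g_fiber_ctx.uc_stack.ss_size = sizeof stack;
    g_fiber_ctx.uc_link = nullptr;  // never used: fiber_body does not return
    makecontext(&g_fiber_ctx, fiber_body, 0);
    for (long i = 0; i < iters; ++i)
        swapcontext(&g_main_ctx, &g_fiber_ctx);
    return g_fiber_runs;
}

// Saves and restores the register state `iters` times on one stack.
// Returns how many times execution resumed after the jump.
long setjmp_round_trips(long iters) {
    jmp_buf buf;
    volatile long jumps = 0;  // volatile: must survive the longjmp
    for (volatile long i = 0; i < iters; i = i + 1) {
        if (_setjmp(buf) == 0)
            _longjmp(buf, 1);
        jumps = jumps + 1;  // reached once per iteration, after the jump
    }
    return jumps;
}

// Average wall-clock nanoseconds per round trip for one of the loops above.
double ns_per_round_trip(long (*bench)(long), long iters) {
    auto t0 = std::chrono::steady_clock::now();
    bench(iters);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
}
```

Calling `ns_per_round_trip(swapcontext_round_trips, 1000000)` and `ns_per_round_trip(setjmp_round_trips, 1000000)` from a `main` gives two numbers to set against a combined implementation. The absolute values depend on the platform and libc, so none are quoted here.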

sinkingsugar commented 4 years ago

FYI, don't use ucontext_t; see https://www.boost.org/doc/libs/1_72_0/libs/context/doc/html/context/performance.html

Boost.Context's fcontext_t is basically very similar (check the asm) and battle-tested.

Also FYI, Nim's default coroutines on x86 are implemented in asm this way too; migrating to the full Boost codebase would be the ideal thing to do.

Finally, this package doesn't consider the GC at all. I again suggest checking out the default Nim coros. Of course, with --gc:arc there is no need for stack bookkeeping, though.