ldc-developers / ldc

The LLVM-based D Compiler.
http://wiki.dlang.org/LDC

Windows fibers slower than DMD #3463

Open SuperDIMMaX opened 4 years ago

SuperDIMMaX commented 4 years ago

Version: ldc2-1.22.0-beta2-windows-x64. OS: Windows 7. This is strange, but LDC in release mode with full optimization is 2-2.5 times slower than DMD in debug mode. This needs confirmation. Steps to reproduce:

import core.thread;

struct Vec2{
    int x;
    int y;
}

static Vec2[] testArray;

void fiberGen(){
    int x,y;
    while(true){
        testArray ~= Vec2(x++,y++);
        Fiber.yield();
    }
}

int main(string[] args)
{
    auto testFiber = new Fiber(&fiberGen);
    for (int i = 0; i< 1_000_000; i++)
    {
        testFiber.call();
    }
    return 0;
}

UPD: I rewrote it to use a Generator with foreach(key; gen.take(1_000_000)) and testArray ~= key; ... and got the same results :(

P.S.: Is there any way to make this run at native speed?
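
For reference, the Generator variant mentioned in the update might look roughly like the sketch below. Only the foreach line and the append come from the comment above; the helper names produce and collect are hypothetical, and the original rewrite was not posted, so this is an assumption about its shape rather than the actual code.

```d
import std.concurrency : Generator, yield;
import std.range : take;
import std.stdio : writeln;

struct Vec2 { int x; int y; }

// Hypothetical producer: yields an endless sequence of Vec2 values.
void produce()
{
    int x, y;
    while (true)
        yield(Vec2(x++, y++));
}

// Hypothetical helper: pulls n values out of the generator and appends them.
Vec2[] collect(size_t n)
{
    Vec2[] testArray;
    auto gen = new Generator!Vec2(&produce);
    foreach (key; gen.take(n))
        testArray ~= key;
    return testArray;
}

void main()
{
    writeln(collect(1_000_000).length);
}
```

Generator is itself built on Fiber, so the same context-switch cost per element would be expected here.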

dnadlinger commented 4 years ago

Looking at the assembly output might be illuminating. Also, a time profile should quickly reveal what is going on here, or at least where to look.
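
As a coarse first step before a full profile, the array append and the fiber switch could be timed separately. The following is a minimal sketch, assuming Phobos' StopWatch is precise enough for this purpose; the function names timeAppendOnly, timeSwitchOnly and yieldForever are hypothetical and not part of the original report.

```d
import core.thread : Fiber;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

struct Vec2 { int x; int y; }

// Appends only, no fibers: measures the cost of ~= on its own.
Duration timeAppendOnly(size_t n)
{
    Vec2[] arr;
    int x, y;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. n)
        arr ~= Vec2(x++, y++);
    return sw.peek;
}

// Fiber body that does nothing but hand control back to the caller.
void yieldForever()
{
    while (true)
        Fiber.yield();
}

// Context switches only, no appends: measures call()/yield() round trips.
Duration timeSwitchOnly(size_t n)
{
    auto fiber = new Fiber(&yieldForever);
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. n)
        fiber.call();
    return sw.peek;
}

void main()
{
    enum n = 1_000_000;
    writefln("append only: %s ms", timeAppendOnly(n).total!"msecs");
    writefln("switch only: %s ms", timeSwitchOnly(n).total!"msecs");
}
```

Building this with both compilers on the same machine would show which of the two parts accounts for the gap.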

kinke commented 4 years ago

I can confirm LDC being almost 3x slower. -O shouldn't have any effect for this trivial code as there isn't anything really to optimize. A quick look at druntime's core.thread package didn't reveal big diffs for LDC, except for LLVM-style asm vs. DMD asm... And _d_arrayappendcTX is identical to upstream, except for being @weak, but that shouldn't matter either.

kinke commented 4 years ago

In my Linux VM (on a Win64 host), the DMD (2.091.1) and LDC (v1.21.0) timings are pretty much identical (~0.33s) - and that's pretty much exactly the time LDC requires on Win64 too, whereas DMD finishes in ~0.11 secs on Windows (both -m64 and -m32).

kinke commented 4 years ago

I've just noticed that version (LDC_Windows) isn't defined anymore in core.thread.fiber after the core.thread-split into a package, so we've largely been using DMD's asm. Using the LLVM asm makes no difference.

dnadlinger commented 4 years ago

Timings seem to be roughly the same on macOS as well, although I didn't test matching DMD/LDC versions.

kinke commented 4 years ago

The timings with -m32 are even slightly worse (~350 vs. 325 msecs). I'm not really worried, as that 3x speed-up for DMD on Windows vs. Linux feels suspicious to me, while LDC shows no significant changes across OS. Note that the Win64 asm is more involved than all others, as it also saves and restores the XMM registers when switching fibers, contrary to Posix, so a 3x speedup seems hardly justifiable.