Open SuperDIMMaX opened 4 years ago
Looking at the assembly output might be illuminating. Also, a time profile should quickly reveal what is going on here, or at least where to look.
I can confirm LDC being almost 3x slower. -O shouldn't have any effect for this trivial code, as there isn't really anything to optimize. A quick look at druntime's core.thread package didn't reveal big diffs for LDC, except for LLVM-style asm vs. DMD asm... And _d_arrayappendcTX is identical to upstream, except for being @weak, but that shouldn't matter either.
In my Linux VM (on a Win64 host), the DMD (2.091.1) and LDC (v1.21.0) timings are pretty much identical (~0.33s) - and that's pretty much exactly the time LDC requires on Win64 too, whereas DMD finishes in ~0.11 secs on Windows (both -m64 and -m32).
I've just noticed that version (LDC_Windows) isn't defined anymore in core.thread.fiber after core.thread was split into a package, so we've largely been using DMD's asm. Using the LLVM asm makes no difference.
Timings seem to be roughly the same on macOS as well, although I didn't test matching DMD/LDC versions.
The timings with -m32 are even slightly worse (~350 vs. 325 msecs). I'm not really worried, as that 3x speed-up for DMD on Windows vs. Linux feels suspicious to me, while LDC shows no significant changes across OSes. Note that the Win64 asm is more involved than all the others, as it also saves and restores the XMM registers when switching fibers, contrary to Posix, so a 3x speed-up seems hardly justifiable.
Version: ldc2-1.22.0-beta2-windows-x64
OS: Windows 7

This is strange, but LDC in release mode with full optimization is 2-2.5 times slower than DMD in debug mode. This needs confirmation.

Steps to reproduce:
UPD: I rewrote the test to use Generator, with foreach (key; gen.take(1_000_000)) and testArray ~= key; ... and some results :(

P.S.: Are there any solutions to make this run at native speed?
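
For anyone who wants to time this themselves, the Generator variant from the update might look roughly like the sketch below. This is a guess at the shape of the test, not the original reproduction code: only gen, testArray, the foreach over gen.take(1_000_000) and the append come from the post. The generator body, the int element type and the StopWatch timing are assumptions added for illustration.

```d
import std.concurrency : Generator, yield;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : take;
import std.stdio : writeln;

void main()
{
    // Assumed generator body: yields 0, 1, 2, ... forever.
    // Every yield is a fiber context switch, which is what is being measured.
    auto gen = new Generator!int({
        int i = 0;
        while (true)
            yield(i++);
    });

    int[] testArray;

    // Timing code is an assumption; the original post does not show how it measured.
    auto sw = StopWatch(AutoStart.yes);
    foreach (key; gen.take(1_000_000))
        testArray ~= key;
    sw.stop();

    writeln(testArray.length, " elements in ", sw.peek.total!"msecs", " ms");
}
```

Building the same file with both compilers (for example with ldc2 -O and with plain dmd) and comparing the printed times should show whether the gap reproduces on a given machine. The absolute numbers will differ from the ones quoted in this thread.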