Open ManDeJan opened 3 years ago
If you decide to work on this, I think you can leverage this repo to measure any performance improvements you may gain: https://github.com/ziglang/gotta-go-fast
Also, if there's one thing I've learned, it's that modern processors are very unpredictable when it comes to performance. Things you think might make something faster may actually make it slower. For example, a simple implementation, even if it does more work, may run faster if it's encoded in fewer instructions due to the icache. One of the slowest things a computer does nowadays is fetch memory from RAM, so any extra instructions and/or data you need to access, especially if they're not within the same cache line, can have drastic effects on performance.
In any case, that's why it's good for us to use benchmarks where we can measure whether something results in a performance improvement. In your case here, the compiler test suite might be a good benchmark for this.
Relevant snippet from @fengb: https://github.com/fengb/fundude/blob/ad1783281625f2b7b05fbced8ca358d79ff51b7e/src/util.zig#L105-L110
I just ran into this and had to call out to __builtin_memcmp to get performance on par with the ported C code (after noticing the std.mem.eql code standing out prominently in the profiler).
In short, swapping out std.StringHashMap for std.HashMap with a context whose eql implementation points to a __builtin_memcmp wrapper increased performance by 20%.
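For anyone wanting to try the same swap, here's a minimal sketch of what such a context could look like. This is my own illustration, not the commenter's actual code: it calls libc's memcmp (assumes linking libc) rather than a __builtin_memcmp wrapper, the names `MemcmpStringContext` and `FastStringHashMap` are hypothetical, and the HashMap/Context API details vary across Zig versions:

```zig
const std = @import("std");

// Assumes libc is linked; equality via memcmp instead of
// std.mem.eql's byte-by-byte loop.
extern fn memcmp(a: [*]const u8, b: [*]const u8, n: usize) c_int;

const MemcmpStringContext = struct {
    pub fn hash(self: @This(), s: []const u8) u64 {
        _ = self;
        return std.hash.Wyhash.hash(0, s);
    }
    pub fn eql(self: @This(), a: []const u8, b: []const u8) bool {
        _ = self;
        if (a.len != b.len) return false;
        if (a.len == 0) return true;
        return memcmp(a.ptr, b.ptr, a.len) == 0;
    }
};

// Hypothetical drop-in replacement for std.StringHashMap(V):
fn FastStringHashMap(comptime V: type) type {
    return std.HashMap(
        []const u8,
        V,
        MemcmpStringContext,
        std.hash_map.default_max_load_percentage,
    );
}
```

The hash function must stay consistent with the stdlib's string hashing only if you mix contexts; as a self-contained map, any good hash (Wyhash here) works.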
FWIW, there's a memcmp.zig (https://github.com/ziglang/zig/blob/master/lib/compiler_rt/memcmp.zig) that seems to be completely unreferenced in the entire codebase (a dead file), and it looks like it could be used for a @memcmp builtin.
I discovered this while benchmarking some parser code I was working on. As the title states, the current implementation of std.mem.eql is very naive: https://github.com/ziglang/zig/blob/bf67a3fdc9efea7126058b21e687c24868bc1268/lib/std/mem.zig#L485-L493
It iterates over slices byte by byte, which causes the compiler to generate a bunch of compare and jump instructions: https://godbolt.org/z/Mobabsq8a
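For readers who don't want to follow the link, the stdlib loop is roughly this shape (a paraphrase, not the exact source at that commit): one compare and one conditional branch per element, which is what shows up in the godbolt output.

```zig
// Paraphrase of the current std.mem.eql: element-by-element
// comparison, one branch per iteration.
fn eqlNaive(comptime T: type, a: []const T, b: []const T) bool {
    if (a.len != b.len) return false;
    for (a, 0..) |item, i| {
        if (b[i] != item) return false;
    }
    return true;
}
```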
One of the most frequent use cases (95%+ in the standard library) for std.mem.eql is comparing a string with a compile-time-known string. Seeing as comparing memory is so common, especially in high-performance code like parsers (and a bunch of other standard library functions), I think it is worth having a less naive, more optimized implementation of std.mem.eql.
I identified a few points which could improve the implementation. One is the if (a.ptr == b.ptr) return true; line (I think this is a premature optimization): it causes an extra comparison and branch in all compilations of the eql function, but I can't think of any scenario where this would actually be a major time save.
I was working on an implementation that did some of the above but haven't been able to get it done yet. There are so many cases to account for, and some of the optimizations you can only make if you know something about the length of the comparison at compile time. I think having a dedicated @memcmp builtin akin to @memcpy and @memset might be a solution, though I'm not sure how much work adding that would be :-)