ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

stage2 performance regression regarding struct and packed struct vectorization #13373

Open eLeCtrOssSnake opened 1 year ago

eLeCtrOssSnake commented 1 year ago

Zig Version

0.10.0-dev.4560+828735ac0

Steps to Reproduce and Observed Behavior

Benchmark: https://godbolt.org/z/6ccjvdK6e. It shows a performance degradation from stage1 to stage2. Remove the -fstage1 compile argument to see the stage2 results. The disassembly shows worse vectorization of struct access on stage2, especially for the packed struct.

Expected Behavior

No performance regression.

topolarity commented 1 year ago

Performance for this example can be recovered using for(dataset_packed) |*v, k|, but you have to be careful to insert a copy in exactly the right place:

for(dataset_packed) |*v, k| {
    dataset_packed[k].a +%= v.a;
    const v_copy = v.*;
    dataset_packed[k].b = v_copy.c and v_copy.d;
    dataset_packed[k].c = v_copy.b and v_copy.d;
    dataset_packed[k].d = v_copy.b and v_copy.c;
}

If v_copy is moved up a line and used for the entire loop body, performance is still bad.

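If v.* is not copied at all, then this does not compute the same result as the original code, because v points at dataset_packed[k] and later reads through v see fields that have already been overwritten.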
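To illustrate why the copy matters for correctness, here is a minimal sketch on a single element. The Item type and both function names are assumptions made up for illustration; the real struct definition is in the godbolt link and may differ. The sketch only demonstrates the aliasing effect and says nothing about the vectorization behaviour.

```zig
const std = @import("std");

// Hypothetical element type; the real definition is in the linked benchmark.
const Item = packed struct { a: u5, b: bool, c: bool, d: bool };

// No copy: every read goes through the pointer, so later reads
// observe fields that earlier statements have already overwritten.
fn updateAliased(item: *Item) void {
    const v = item;
    item.a +%= v.a;
    item.b = v.c and v.d;
    item.c = v.b and v.d; // v.b is already the new value
    item.d = v.b and v.c; // v.b and v.c are both new values
}

// Copy taken after the `a` update, mirroring the loop body above:
// b, c and d are all computed from the same snapshot.
fn updateWithCopy(item: *Item) void {
    const v = item;
    item.a +%= v.a;
    const v_copy = v.*;
    item.b = v_copy.c and v_copy.d;
    item.c = v_copy.b and v_copy.d;
    item.d = v_copy.b and v_copy.c;
}

test "omitting the copy changes the result" {
    var x = Item{ .a = 1, .b = true, .c = false, .d = true };
    var y = x;
    updateAliased(&x);
    updateWithCopy(&y);
    // Without the copy, c is computed from the freshly written b.
    try std.testing.expect(x.c != y.c);
}
```

Running this with zig test should show the two variants disagreeing on field c for this input, which is the divergence described above.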