eLeCtrOssSnake opened this issue 1 year ago.
Performance for this example can be recovered by iterating with for (dataset_packed) |*v, k|, but the copy has to be inserted in exactly the right place:
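    for (dataset_packed) |*v, k| {
        dataset_packed[k].a +%= v.a;
        // Snapshot the element after updating a but before any bool field is
        // overwritten: the assignments to c and d must read the old b and c.
        const v_copy = v.*;
        dataset_packed[k].b = v_copy.c and v_copy.d;
        dataset_packed[k].c = v_copy.b and v_copy.d;
        dataset_packed[k].d = v_copy.b and v_copy.c;
    }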
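If v_copy is moved up one line and used for the entire loop body, performance is still bad. If v.* is not copied at all, the loop does not compute the same result as the original code, because the assignments to c and d would then read the already-overwritten b (and, for d, the already-overwritten c).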
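To make the correctness side of this concrete, here is a minimal sketch of the three variants described above: the copy after the a update, the copy moved up one line, and no copy. It only checks semantics, not performance. Several things are assumptions, not taken from the issue: the Elem layout (a as u32, b/c/d as bool, inferred from +%= and and), the function names, and the input values. Writes go through v rather than dataset_packed[k]; both refer to the same element. The loop uses for (items) |*v| without an index capture, which should compile on both 0.10 and later versions.

```zig
const std = @import("std");

// Assumed element layout (hypothetical): the issue does not show the struct.
// a is an integer because it is updated with +%=; b/c/d are bools because they are combined with `and`.
const Elem = packed struct {
    a: u32,
    b: bool,
    c: bool,
    d: bool,
};

// Copy taken after the `a` update, as in the workaround above.
fn withCopyAfterA(items: []Elem) void {
    for (items) |*v| {
        v.a +%= v.a;
        const v_copy = v.*;
        v.b = v_copy.c and v_copy.d;
        v.c = v_copy.b and v_copy.d;
        v.d = v_copy.b and v_copy.c;
    }
}

// Copy moved up one line and used for the whole loop body.
fn withCopyFirst(items: []Elem) void {
    for (items) |*v| {
        const v_copy = v.*;
        v.a +%= v_copy.a;
        v.b = v_copy.c and v_copy.d;
        v.c = v_copy.b and v_copy.d;
        v.d = v_copy.b and v_copy.c;
    }
}

// No copy: later lines read fields that were already overwritten in this iteration.
fn withoutCopy(items: []Elem) void {
    for (items) |*v| {
        v.a +%= v.a;
        v.b = v.c and v.d;
        v.c = v.b and v.d; // reads the new b
        v.d = v.b and v.c; // reads the new b and the new c
    }
}

pub fn main() void {
    const init = Elem{ .a = 1, .b = true, .c = false, .d = true };
    var after_a = [_]Elem{init};
    var first = [_]Elem{init};
    var none = [_]Elem{init};
    withCopyAfterA(&after_a);
    withCopyFirst(&first);
    withoutCopy(&none);
    std.debug.print("copy after a: b={} c={} d={}\n", .{ after_a[0].b, after_a[0].c, after_a[0].d });
    std.debug.print("copy first:   b={} c={} d={}\n", .{ first[0].b, first[0].c, first[0].d });
    std.debug.print("no copy:      b={} c={} d={}\n", .{ none[0].b, none[0].c, none[0].d });
}
```

With b = true, c = false, d = true as input, both copy placements should leave c set to true, while the no-copy version leaves c false, because its c assignment reads the b that was just overwritten with false. The two copy placements agree on the result; per the comment above, they differ only in the code stage2 generates, which this sketch does not measure.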
Zig Version
0.10.0-dev.4560+828735ac0
Steps to Reproduce and Observed Behavior
The benchmark at https://godbolt.org/z/6ccjvdK6e shows a performance regression from stage1 to stage2. Remove the -fstage1 compiler argument to see the stage2 results. The disassembly shows that stage2 vectorizes struct field access worse than stage1, especially for the packed struct.
Expected Behavior
No performance regression.