ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
32.13k stars 2.35k forks source link

Inefficient code generation of `@mod` and `@divFloor` #19935

Open ListeriaM opened 1 month ago

ListeriaM commented 1 month ago

Zig Version

0.12.0

Steps to Reproduce and Observed Behavior

When using signed integers and power of 2 denominators, the generated assembly contains a bunch of unnecessary instructions (Compiler Explorer)

Expected Behavior

A simple and or sar (for @mod or @divFloor respectively) instruction would be enough in this case (Compiler Explorer). Note that the example only works for comptime_int, but I'd expect it to work for any comptime known value regardless of the type. Ideally such a workaround wouldn't be necessary and I'd use @mod and @divFloor directly

xdBronch commented 1 month ago

i dont think these transformations are valid when the number is negative. depending on what youre doing and what you can guarantee, using @divExact or adding if (num < 0) unreachable; will give the codegen youre asking for. in any case unless you see an inefficiency in the LLVM IR zig emits, optimizations like this are generally on LLVM not us

for comparison heres the equivalent in C https://godbolt.org/z/EnP6MEaP1 in all cases its identical when you use the builtins that map to what C does

ListeriaM commented 1 month ago

that's not quite the same, these transformations work for negative numerators as well (note that it's using sar instead of shr), I can't speak for the LLVM IR as I'm not as well versed in it. In any case, I'm not sure we should expect LLVM to be able to find a transformation like that.

xdBronch commented 1 month ago

hm yeah my bad that looks right. i dont know where the logic in zig is for this stuff or if it generally attempts any kind of optimization on its own.

ListeriaM commented 1 month ago

In case it helps, I uploaded the code I was writing when I found this to a repo, which has 2 branches, one uses the builtins and the other uses the alternatives mentioned here.