ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
35.25k stars 2.57k forks source link

Suboptimal Shifting Codegen #12285

Open presentfactory opened 2 years ago

presentfactory commented 2 years ago

Zig Version: 0.10.0-dev.3313+cff5d9c80

I was experimenting with how Zig generates code with oddly sized integers with regards to shifting instructions and found a case where I think it is generating poor quality code (unless I am missing some key factor here). For instance this code:

export fn foo(num: u32) u32 {
  return @shlExact(@as(u15, 1), @intCast(u4, num));
}

As far as I know @shlExact makes it UB to shift 1 bits out of the range (a u15 in this case), and @intCast makes it UB to cast from a value outside that range as well. Knowing this I'd expect it simply to generate a mov for the constant 1 and a shift, but instead I get this:

foo:
        and     dil, 15
        mov     eax, 1
        shlx    eax, eax, edi
        and     eax, 32767
        ret

To me these and operations are pointless due to the previously mentioned constraints and I am not sure why they are appearing here. Interestingly as well, when I change the u15 to a u16 I get a different result:

foo:
        and     dil, 15
        mov     eax, 1
        shlx    eax, eax, edi
        ret

That looks more like what I'd expect, other than the strange and on the shift amount still. Perhaps this difference is due to there being one technically invalid shift value for the u15 which can fit in a u4 (the value 15) which it feels the need to mask against, but again that doesn't seem like it'd matter since it's supposed to be undefined behavior anyways.

Also perhaps unsurprisingly when working with more typical integers things work fine:

export fn foo(num: u32) u32 {
  return @shlExact(@as(u32, 1), @intCast(u5, num));
}

Resulting in:

foo:
        mov     eax, 1
        shlx    eax, eax, edi
        ret

Edit: I did find that if you add an assert in to try and inform the compiler of the value limitations this fixes the and-ing of the input at least, so maybe the solution can be implemented in a similar way:

export fn foo(num: u32) u32 {
  std.debug.assert(num < 16);
  return @shlExact(@as(u15, 1), @intCast(u4, num));
}
topolarity commented 2 years ago

Looks like a missed LLVM optimization:

; Function Attrs: nobuiltin nounwind
define i32 @foo(i32 %0) #1 {
Entry:
  %1 = trunc i32 %0 to i4
  %2 = zext i4 %1 to i15
  %3 = shl nuw i15 1, %2
  %4 = zext i15 %3 to i32
  ret i32 %4
}

Unfortunately, llc doesn't take advantage of the nuw even at -O3. That's assuming that this optimization would be legal of course, but I agree that it probably would be.