Validark opened 1 month ago
This code (Godbolt link):
```zig
export fn clz(x: @Vector(2, u64)) @Vector(2, u64) {
    return @clz(x);
}
```
Gives me this assembly for the Apple M3:
```asm
clz:
        ushr    v1.2d, v0.2d, #1
        orr     v0.16b, v0.16b, v1.16b
        ushr    v1.2d, v0.2d, #2
        orr     v0.16b, v0.16b, v1.16b
        ushr    v1.2d, v0.2d, #4
        orr     v0.16b, v0.16b, v1.16b
        ushr    v1.2d, v0.2d, #8
        orr     v0.16b, v0.16b, v1.16b
        ushr    v1.2d, v0.2d, #16
        orr     v0.16b, v0.16b, v1.16b
        ushr    v1.2d, v0.2d, #32
        orr     v0.16b, v0.16b, v1.16b
        mvn     v0.16b, v0.16b
        cnt     v0.16b, v0.16b
        uaddlp  v0.8h, v0.16b
        uaddlp  v0.4s, v0.8h
        uaddlp  v0.2d, v0.4s
        ret
```
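For readers less familiar with NEON, here is a rough scalar model of the sequence above. It is a sketch of what the emitted instructions compute, not LLVM's actual lowering code, and `clzBySmear` is a hypothetical name. The six `ushr`/`orr` pairs smear the leading set bit into every lower position. `mvn` inverts the result, and `cnt` plus the three `uaddlp` steps form a per-lane popcount.

```zig
const std = @import("std");

/// Hypothetical scalar model of the current vector lowering:
/// smear the highest set bit downward, then popcount the inverse.
fn clzBySmear(x: u64) u64 {
    var v = x;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    // Every bit below the leading one is now set, so the inverted
    // value has exactly clz(x) ones (64 when x == 0).
    return @popCount(~v);
}

test "smear-and-popcount matches @clz" {
    const samples = [_]u64{ 0, 1, 2, 3, 0xFF, 0x8000_0000, 0x1_0000_0000, 0x8000_0000_0000_0000, 0xFFFF_FFFF_FFFF_FFFF, 0x0123_4567_89AB_CDEF };
    for (samples) |s| {
        try std.testing.expectEqual(@as(u64, @clz(s)), clzBySmear(s));
    }
}
```

The model makes the cost visible: twelve dependent shift/OR instructions before the popcount even starts.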
I think it should do something like this:
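```zig
export fn clz2(x: @Vector(2, u64)) @Vector(2, u64) {
    // Count leading zeros of each 32-bit half independently.
    const clz_with_u32_granularity: @Vector(4, u32) = @clz(@as(@Vector(4, u32), @bitCast(x)));
    // On little-endian, the high half's count sits in the upper 32 bits of each u64 lane.
    const base = @as(@Vector(2, u64), @bitCast(clz_with_u32_granularity)) >> @splat(32);
    // Keep the low half's count only where the high half was all zeros (count == 32).
    const mask = @select(
        u32,
        @as(@Vector(4, u32), @bitCast(base)) == @as(@Vector(4, u32), @splat(32)),
        clz_with_u32_granularity,
        @as(@Vector(4, u32), @splat(0)),
    );
    return base + @as(@Vector(2, u64), @bitCast(mask));
}
```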
That gives us this assembly:
```asm
clz2:
        clz     v1.4s, v0.4s
        ushr    v0.2d, v1.2d, #32
        movi    v2.4s, #32
        cmeq    v0.4s, v0.4s, v2.4s
        and     v0.16b, v1.16b, v0.16b
        usra    v0.2d, v1.2d, #32
        ret
```
Alternatively, the `usra` could probably have been an `add`, since the earlier `ushr` already computes the shifted value.
Assuming I didn't mess anything up, Z3 seems to prove this is a correct transformation? https://alive2.llvm.org/ce/z/878QXU
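As a quick sanity check alongside the Alive2 proof, the identity behind `clz2` can be modeled with scalar Zig. This is a sketch: `clzBySplit` is a hypothetical name, and the model works directly on the two 32-bit halves. The vector version instead relies on `@bitCast` placing each u64's high half in the odd u32 lane, which holds on little-endian targets such as aarch64.

```zig
const std = @import("std");

/// Hypothetical scalar model of the proposed lowering: count leading
/// zeros of each 32-bit half separately, then combine the counts.
fn clzBySplit(x: u64) u64 {
    const hi: u32 = @truncate(x >> 32);
    const lo: u32 = @truncate(x);
    const clz_hi: u64 = @clz(hi);
    const clz_lo: u64 = @clz(lo);
    // The low half contributes only when the high half is all zeros
    // (clz_hi == 32); this mirrors the cmeq/and mask in clz2.
    const extra: u64 = if (clz_hi == 32) clz_lo else 0;
    return clz_hi + extra;
}

test "split clz matches @clz on single bits and their neighbors" {
    for (0..64) |i| {
        const bit = @as(u64, 1) << @as(u6, @intCast(i));
        try std.testing.expectEqual(@as(u64, @clz(bit)), clzBySplit(bit));
        try std.testing.expectEqual(@as(u64, @clz(bit | 1)), clzBySplit(bit | 1));
        try std.testing.expectEqual(@as(u64, @clz(bit - 1)), clzBySplit(bit - 1));
    }
    try std.testing.expectEqual(@as(u64, 64), clzBySplit(0));
}
```

The single-bit values exercise both branches of the mask: bits 32 through 63 hit the high-half path, and bits 0 through 31 fall through to the low-half count.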
@llvm/issue-subscribers-backend-aarch64
Author: Niles Salter (Validark)