Closed Validark closed 1 month ago
@llvm/issue-subscribers-backend-x86
Author: Niles Salter (Validark)
The AND to clamp the shift amount is irrelevant for the SHL -> PSHUFB lowering as anything out of bounds would be poison anyway. All we need is to be shifting a vXi8 splat constant for this to work.
I included it because you can't really do an out-of-bounds shift in Zig. You have to use either @truncate, which gives you the low log2(bit width) bits of the shift amount, or @intCast, which is a promise that the amount is already in range, and oftentimes an AND gets inserted anyway.
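To make the distinction concrete, here is a rough C model of the two options for a `u8` shifted by a 3-bit amount. The helper names `shl_truncate` and `shl_intcast` are hypothetical, and the `assert` only approximates what a safety-checked Zig build does; in unchecked builds an out-of-range `@intCast` is simply undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Rough model of @truncate to u3: keep only the low 3 bits of the amount.
   This is where the explicit AND comes from. */
static uint8_t shl_truncate(uint8_t x, uint8_t amt) {
    return (uint8_t)(x << (amt & 7));
}

/* Rough model of @intCast to u3: the caller promises amt is already in [0, 7].
   The assert stands in for a safety check; no masking is performed. */
static uint8_t shl_intcast(uint8_t x, uint8_t amt) {
    assert(amt < 8);
    return (uint8_t)(x << amt);
}
```

For any amount in [0, 7] the two functions agree, which is why the mask does not change the result for in-range inputs.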
This code: (Godbolt link)
Compiles like so for Zen 3:
However, because the bytes resulting from @truncate(chunk) are in the range [0, 7], we can precompute all 8 possible answers and use vpshufb instead (Godbolt, full code):
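Independently of the linked Zig code, the table-lookup idea can be sketched in portable C. This is a scalar model under stated assumptions, not the actual lowering: `pshufb_model` imitates one 16-byte lane of PSHUFB, and `shl_splat_via_lut` is a hypothetical helper that shifts a splat byte by per-lane amounts using a precomputed table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 16-byte lane of PSHUFB: each output byte is
   table[idx & 0x0F], or 0 when bit 7 of the index byte is set. */
static void pshufb_model(const uint8_t table[16], const uint8_t idx[16],
                         uint8_t out[16]) {
    for (size_t i = 0; i < 16; i++) {
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
    }
}

/* Hypothetical helper: compute (splat << amt[i]) per byte via a lookup.
   Precondition: every amt[i] is in [0, 7]. */
static void shl_splat_via_lut(uint8_t splat, const uint8_t amt[16],
                              uint8_t out[16]) {
    /* Precompute the 8 possible answers; entries 8..15 are never selected
       by in-range amounts, so their contents do not matter (left as 0). */
    uint8_t table[16] = {0};
    for (unsigned n = 0; n < 8; n++) {
        table[n] = (uint8_t)(splat << n);
    }
    for (size_t i = 0; i < 16; i++) {
        assert(amt[i] < 8); /* documents the in-range precondition */
    }
    /* The shift amounts double as the shuffle indices. */
    pshufb_model(table, amt, out);
}
```

Because only table entries 0 through 7 can be reached by valid shift amounts, no mask is needed on the index in this model as long as the precondition holds.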