I suspect this is an issue in upstream LLVM. The SSE2 version (_mm_mulhi_epi16) and the unsigned version (_mm256_mulhi_epu16) show the same problem. When a wider register is available (xmm -> ymm -> zmm), the widened values are placed in it instead of being split across two registers.
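For reference, the per-lane semantics of these mulhi intrinsics can be modeled in scalar Rust as a widening multiply that keeps only the high 16 bits of the 32-bit product. This is a minimal sketch for illustration, not the stdarch implementation. The helper names `mulhi_epi16_lane` and `mulhi_epu16_lane` are hypothetical.

```rust
/// Hypothetical reference model of one lane of `_mm256_mulhi_epi16` (signed):
/// widen both i16 inputs to i32, multiply, keep the high 16 bits.
fn mulhi_epi16_lane(a: i16, b: i16) -> i16 {
    // The full product of two i16 values always fits in an i32.
    let product = i32::from(a) * i32::from(b);
    // Arithmetic shift preserves the sign; the cast keeps bits 16..32.
    (product >> 16) as i16
}

/// Hypothetical reference model of one lane of `_mm256_mulhi_epu16` (unsigned).
fn mulhi_epu16_lane(a: u16, b: u16) -> u16 {
    // The full product of two u16 values always fits in a u32.
    let product = u32::from(a) * u32::from(b);
    (product >> 16) as u16
}

fn main() {
    println!("{}", mulhi_epi16_lane(-1000, 1000)); // prints -16
    println!("{}", mulhi_epu16_lane(0xFFFF, 0xFFFF)); // prints 65534
}
```

The widening to 32 bits is part of the definition, so an ideal backend folds this pattern back into a single pmulhw/pmulhuw (or vpmulhw/vpmulhuw) instead of materializing i32 lanes.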
Code
I tried this code:
https://godbolt.org/z/9Eqb45Keq
I expected to see this happen: roughly the same codegen as with -1000 as the multiplier.
Instead, this happened: the vector appears to be widened to i32 lanes for no good reason.
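One plausible origin of the i32 widening (an assumption, not confirmed by this report) is that the intrinsic reaches LLVM as a generic widen, multiply, shift, truncate sequence, which the x86 backend must recognize to emit a single vpmulhw; for some constant multipliers that match may fail. The sketch below writes the 16-lane operation in that shape with a splatted constant and, on x86_64 with AVX2 detected at runtime, checks it against the real `_mm256_mulhi_epi16`. The multiplier -1000 mirrors the report; the function names and sample input are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical portable model of `_mm256_mulhi_epi16(a, _mm256_set1_epi16(k))`:
/// widen each i16 lane to i32, multiply, shift right by 16, truncate.
fn mulhi_by_const(a: [i16; 16], k: i16) -> [i16; 16] {
    let mut out = [0i16; 16];
    for (o, &x) in out.iter_mut().zip(a.iter()) {
        *o = ((i32::from(x) * i32::from(k)) >> 16) as i16;
    }
    out
}

/// The real intrinsic, for comparison. Caller must ensure AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn mulhi_by_const_avx2(a: [i16; 16], k: i16) -> [i16; 16] {
    // Unaligned load/store: a [i16; 16] is exactly 32 bytes (one ymm register).
    let v = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let r = _mm256_mulhi_epi16(v, _mm256_set1_epi16(k));
    let mut out = [0i16; 16];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

#[cfg(target_arch = "x86_64")]
fn check_against_intrinsic(a: [i16; 16], k: i16, model: [i16; 16]) {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: AVX2 support was verified at runtime just above.
        let real = unsafe { mulhi_by_const_avx2(a, k) };
        assert_eq!(model, real);
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn check_against_intrinsic(_a: [i16; 16], _k: i16, _model: [i16; 16]) {}

fn main() {
    // Sample input: lanes -8000, -7000, ..., 7000 (all fit in i16).
    let a: [i16; 16] = core::array::from_fn(|i| (i as i16 - 8) * 1000);
    let model = mulhi_by_const(a, -1000);
    check_against_intrinsic(a, -1000, model);
    println!("{:?}", model);
}
```

Compiling `mulhi_by_const` with optimizations and AVX2 enabled on Godbolt, and swapping in other constants for -1000, is one way to see whether LLVM recognizes the widened pattern.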
Version it worked on
It most recently worked on: Rust 1.74
Version with regression
I checked on Godbolt with 1.75 through 1.81, plus the current beta and nightly.