Open TocarIP opened 1 month ago
The SLP vectorizer may generate long vectors even if -slp-max-reg-size is specified, because it relies on the codegen's ability to split long vectors. -slp-max-reg-size does not limit the size of the vector, only the size of the vector registers. The vector itself may still span several vector registers.
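To make the "spans several registers" point concrete, here is a minimal arithmetic sketch. It is only an illustration, not LLVM's implementation; the helper name `registersSpanned` is hypothetical. It computes how many registers of a given width a vector type would occupy, assuming the vector is simply split into register-sized pieces.

```cpp
#include <cstdint>

// Illustrative arithmetic only; this is NOT how LLVM computes anything.
// registersSpanned is a hypothetical helper: the number of registers of
// width regBits needed to hold numElts elements of eltBits each (rounded up).
constexpr std::uint32_t registersSpanned(std::uint32_t numElts,
                                         std::uint32_t eltBits,
                                         std::uint32_t regBits) {
  return (numElts * eltBits + regBits - 1) / regBits;
}

// <4 x double> is 4 * 64 = 256 bits: two 128-bit registers' worth,
// or exactly one 256-bit register.
static_assert(registersSpanned(4, 64, 128) == 2, "two 128-bit registers");
static_assert(registersSpanned(4, 64, 256) == 1, "one 256-bit register");
```

So a `<4 x double>` value still fits the description above with a 128-bit limit: it is a vector spanning two such registers, and whether it is actually split is left to codegen.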
I'm trying to limit generation of wide AVX instructions to reduce the frequency impact and the resulting performance regression. For the following example (consecutive FP divisions): https://godbolt.org/z/reP9c78cM I get a vector division:
vdivpd %ymm0, %ymm1, %ymm0
with a 256-bit wide register. I've checked the IR, and SLP indeed generates
%5 = fdiv <4 x double> %2, %4
When I try to limit the register size to 128, I get the same result, even when building with -mllvm -slp-max-reg-size=1, which should basically remove any SLP vectorization completely. Wide AVX is known to cause a significant performance regression from reduced frequency on some CPUs (especially older ones).
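For readers who cannot open the link, a reproducer of roughly this shape would match the description of "consecutive FP division". This is a guess at the pattern, not the code behind the godbolt link; the function name `div4` and the use of `__restrict` (a compiler extension, added here so the loads and stores can be grouped) are assumptions.

```cpp
// Hypothetical reproducer shape; the actual code behind the link may differ.
// Four independent divisions on adjacent doubles are the kind of pattern
// the SLP vectorizer may combine into a single <4 x double> fdiv.
void div4(double *__restrict out, const double *__restrict num,
          const double *__restrict den) {
  out[0] = num[0] / den[0];
  out[1] = num[1] / den[1];
  out[2] = num[2] / den[2];
  out[3] = num[3] / den[3];
}
```

Building something like this with optimizations and AVX enabled, with and without -mllvm -slp-max-reg-size=128, and comparing the emitted IR would be one way to check whether the flag changes the `fdiv` width.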