The compiler already performs this trick automatically for targets that only support 32-wide vectors. However, on zen4, for example, the trick is not optimal and prevents the compiler from generating better code. This change lets the compiler apply the optimization only when necessary.