I think the SIMD spec should provide operations for reducing a vector across its lanes: sum, product, min and max.
For example, `sum` would be `a.x + a.y + a.z + a.w`. Where the hardware has no such instruction, the operation could be polyfilled or emulated in scalar code. These reductions are often needed at the end of a loop, many architectures do support them, and code is cleaner when they are expressed as functions rather than through manual lane access.
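To make the request concrete, here is a rough sketch of what the scalar fallback could look like. The names `reduceSum`, `reduceProduct`, `reduceMin`, `reduceMax` and `sumArray` are hypothetical and not part of any existing SIMD API, and a 4-lane vector is modelled as a plain `{x, y, z, w}` object purely for illustration.

```javascript
// Hypothetical scalar fallbacks for the proposed reductions.
// A 4-lane vector is modelled as a plain object {x, y, z, w};
// none of these names exist in a real SIMD API.

function reduceSum(a) {
  return a.x + a.y + a.z + a.w;
}

function reduceProduct(a) {
  return a.x * a.y * a.z * a.w;
}

function reduceMin(a) {
  return Math.min(a.x, a.y, a.z, a.w);
}

function reduceMax(a) {
  return Math.max(a.x, a.y, a.z, a.w);
}

// Typical use: accumulate lane-wise inside the loop, then reduce once
// at the end. Assumes values.length is a multiple of 4.
function sumArray(values) {
  let acc = { x: 0, y: 0, z: 0, w: 0 };
  for (let i = 0; i < values.length; i += 4) {
    acc = {
      x: acc.x + values[i],
      y: acc.y + values[i + 1],
      z: acc.z + values[i + 2],
      w: acc.w + values[i + 3],
    };
  }
  return reduceSum(acc);
}
```

With something like this in the spec, the final step of the loop is a single call instead of four lane reads, and an engine would be free to map it to a native instruction where one exists.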