vorner / slipstream

Nudging the compiler to auto-vectorize things
Apache License 2.0

FMA #10

Closed HadrienG2 closed 1 year ago

HadrienG2 commented 1 year ago

Unlike GCC, LLVM does not automatically transform x * y + z into x.mul_add(y, z) unless special flags are passed, because its developers consider changing floating-point output unacceptable, even when the change is in the direction of increased precision (a fused multiply-add rounds once instead of twice).

Therefore, to leverage hardware FMA when it is available, we need to explicitly use the mul_add() operation, which slipstream unfortunately does not expose at the moment.
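
To illustrate why the two forms are not interchangeable, here is a minimal sketch in plain Rust (no slipstream involved). The function names mul_then_add and fused are made up for this example; only f64::mul_add comes from the standard library. The inputs are chosen so that the product x * y is not exactly representable, which makes the extra rounding step visible.

```rust
/// Evaluates x * y + z as two separate operations, rounding after each one.
fn mul_then_add(x: f64, y: f64, z: f64) -> f64 {
    x * y + z
}

/// Evaluates x * y + z with a single rounding via the standard `f64::mul_add`.
fn fused(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

fn main() {
    // eps = 2^-28, so (1 + eps) * (1 - eps) = 1 - 2^-56 exactly.
    // That product is too close to 1.0 to be stored in an f64.
    let eps = 1.0 / (1u64 << 28) as f64;
    let (x, y, z) = (1.0 + eps, 1.0 - eps, -1.0);

    // The separate multiply rounds the product to 1.0, so the sum is 0.
    println!("separate: {:e}", mul_then_add(x, y, z));
    // The fused version keeps the exact product and returns -2^-56.
    println!("fused:    {:e}", fused(x, y, z));
}
```

Whether mul_add actually compiles down to a hardware FMA instruction depends on the target features enabled at build time; without them it typically falls back to a slower software routine that still rounds once.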

vorner commented 1 year ago

I have no objection to that being put in place somehow (though how do we do it in a generic way?). But I don't have the time to do it myself, so I'm open to pull requests.
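
One possible shape for a generic approach, sketched under heavy assumptions: the MulAdd trait below is hypothetical and is not part of slipstream's API. It uses plain arrays as a stand-in for a vector type and simply forwards each lane to the scalar mul_add from the standard library, leaving it to the compiler to vectorize the loop.

```rust
/// Hypothetical trait for illustration only; slipstream does not define it.
trait MulAdd {
    /// Computes `self * a + b` with a single rounding per element.
    fn mul_add(self, a: Self, b: Self) -> Self;
}

impl MulAdd for f32 {
    fn mul_add(self, a: f32, b: f32) -> f32 {
        // Calls the inherent method from the standard library.
        f32::mul_add(self, a, b)
    }
}

impl MulAdd for f64 {
    fn mul_add(self, a: f64, b: f64) -> f64 {
        f64::mul_add(self, a, b)
    }
}

/// Lane-wise implementation for fixed-size arrays, standing in for a vector type.
impl<T: MulAdd + Copy, const N: usize> MulAdd for [T; N] {
    fn mul_add(self, a: Self, b: Self) -> Self {
        let mut out = self;
        for i in 0..N {
            out[i] = self[i].mul_add(a[i], b[i]);
        }
        out
    }
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    let r = x.mul_add([2.0; 4], [0.5; 4]);
    println!("{:?}", r);
}
```

A trait like this keeps the scalar and vector cases behind one method name, but it says nothing about how it would fit slipstream's existing types, and whether the loop turns into vector FMA instructions still depends on the enabled target features.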