Open tobiasgrosser opened 11 years ago
mentioned in issue llvm/llvm-bugzilla-archive#16275
After a long discussion on the list, it turns out that this approach can be very problematic for NEON intrinsics, which require that NEON instructions be generated regardless of IEEE compliance settings or fast-math flags. Since IR doesn't differentiate between code produced by the vectorizers and code produced from NEON intrinsics, we can't apply any serialization rule indiscriminately.
The only option left would be an extra command line option requesting IEEE compliance; it would then be the user's responsibility to check for NEON intrinsics, hand-crafted IR, etc.
This is also too big a hammer to fix #16275, which already has its own fix.
All in all interesting, but too low on the priority list for me to work on it.
This may be a way to fix the fast-math issue in llvm/llvm-bugzilla-archive#16275 .
Extended Description
On ARM it is difficult to generate optimal code that matches certain floating point precision requirements. The only way that currently exists is to explicitly alter the feature flags of the CPU that we target. This causes problems: for example, it is not possible to take advantage of NEON for integer instructions while at the same time avoiding NEON for vector floating point operations.
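To make the precision concern concrete, here is a minimal C sketch of one well-known difference. It assumes the requirement in question includes subnormal (denormal) handling, which the report itself does not spell out. On ARMv7, NEON floating point arithmetic flushes subnormals to zero, while VFP can keep them. The helper names below are hypothetical and exist only for illustration.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: halve the smallest normal float.
   volatile keeps the compiler from folding the multiply at compile time,
   so the result reflects the floating point unit that actually runs it. */
float half_of_min_normal(void) {
    volatile float x = FLT_MIN;
    volatile float y = x * 0.5f;
    return y;
}

/* Hypothetical helper: returns 1 if the result was kept as a subnormal
   (IEEE 754 behavior), 0 if it was not (for example, flushed to zero). */
int keeps_subnormals(void) {
    return fpclassify(half_of_min_normal()) == FP_SUBNORMAL;
}
```

On a target that honors IEEE 754 subnormals, `keeps_subnormals()` should return 1. If the multiply is lowered to a flush-to-zero unit, the product becomes 0 and the function returns 0, which is the kind of silent change a user asking for IEEE compliance would want to rule out.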
The following test cases illustrate how I expect llc to behave: