
Basic floating-point operations are underspecified #60942

Open Muon opened 1 year ago

Muon commented 1 year ago

Currently, the LangRef does not specify the results of basic floating-point operations (fadd, fsub, fmul, fdiv) in any detail. APFloat uses IEEE 754 semantics, but the LangRef does not guarantee it. What guarantees are there about the behavior of floating-point code? If IEEE 754 is the intended model, then x87 codegen is completely broken and probably other targets are as well. If not IEEE 754, then what is the intended model?
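For concreteness, a minimal IR sketch (illustrative names and values only) of the operations in question; nothing in the current LangRef says whether these must produce the correctly rounded IEEE 754 binary32 results:

```llvm
define float @example(float %a, float %b, float %c) {
  %s = fadd float %a, %b   ; required to be the IEEE 754 binary32 sum?
  %p = fmul float %s, %c   ; correctly rounded product, or "whatever the target does"?
  %q = fdiv float %p, %b   ; same question; x87 codegen answers it differently
  ret float %q
}
```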

arsenm commented 7 months ago

I think we just need to document that float = IEEE float. In the past we pretended that targets were allowed to codegen floats as some other format, but that never realistically worked, and it's an onerous burden with no known user.

RalfJung commented 7 months ago

Cc @jcranmer-intel @jyknight

jcranmer-intel commented 7 months ago

There are some comments on https://github.com/llvm/llvm-project/issues/44218 about the x87 precision issue. It's possible to fix, but given the declining importance of 32-bit x86 (especially since SSE-based x86 is a practical option there), the drive to fix it isn't high.
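To make the x87 issue concrete, here is a minimal sketch (hypothetical function, illustrative values in the comments): under IEEE 754 binary64 semantics the intermediate product overflows, but x87 codegen that keeps it in an 80-bit register, with its wider exponent range, does not overflow, so the observable result can change depending on when (or whether) the value is spilled to memory.

```llvm
define double @square_then_divide(double %x) {
  ; For %x = 1.0e+308: IEEE 754 binary64 requires %p to overflow to +inf,
  ; making %q +inf. If %p stays in an x87 80-bit register, no overflow
  ; occurs and %q comes out near 1.0e+308.
  %p = fmul double %x, %x
  %q = fdiv double %p, %x
  ret double %q
}
```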

The other main potential issue I can think of is FTZ/DAZ. I'm not a connoisseur of all the architecture manuals for the architectures we support, but on a quick flick through them, I can't entirely rule out the possibility that we target some architecture whose hardware can't support subnormals properly per IEEE 754. But in general, our denormal story is already a bit of a mess (see also https://github.com/llvm/llvm-project/pull/80475 and the ongoing discussion there).
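For reference, the per-function denormal controls we have today are spelled as a string attribute; a minimal sketch (hypothetical function name), whose exact semantics are part of what that discussion is trying to pin down:

```llvm
; "denormal-fp-math" takes an "output,input" pair; "preserve-sign" on both
; sides corresponds to FTZ/DAZ-style handling of denormals.
define float @ftz_daz_add(float %a, float %b) "denormal-fp-math"="preserve-sign,preserve-sign" {
  %r = fadd float %a, %b
  ret float %r
}
```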

To be brutally honest, our floating-point semantics are "assume IEEE 754 semantics, with IEEE 754-2008 signaling bit position, in terms of layout" with computation rules being "whatever the hardware does, but the compiler pretends it's IEEE 754 in default environment unless there's flags saying otherwise." Not the semantics I'd want to define, but if the user has some guarantees about the reasonableness of the FP hardware, then the compiler can generally uphold those guarantees.

bfloat, x86_fp80, and ppc_fp128 aren't IEEE 754 types. bfloat can be fully described as an IEEE 754 format with different p and emax parameters than half. x86_fp80 very nearly can be, but it also has noncanonical encodings that need clarification of LLVM semantics (as would decimal floating-point types, if/when they are added). For ppc_fp128, I am not prepared to assert anything about how to properly specify its semantics, beyond my knowledge that it, like x86_fp80, has noncanonical encodings and that it does not fit into a simple IEEE 754 base/significand/exponent model.

Another area of underspecification is the interaction of things like fast-math flags and strict floating-point mode. We don't actually document what happens if you have an fadd instruction in a strictfp function, for example.
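A minimal sketch of that undocumented combination (hypothetical function name; the call-site attribute usage reflects what frontends typically emit, as an assumption):

```llvm
define double @mixed(double %a, double %b) strictfp {
  ; What rounding-mode and exception semantics does this plain fadd have here?
  %plain = fadd double %a, %b
  ; The constrained form spells those semantics out explicitly.
  %strict = call double @llvm.experimental.constrained.fadd.f64(
      double %plain, double %b,
      metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  ret double %strict
}

declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
```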

The current LangRef does specify this:

> The binary format of half, float, double, and fp128 correspond to the IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128 respectively.

andykaylor commented 7 months ago

> with computation rules being "whatever the hardware does, but the compiler pretends it's IEEE 754 in default environment unless there's flags saying otherwise." Not the semantics I'd want to define, but if the user has some guarantees about the reasonableness of the FP hardware, then the compiler can generally uphold those guarantees.

With this definition, we run into problems with constant folding. For example, if the native fdiv instruction for the target hardware doesn't return correctly-rounded results, then constant folding fdiv to a correctly rounded result may be a value-changing transformation.
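As a concrete sketch of that hazard (hypothetical function name): constant folding will turn the division below into the correctly rounded binary32 constant, which is only behavior-preserving if the target's runtime fdiv would have produced the same bits.

```llvm
define float @one_third() {
  %r = fdiv float 1.0, 3.0
  ret float %r
}

; After constant folding, assuming IEEE 754 semantics for fdiv:
;   ret float 0x3FD5555560000000   ; 1/3 correctly rounded to binary32
```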

This question came up recently in a discussion among SYCL developers. Does the LLVM IR fdiv instruction require that the result be correctly rounded? My initial reaction was, "of course it does!" But when you think about this in terms of targets that don't have a correctly rounded fdiv instruction, that's not so obvious anymore. If I'm writing code for a device that I know doesn't have correctly-rounded native division, do I really want the compiler to insert a bunch of extra code to get a correctly-rounded result? Almost certainly not.

So what are the semantics of fdiv? My take on it is that there is no specific requirement for accuracy, but the compiler isn't allowed to do anything that would change the result, which ultimately boils down to something like the description you gave. I think this runs into exactly the kinds of problems that @efriedma-quic was trying to explain to me here (https://discourse.llvm.org/t/propogation-of-fpclass-assumptions-vis-a-vis-fast-math-flags/76554), but I think this is something that we're going to need to learn to live with if we want to support non-CPU targets.

Even without considering non-CPU hardware, we have this problem for things like llvm.cos. What are the semantics of this intrinsic? "Return the same value as a corresponding libm ‘cos’ function but without trapping or setting errno." But we haven't said which implementation of libm we're talking about, and so if the compiler constant folds a call to this intrinsic, but the compiler was linked against a different libm implementation than the program being compiled will use, that may be a value-changing optimization. I've brought this up before (https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138) and there wasn't much support for maintaining strict numeric consistency in cases like this, but I still think that should be our default behavior.
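A minimal sketch of that case (hypothetical function name): if the call below is folded at compile time, the folded constant reflects whichever cos implementation the compiler evaluates with, which need not match the libm the compiled program eventually links against.

```llvm
define double @cos_of_one() {
  %r = call double @llvm.cos.f64(double 1.0)
  ret double %r
}

declare double @llvm.cos.f64(double)
```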

jcranmer-intel commented 7 months ago

Putting on a few different hats here, so bear with me:

From the perspective of a language designer on a new programming language, it is useful to have code that works reliably across the hardware spectrum, even if it comes at some performance cost. Consider that newer languages specify that something like i16 evaluates strictly as an i16 in standard arithmetic operators, as opposed to being auto-promoted to the native register size (which would be slightly faster if the hardware doesn't have a dedicated i16 add instruction). With this hat on, floating-point operations should follow IEEE 754 behavior nearly exactly, with hardware that can't support this being "exotic" and having to do extra work to make it work.

From the perspective of someone trying to support "exotic" hardware (whether as a compiler writer or a programmer), it's generally the case that they want the high-level language operator to map more directly to the hardware instruction. So if you've got hardware that isn't capable of doing correctly-rounded fdiv, you would probably prefer to have fdiv map to whatever precision you actually get.

Note the tension between these two hats from a user perspective: a user who's writing portable code would probably prefer to get the same results on all platforms, at the cost of performance, whereas a user who's targeting only one particular piece of hardware would probably prefer to get the fastest code on that hardware, at the cost of different results elsewhere. Probably the only feasible way to square this tension is to provide both a way to get a fully-accurate result and a way to get a loosened-accuracy result.

> Even without considering non-CPU hardware, we have this problem for things like llvm.cos. What are the semantics of this intrinsic? "Return the same value as a corresponding libm ‘cos’ function but without trapping or setting errno." But we haven't said which implementation of libm we're talking about, and so if the compiler constant folds a call to this intrinsic, but the compiler was linked against a different libm implementation than the program being compiled will use, that may be a value-changing optimization. I've brought this up before (https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138) and there wasn't much support for maintaining strict numeric consistency in cases like this, but I still think that should be our default behavior.

When the C++ committee talked about making these functions constexpr, I tried to point this problem out to them and get them to care about it, and abjectly failed. The general sentiment was along the lines of "it's already a mess, how is this going to make it more of a mess?" As for what LLVM can do, we have correctly-rounded versions of these functions in llvm-libc, so in theory we could leverage that to at least make the optimizations not dependent on the host library...

Another LLVM libm intrinsic problem is that most of those intrinsics are lowered to calls into libm if there's no hardware support for them, which means there's a good chance they set errno. Yet another problem is that we now have some optimizations that kick in only if you call libm functions directly and other optimizations that kick in only if you use the intrinsics, because they are optimized in two different passes with two different sets of rules.
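For illustration (hypothetical function name), the two spellings in question; today they are recognized and optimized along different paths, and the intrinsic form can still be lowered to a libm call on targets without a native instruction:

```llvm
define double @two_spellings(double %x) {
  %a = call double @llvm.sin.f64(double %x)   ; intrinsic form
  %b = call double @sin(double %x)            ; direct libm call
  %r = fadd double %a, %b
  ret double %r
}

declare double @llvm.sin.f64(double)
declare double @sin(double)
```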

IMHO, we do need to work on a better specification of floating-point semantics in general, but this is also a deeper conversation that's going to require multiple RFCs to the mailing lists, and several changes to LLVM, rather than some back-and-forth in this issue.

Muon commented 7 months ago

Firstly, I think it should be kept in mind that anything less than a formal model will almost certainly result in more soundness bugs like https://github.com/llvm/llvm-project/issues/44218 showing up down the line. Without a formal model, it is impossible to define what a valid optimization even is.

I think LLVM should officially adopt the semantics of IEEE 754 as the intended model for the basic operations. On ISAs which do not natively support these operations (e.g. x87), they should instead be emulated, or failing that, deferred to a softfloat library. If the ISA has instructions implementing nonstandard semantics, then those instructions should be exposed as intrinsics or similar.

Regarding intrinsics for trigonometric and transcendental functions, I think the text should be changed to refer to an "implementation-specific approximation", or similar. In any case, dynamic linking makes this largely impossible to optimize around. (Ideally, these should be correctly-rounded as well in the year 2024. I am pleasantly surprised to hear that llvm-libc has this!)