rust-lang / rust


Must a `const fn` behave exactly the same at runtime as at compile-time? #77745

Closed: oli-obk closed this issue 1 month ago

oli-obk commented 3 years ago

TLDR: should we allow floating point types in const fn?

Basically the question is whether the following const fn

const fn foo(a: f32, b: f32) -> f32 {
    a / b
}

must yield the same results for the same arguments if it is invoked at runtime or compile-time:

const RES1: f32 = foo(0.0, 0.0);

fn main() {
  let res2: f32 = foo(0.0, 0.0);
  assert_eq!(RES1.to_bits(), res2.to_bits());
}

Depending on the platform's NaN behavior, the result of foo(0.0, 0.0) may differ between runtime and compile-time execution (0.0 / 0.0 produces a NaN; 1.0 / 0.0 would simply be +infinity, which IEEE 754 fully determines). Compile-time execution is performed by the Rust port of apfloat (a soft-float implementation); runtime behavior depends on the actual NaN bit patterns produced by the hardware, which are not always fully determined by the IEEE specification.
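As a runtime-only illustration (a sketch, not part of the original report): IEEE 754 requires that 0.0 / 0.0 yield some quiet NaN, but not which sign or payload bits it carries, and those bits can be inspected directly:

```rust
use std::hint::black_box;

fn main() {
    // black_box keeps the division from being folded at compile time.
    let zero = black_box(0.0_f32);
    let nan = zero / zero;
    assert!(nan.is_nan());
    // Any NaN has all-ones exponent bits and a nonzero mantissa...
    assert_eq!(nan.to_bits() & 0x7F80_0000, 0x7F80_0000);
    assert_ne!(nan.to_bits() & 0x007F_FFFF, 0);
    // ...but which mantissa bits, and the sign bit, are up to the hardware
    // (x86_64 SSE, for instance, produces the "real indefinite" 0xFFC00000).
    println!("{:08x}", nan.to_bits());
}
```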

Note that this is entirely independent of any optimizations; we are discussing here the relationship between code that the user explicitly requests to be executed at compile-time, and regular run-time code. Optimizations apply to all code equally and treat fn and const fn the same, so the question of how floating-point operations can be optimized is entirely separate from, and off-topic for, this issue.

cc @rust-lang/wg-const-eval

RalfJung commented 3 years ago

(Replying to an older version of the OP)

I think you are asking two different questions here and treating them as if they were the same. Let me explain. :)

More concretely, when we optimize the following code, are we allowed to const propagate the foo call?

Note that this is a different question from the issue title. The behavior at runtime could be non-deterministic according to the spec, so even if you see one particular runtime behavior and a different compile-time behavior, it would still be correct to do const-propagation. Floating-point operations likely are non-deterministic.

I feel rather strongly that const fn must behave in a way that is allowed to occur at runtime; that is just a different way of saying that CTFE must implement the Rust spec. It would be rather strange if that was not the case. From this alone it already follows that const propagation like you are asking is allowed. I cannot see any reasonable way (assuming a bug-free CTFE engine) in which this optimization is not allowed.

But then there is the separate question, do we want to allow non-deterministic operations in const fn? CTFE inherently has to make some choice to resolve the non-determinism(*), and that choice might be different from what codegen+LLVM happen to currently do, which could be surprising for programmers that do not expect such non-determinism to actually be observable (even though it could, in theory, be observable even without any CTFE being involved). I think this is really the question you are asking here, but it is unrelated to const propagation. Unlike "is this optimization correct", this question cannot be answered by proving a theorem; this is a judgment call we could make either way as part of language design.

(*) Actually that is not entirely true -- allocation base addresses are another example of non-determinism, and there CTFE uses a form of symbolic execution to basically track all possible non-deterministic choices at once, and halt evaluation for cases where that is not possible. This is required because the non-deterministic choice made for a certain allocation must be consistent for a given Rust program across const-time and run-time execution. For runtime code that choice is only made when the program actually starts (thanks to ASLR) and thus CTFE has to be done in a way that is compatible with every possible choice made later. We do not need such a heavy hammer for floating-point operations because their non-deterministic choice is much more local, confined to each individual operation.

bugadani commented 3 years ago

I wonder in what real-world use case somebody would want to do complex calculations with NaNs at compile time. I also wonder if the answer is the same when a compile-time calculation results in a NaN - is that something somebody actually wants, or does it hide an error?

est31 commented 3 years ago

If NaN is the only concern, then the const fn implementation could just error out every time a NaN is encountered/obtained from a computation. I also worry about floating point implementations having minuscule differences even for well-defined floats, but can't come up with an example. I see Miri gives an error when a dangling pointer is returned in a constant as well (requires nightly to trigger it), so it's doable.

oli-obk commented 3 years ago

So.. if we have a const F: f32 = A + B; and at runtime a let f: f32 = A + black_box(B);, then it is not necessary for F.to_bits() == f.to_bits(), but it must at least be possible for that to be equal for some execution of the runtime code (even if not observable in practice due to CPU bugs).
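Spelled out with concrete placeholder values (A and B are chosen arbitrarily here for illustration):

```rust
use std::hint::black_box;

const A: f32 = 1.0;
const B: f32 = 2.0;
const F: f32 = A + B; // evaluated at compile time by CTFE (apfloat)

fn main() {
    // black_box forces the addition to actually happen at runtime.
    let f: f32 = A + black_box(B);
    // For non-NaN results IEEE 754 determines the bits exactly, so the two
    // agree here; only NaN-producing operations leave room for divergence.
    assert_eq!(F.to_bits(), f.to_bits());
}
```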

I also worry about floating point implementations having minuscule differences even for well-defined floats, but can't come up with an example.

I always thought that even non-NaN floating point math can differ (in minuscule ways) between hardware even if it's the same target triple.

the const fn implementation could just error out every time a NaN is encountered/obtained from computation

That's one way, but it may be expensive. We specifically do not validate in const eval like we do in miri, because that validation is expensive. Doing it for floats may be cheaper, but if we also have to look at floats in large structs or arrays it can get expensive very quickly again. We could of course just check during each operation, which is much more direct and should only impact float ops. I think that if nondeterministic operations include non-NaN numbers, then that doesn't help us a lot though.
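The per-operation check could look something like this (a hypothetical sketch; `CtfeError` and `ctfe_f32_add` are made-up names, not rustc's actual interpreter API):

```rust
// Hypothetical sketch of a per-operation CTFE check: perform the f32
// addition, but reject results whose bit pattern IEEE 754 leaves
// underdetermined (freshly produced NaNs).
#[derive(Debug, PartialEq)]
enum CtfeError {
    NonDeterministicNan,
}

fn ctfe_f32_add(a: f32, b: f32) -> Result<f32, CtfeError> {
    let r = a + b;
    if r.is_nan() {
        // Bail out: the NaN's payload is not determined, so the
        // operation would be "unconst".
        Err(CtfeError::NonDeterministicNan)
    } else {
        Ok(r)
    }
}

fn main() {
    assert_eq!(ctfe_f32_add(1.0, 2.0), Ok(3.0));
    // inf + (-inf) produces a NaN, so the check rejects it.
    assert!(ctfe_f32_add(f32::INFINITY, f32::NEG_INFINITY).is_err());
}
```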

I think that we can just make a judgement call on how we make such nondeterminism deterministic (by choosing one possibility), even if that choice changes between target platforms, compiler versions, optimization levels or other compiler flags.

The main question is then, how to make that choice I guess. Just put it on the const eval roadmap that I should really really finish and have T-lang sign it off?

RalfJung commented 3 years ago

So.. if we have a const F: f32 = A + B; and at runtime a let f: f32 = A + black_box(B);, then it is not necessary for F.to_bits() == f.to_bits(), but it must at least be possible for that to be equal for some execution of the runtime code (even if not observable in practice due to CPU bugs).

No CPU bugs involved. If the spec says that non-deterministically, A or B can happen, then it is completely okay for runtime execution to always do A. Or to do A on Tuesdays and B every other day of the week. Or to do A only in crates whose name starts with a vowel. Or whatever. And CTFE can make its own choice of A or B completely independently of that.

So in your case, both f.to_bits() and F.to_bits() must be results that are allowed by the spec, but since the spec might allow multiple results, the two do not have to be the same. And that's really all we can say; there is no requirement that running the program many times must eventually produce all possible results or so. (This distinguishes non-determinism from randomness.)

I always thought that even non-NaN floating point math can differ (in miniscule ways) between hardware even if it's the same target triple.

AFAIK IEEE fully specifies what happens for primitive FP operations (except for NaN bits). Basically the operation has to return the best possible approximation to the result of the computation if it were carried out on actual rational numbers. (Transcendental functions are a different game.) The remaining differences here really are down to CPU bugs -- 32bit x86 is notorious here, and I hear some architectures handle subnormals (values very close to 0) incorrectly.
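This determinism for the basic operations is easy to check on a conforming target (a sketch; the asserted bit patterns assume the default round-to-nearest-even mode):

```rust
fn main() {
    // IEEE 754 requires the basic operations (+, -, *, /, sqrt) to be
    // correctly rounded, so for non-NaN results the bit pattern is fully
    // determined on any conforming implementation.
    let sum = 0.1_f32 + 0.2_f32;
    assert_eq!(sum.to_bits(), 0x3E99_999A); // nearest f32 to the exact sum
    assert_eq!(2.0_f32.sqrt().to_bits(), 0x3FB5_04F3); // nearest f32 to sqrt(2)
}
```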

See https://github.com/rust-lang/unsafe-code-guidelines/issues/237 for some more open questions about "what even are our floating-point semantics".

That's one way, but it may be expensive. We specifically do not validate in const eval like we do in miri, because that validation is expensive. Doing it for floats may be cheaper, but if we also have to look at floats in large structs or arrays it can get expensive very quickly again. We could of course just check during each operation, which is much more direct and should only impact float ops.

If the latest analysis in https://github.com/rust-lang/rust/issues/73328 is correct, then only floating-point operations are non-deterministic, but copying around FP values is deterministic. I do not think it would be expensive to check at each floating point addition etc whether the result is deterministic (basically, if the result is non-NaN, but we can easily add further conditions), and raise an error if it is not. This has nothing to do with checking the validity invariant.

I think that if nondeterministic operations include non-NaN numbers, then that doesn't help us a lot though.

Why that? I think it would work just as well. Unless of course all operations are non-deterministic, then this would be equivalent to ruling out FP entirely.

oli-obk commented 3 years ago

Why that? I think it would work just as well. Unless of course all operations are non-deterministic, then this would be equivalent to ruling out FP entirely.

What I mean is, if we error whenever any operation returns a NaN but still have other nondeterministic ops, then we haven't gained anything, and should either keep ruling out FP ops or just allow NaN and not check anything.

If the latest analysis in #73328 is correct, then only floating-point operations are non-deterministic, but copying around FP values is deterministic.

Oooh, neat. That is an improvement to the info I had when we created min_const_fn

RalfJung commented 3 years ago

What I mean is, if we error whenever any operation returns a NaN but still have other nondeterministic ops, then we haven't gained anything, and should either keep ruling out FP ops or just allow NaN and not check anything.

Sure, we'd have to capture every possible form of non-determinism. Also everything I say on this topic should be checked by an FP expert, which I am not.^^

Oooh, neat. That is an improvement to the info I had when we created min_const_fn

Yes, new info came up since then. Basically I think we should just copy whatever WebAssembly does -- it has an exhaustive and precise spec, and I am sure they involved enough FP experts to make sure the spec is also realistically implementable. Also this means that if LLVM does something else, we can complain that they are incompatible with WebAssembly, which gives our complaint more weight. ;)

ecstatic-morse commented 3 years ago

Transcendental functions (e.g. trigonometry and non-integral exponentiation) can give different results between platforms. I don't think we can take a "hard-line" approach, since then functions like f32::tan that take advantage of hardware support where it exists could never be const fn; users would have to opt in to a software-emulated version (either in std or in the ecosystem). I view CTFE as just another platform on which floating point is supported. Users don't (or at least shouldn't) expect cross-platform consistency at runtime, so why is CTFE any different?

Even trying to promise that the result given by CTFE is consistent between compiler versions would be foolish, since it would lock us into a particular software-emulation strategy that may become outdated. I think we should guarantee that the CTFE engine for a given compiler at a given optimization level is deterministic between runs, and nothing more. I think we might already have this for NaN payloads, although it would be nice to have an actual set of rules for how they get handled instead of "whatever LLVM does".

workingjubilee commented 3 years ago

If NaN is the only concern, then the const fn implementation could just error out every time a NaN is encountered/obtained from computation. I also worry about floating point implementations having minuscule differences even for well-defined floats, but can't come up with an example. (@est31)

An example of what you may be thinking about is, for instance, ARMv7 Neon (the SIMD vector unit) flushing subnormal float values to zero. A floating-point operation in a register that flushes subnormal values to zero would potentially cause an alteration of the value even when the mantissa is not supposed to change (e.g. f32x4::abs). This is no longer a concern on ARMv8 Neon (usually "aarch64"), which will not do that if you don't ask it to, as far as I'm aware (and I am reasonably sure I would have found out by now).

Part of the reason that x86_64 is well-behaved while x86 is not is that, as @thomcc told me, on x86_64 the XMM registers (introduced as SIMD registers) are fully compliant and compilers exhibit a preference for them even for "scalar" floating point ops. That may sound like a niche oddity but I have been finding in my reading and experiments that "The floating point registers and SIMD registers are actually the same" is actually a fairly common hardware implementation approach... correct FP math is expensive in transistors to implement, so I suppose one saves some surface area on the die by not doing so twice.

In fact, this got me curious enough to feed some code into Godbolt...

pub fn f32_add(a: f32, b: f32) -> f32 {
    a + b
}

pub fn f32_abs(a: f32) -> f32 {
    a.abs()
}
; Compiled with rustc -C opt-level=3 -Cdebuginfo=0 -Ccodegen-units=1 --target armv7-unknown-linux-gnueabihf
example::f32_add:
        vadd.f32        s0, s0, s1
        bx      lr

example::f32_abs:
        vabs.f32        s0, s0
        bx      lr

Yes, those are vector instructions. I had developed a slightly more thorough platform comparison for x86 and Arm targets at the link (nothing particularly exciting, mind) but I excerpted that because it does seem that at least on some ARMv7 triples there is a similar default to using the (potentially buggy, as mentioned) vector unit for floating point math... a sound choice elsewhere, an implementation concern here.

There's more extensive documentation on all the nuances of Arm's various FP implementations spread across Arm's website but I found a relatively succinct explanation of some of the details on Debian's wiki in relation to their own support choices: https://wiki.debian.org/ArmHardFloatPort#Background_information

tavianator commented 3 years ago

This GCC bug is possibly related to this discussion: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681. LLVM may have similar issues, I'm not sure.

The moral is that it's important that compile-time float evaluation can't lead the optimizer to make assumptions that may be contradicted at runtime.

thomcc commented 3 years ago

This GCC bug is possibly related to this discussion: gcc.gnu.org/bugzilla/show_bug.cgi?id=93681.

That's different since it's constant propagation done by the compiler as an optimization, and not compile-time function execution like const fn. It might be challenging to fix for similar reasons though, I don't know.

A floating-point operation in a register that flushes subnormal values to zero would potentially cause an alteration of the value even when the mantissa is not supposed to change (e.g. f32x4::abs)

abs/neg shouldn't, according to https://www.keil.com/support/man/docs/armasm/armasm_pge1423647771863.htm. Generally, flush-to-zero modes just treat subnormal numbers as non-canonical encodings of zero. (This is typically more efficient, since it makes abs/neg a plain bit mask.)

Even in the case that they're vabs/vneg, according to https://www.cl.cam.ac.uk/research/srg/han/ACS-P35/zynq/ARMv7-A-R-manual.pdf :

A8.8.280 VABS Vector Absolute takes the absolute value of each element in a vector, and places the results in a second vector. The floating-point version only clears the sign bit

A8.8.355 VNEG Vector Negate negates each element in a vector, and places the results in a second vector. The floating-point version only inverts the sign bit.

Anyway this doesn't really matter a ton, since flush-to-zero is still a problem regardless of where it's happening.
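The quoted sign-bit-only semantics of abs/neg can be checked directly (a sketch; the asserted bit patterns assume the default FP environment of mainstream targets):

```rust
fn main() {
    // abs and neg only touch the sign bit, so they preserve subnormal
    // payloads even on hardware that flushes subnormals in arithmetic ops.
    let sub = f32::from_bits(0x0000_0001); // smallest positive subnormal
    assert_eq!((-sub).to_bits(), 0x8000_0001); // neg: flip the sign bit
    assert_eq!((-sub).abs().to_bits(), 0x0000_0001); // abs: clear it again
}
```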


@workingjubilee Just a couple of quick bits of elaboration to prevent future confusion. Sorry if you are already aware of these things.

ARMv7 has both "vector floating point" (VFP) operations and "Advanced SIMD" (aka NEON). The instructions in that Godbolt you linked are VFP, not NEON. (AFAICT the VFP operations are mostly for scalar use.)

However: "Advanced SIMD" on these machines is (apparently — I didn't know this) never able to disable flush-to-zero. VFP operations (which are what scalar float operations tend to use) will flush or not based on a status register bit that user mode can change: the FZ bit.

By default, this bit is set, although it can be changed at a performance cost. I think it's plausible that we might want to investigate turning it off before main, and making it UB to turn it back on, if this is a soundness problem (note that the related DN/denormals-are-zero bit should also be turned off).

Another option would be to have floats in const fns that target this platform emulate the flush-to-zero behavior...

I'll try and dig a bit more later. Neither of these are great.


Basically I think we should just copy whatever WebAssembly does

For CTFE? Canonicalizing would probably be better than nondeterminism. The situations in which canonicalization is allowed to occur are very exhaustively specified in IEEE-754, and the design rationale for it in Wasm indicates this wasn't done for performance reasons.

It's also worth noting that Wasm, when it was designed, was very worried about doing things that would expose more fingerprinting bits to untrusted code. Rust doesn't have nearly the same set of requirements — as a result, anywhere Wasm couldn't get 100% consistency, the behavior is nondeterministic.

Another concern here is that Wasm has no issue preventing people from e.g. changing rounding modes and such. Rust can't prevent this, so... ultimately it would be good to come to terms with the fact that we probably can't match target semantics 1:1 from const fn.

Hell, in Rust as it currently stands you can change rounding modes and even enable flush to zero: https://doc.rust-lang.org/nightly/core/arch/x86_64/fn._mm_setcsr.html I've never been clear on whether this is actually sound, but the docs don't say there are any real problems...
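A minimal sketch of that escape hatch, using the now-deprecated MXCSR intrinsics (not a recommendation, given the soundness question; the flush-to-zero observation assumes an x86_64 target started in the default FP environment):

```rust
// Sketch: setting the FTZ bit in MXCSR via the (deprecated, possibly
// unsound) intrinsics. Rust/LLVM normally assume the default FP env.
#[cfg(target_arch = "x86_64")]
#[allow(deprecated)]
fn main() {
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};
    use std::hint::black_box;

    // By default, a subnormal result survives.
    let sub = black_box(f32::MIN_POSITIVE) * 0.5;
    assert!(sub != 0.0 && sub < f32::MIN_POSITIVE);

    unsafe { _mm_setcsr(_mm_getcsr() | _MM_FLUSH_ZERO_ON) };
    assert_ne!(unsafe { _mm_getcsr() } & _MM_FLUSH_ZERO_ON, 0);

    // With FTZ enabled, the same multiplication now flushes to zero.
    assert_eq!(black_box(f32::MIN_POSITIVE) * 0.5, 0.0);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {} // the MXCSR register only exists on x86
```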

workingjubilee commented 3 years ago

@thomcc Ahhh. I was aware of the ASIMD/Neon behavior having "always-on" FTZ... that was what I mentioned when we first discussed this a few days ago... and I had known about there being a difference between VFP and Neon! But I had indeed gotten confused in the middle of everything re: what I was looking at... that's just my week, I suppose. And I didn't know the normal FP unit also had default FTZ! That's... unfortunately interesting. I see you have managed to leap ahead of me on doing reading on the exact details of floating point weirdness on Arm, so clearly I should catch up. :^)

The important (floating) data point: Arm has too many floating point units and they all until recently had overly interesting behavior.

Regarding x86: there has actually been an argument in the past that writing the MXCSR register is unsound.

RalfJung commented 3 years ago

Even trying to promise that the result given by CTFE is consistent between compiler versions would be foolish, since it would lock us into a particular software emulation strategy that may become outdated.

If we restrict CTFE to the deterministic subset of Rust, then there is no lock-in beyond "CTFE adheres to the Rust specification" (and I hope we agree that that is a kind of lock-in that we want).

Once we permit CTFE for non-deterministic operations, I fully agree -- CTFE should make no more promises than the Rust spec itself.

I think we might already have this for NaN payloads, although it would be nice to have an actual set of rules for how they get handled instead of "whatever LLVM does".

That is the subject of discussion in https://github.com/rust-lang/rust/issues/73328. The most promising approach (I think) so far is to basically copy what wasm does, and then hope/ensure that LLVM complies with this.


This GCC bug is possibly related to this discussion: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681. LLVM may have similar issues, I'm not sure.

Yes, this is the kind of thing I mean when I say "LLVM might violate the wasm spec" -- if they assume floating-point operations are deterministic, then we (and they) are in trouble.


Basically I think we should just copy whatever WebAssembly does

For CTFE?

No, for the Rust spec. CTFE then should choose one legal implementation of that non-determinism (or none, if we decide to not support non-deterministic operations in CTFE for now).

But I think we have to resolve the Rust spec question (https://github.com/rust-lang/rust/issues/73328) before we stabilize any of this for CTFE.

Regarding rounding modes, AFAIK LLVM's stance basically is that changing them is UB, and Rust inherits that.

Also, this issue is drifting away from CTFE and towards "what even is the Rust spec". ;) That is not too surprising, as I think we have to figure out the spec before we can really say much about CTFE, but it means there is a lot of overlap with other issues such as https://github.com/rust-lang/rust/issues/73328.

RalfJung commented 3 years ago

I just realized another aspect of this discussion: if NaNs are non-deterministic, and if we allow computations with NaNs in const context, then we are departing from the idea that const fn must be deterministic. Centril would have been strongly opposed and I tend to agree we should tread carefully here -- there are some nice ideas out there for unsafe (runtime) code requiring const fn arguments and exploiting the purity of that computation; those plans would be much harder or might even become impossible if we permit non-determinism in const fn.

So right now I lean rather strongly towards not permitting any computation in CTFE that would be non-deterministic. This would mean adding checks to our floating-point operations and bailing out when there is a NaN, i.e., FP operations would be "unconst". Incidentally, this is also what @thomcc suggested after a long discussion we had recently, but for totally different reasons. ;)

thomcc commented 3 years ago

this is also what @thomcc suggested after a long discussion we had recently, but for totally different reasons.

I mean, wanting to keep float arithmetic deterministic was part of my desire there.

Anyway my suggestion here for NaN in const fn (Note: just NaN — I'm still thinking about the issue for subnormals on ARM) is:

  1. Performing operations on NaN in const fn is an error.

    • Note: "operations" is defined precisely by IEEE 754-2019 in clause 5 if that's a concern, but it more or less means what you'd assume — arithmetic, special functions, casting, etc.
    • Signbit ops like Neg/Abs/Copysign should probably be okay, but it also might be weird to allow some ops and not others, and it's unclear how useful these actually are.
    • Also, a handful of other operations (mostly 5.7.2's stuff, like is_nan, for example) are probably worth allowing. The precise details here feel like a libs concern, though.
  2. Producing a NaN in const fn except via {float}::from_bits is an error.

    • This is probably too late to disallow for literals that define const/static items, so perhaps for them it produces the qNaN with the correct sign for the expression and an all-bits-zero payload. (This is not actually what the hardware would have produced in all cases, but I doubt we do the right thing here as-is, and that ship has sailed.)

At runtime the operations would behave as on the target, e.g. unlike some const eval errors these wouldn't panic at runtime.
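A sketch of what rule 2 would still permit (shown at runtime here; whether `from_bits` is usable in const context is a separate stabilization question):

```rust
fn main() {
    // Under rule 2, from_bits would remain the one sanctioned way to
    // produce a NaN: the caller spells out the exact bit pattern, so there
    // is no non-determinism for CTFE to resolve.
    let qnan = f32::from_bits(0x7FC0_0000); // a quiet NaN, zero payload
    assert!(qnan.is_nan());
    assert_eq!(qnan.to_bits(), 0x7FC0_0000); // round-trips bit-exactly
}
```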

In cases like Miri, by default these error, with a flag you can set to say "I know that the bit-pattern of these NaNs may or may not match that of the target" (Ralf informs me that my initial idea of "defer the error to when you read the bits" would be a lot of work, and doing it this way would actually turn Miri into a more valuable tool for debugging numerical code).

All of this has a number of benefits.


Not specifically relevant to const fn but related: for constant propagation done as a compiler optimization, I think it should just stop constant propagating when it hits a NaN, unless it's willing to implement the correct target semantics.

It's very hard for me to imagine this is such a common case that the optimization is very profitable anyway, and it sounds way less error-prone than any other option.

Additionally, it follows the logic of "compiler optimizations should not change the observable behavior of the program". (Of course if rust ever gets "fast math" options, one might be to disable this and constant propagate despite it being wrong)

workingjubilee commented 3 years ago

My understanding is that on "subnormals are zero, flush to zero" hardware, a subnormal will effectively be zero when interacted with. So if Rust produces a subnormal from const evaluation, then at runtime the value would be read as zero. The problem is then if that number interacts with an operation that behaves differently depending on whether it sees a non-zero subnormal or zero. Even then, if it is an arithmetic op and the result would also be a subnormal, then it is of no consequence to continue const-evaluating float operations.

So it seems there is an option of implementing a similar set of rules around subnormal as you would propose around NaN.

There is also the option of choosing to const eval floats only with subnormal number support, on the thesis that reading the resulting number as zero at runtime is of minimal consequence. This produces a slight compile-time vs. runtime deviation, unfortunately, but it is also pretty simple. It would also imply never advancing hosts which cannot support it (if there are any) to tier 1, unless soft floats become involved.

thomcc commented 3 years ago

My understanding is that on "subnormals are zero, flush to zero" hardware, the subnormal will effectively zero when interacted with. So if Rust produces a subnormal from const compilation, then at runtime the value would be read as zero

This is true (that's what the subnormals-are-zero flag does), but unfortunately there are a lot of computations which end up having very different (normal) results depending on whether an intermediate step flushes subnormals to zero.

In general, I'd be devastated to give up subnormals (or more broadly, IEEE-754 compatibility, but subnormals are very important for floats actually behaving well in practice) because of some mistakes ARM made on one of their older architectures.

The options I see here are:

  1. assuming the option I suggested above for NaN is viable, it might be fine just to stop const eval when a subnormal is produced.
  2. emulate arm32 behavior in miri when compiling to arm32.
  3. accept that the result will be different, and compute it using proper IEEE754 semantics. (This is also arguably the closest to the spirit of IEEE754 which is that operations compute things at infinite precision and you get the rounded result).
  4. enable correct handling of subnormals on ARM32 at startup.

I suspect 4 is unrealistic because of bad performance, and people will just turn it back on (when I worked in games I did way worse things than this to fix performance bugs). Also IDK if we can do it in some use cases (shared libraries), etc.

Number 2 keeps consistency between const eval and runtime code... but only if someone else doesn't mess with the floating point flags. In practice, I don't know how common that is but I know if I had to write numerically sensitive code on Arm32 I'd probably at least try turning them on to see how bad the slowdown is (and whether it only happens on computations involving subnormal numbers).

For 1 vs 3, it's tricky. I've never worked in a situation where powerful const eval tools were available for numerically sensitive code. My closest comparison is stuff that ran at build time as part of an asset pipeline. For example, when I worked in games, I maintained a program that ran in the asset pipeline and took a 3d model as input, and spat out the best possible convex hull it could find that had no more than (say) 16 vertices. I think in practice this might be a bit too slow for a const fn to do, but it would be cool, so I'm going to use it as a hypothetical example of numerically intensive code in a const fn.

I don't know for a fact, but I strongly suspect that it had computations that went through subnormals, as it normalized all vertices to be in the -1 .. 1 range in order to operate where there's more precision (50% of all floats are in that range). (This is also valid and lossless, because the output of the convex hull function was just the triangles that make up the hull.)

In a case like that, I'd prefer number 3. Number 1 is tricky because IDK what I'd have been able to do if I had an input that happened to hit that in an intermediate computation. If it's in a const fn, it might be very hard to move to a build script, since it might be using a bunch of other code in internal modules. It's not like NaN where as soon as NaN shows up, you're pretty much hosed anyway — the computation may just briefly dip into the subnormals.

That said, number 2 would have been the worst, because I'd have silent data corruption that I have no way to defend against beyond e.g. printing the result out. I'd get to wait until runtime to learn that my hulls are weird because (a - b) * something became 0 when a and b were too close (this can never happen for a != b when subnormals are enabled).
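That failure mode can be emulated in software (a sketch; `ftz` is a hypothetical helper mimicking the hardware flag, not a real API):

```rust
// `ftz` emulates the flush-to-zero flag: any subnormal result is
// replaced by zero, as FTZ hardware would do.
fn ftz(x: f32) -> f32 {
    if x != 0.0 && x.abs() < f32::MIN_POSITIVE { 0.0 } else { x }
}

fn main() {
    // Two close normal numbers whose difference is subnormal:
    let a = 2.0e-38_f32;
    let b = 1.5e-38_f32;
    // With gradual underflow, a - b is never 0 for a != b...
    assert!(a != b && a - b != 0.0);
    // ...but under flush-to-zero the difference silently becomes 0,
    // so `(a - b) * something` is 0 as well.
    assert_eq!(ftz(a - b), 0.0);
}
```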

None of these are great, sadly.

(Damnit ARM, why didn't you just support the standard...)

Lokathor commented 3 years ago

Note about option 2: changing the FP state causes LLVM UB with the standard IR operations. You have to use special IR ops that account for non-standard floating point state if that's what you want. And I'm pretty sure that no part of rust does that (yet?).

RalfJung commented 3 years ago

All of this has a number of benefits.

The major drawback, as mentioned above, is that floating-point operations become "unconst". So far my plan for "unconst" operations was to basically make them unsafe in const context and to say that violating these conditions is CTFE UB; that would certainly not be tenable here, so we'd need to figure out a better unconst story.


The subnormal flushing story on ARM is sad, but then it is not really worse than the mess that is x86_32, is it? So one approach here would be to say that yes these targets are supported for Rust, but their FP support is sub-par and you can unpredictably get non-IEEE754-conforming results.

Number 2 keeps consistency between const eval and runtime code... but only if someone else doesn't mess with the floating point flags.

We already assume for correct operation that the FP environment is left at its default, so e.g. changing the rounding mode is effectively UB (or at least, there is no telling which FP operations are running with which rounding mode, as LLVM will happily move them around even if that means crossing a mode change). This sounds similar. (EDIT: Ah that's what Lokathor already wrote.)

thomcc commented 3 years ago

changing the rounding mode is effectively UB (or at least, there is no telling which FP operations are running with which rounding mode, as LLVM will happily move them around even if that means crossing a mode change).

I'm actually more familiar than I'd like with LLVM moving operations around a mode change. (The only time I've ever used LLVM as a programmer was working on an LLVM JIT to accelerate interval arithmetic expressions — the naive implementation of these changes the rounding mode once per operation. It went poorly.)

Anyway, that might be "UB"... but currently nothing that bad will happen even in the wildest of cases beyond LLVM not respecting your rounding, and I suspect a compiler_fence would be enough to force the issue. Most cases of UB I'm aware of in Rust are serious issues that are a major cause for concern. For this, it seems very likely that people are going to expose Rust code to mobile in contexts where the default float env has been changed to something less pathological.

For example: Wouldn't a WASM interpreter have to change the rounding mode? An interpreter for a language that — Hell, we could just be a shared library called from code where the rounding mode has been changed.

These being UB seems like a bad outcome, since it either means that code has a serious, serious bug, or we're stretching the definition of UB very broadly. Either way — I strongly feel that even if it is UB, in practice it shouldn't behave exceptionally poorly, otherwise we've introduced a very surprising way for rust programs to have security issues that other languages don't.


All that said... emulating ARM32's default float env when compiling to that target wouldn't bother me — maybe we can even add a lint eventually to warn you if the computation goes through denormals (that said, I don't know how plausible adding a lint from inside miri is).

The subnormal flushing story on ARM is sad, but then it is not really worse than the mess that is x86_32, is it?

Ehh... x86_32 gives you your result at a higher precision than you asked for, which in some sense is great, although it comes with a lot of side effects that are not so great, and in general the x87 stack is a really idiosyncratic beast. That said, for the most part binary80 itself is a pretty natural extension to IEEE754.

So one approach here would be to say that yes these targets are supported for Rust, but their FP support is sub-par and you can unpredictably get non-IEEE754-conforming results.

That seems like a fine thing to say to me. I mean, it's true now after all. That doesn't answer what to do for const fn though.

RalfJung commented 3 years ago

Anyway, that might be "UB"... but currently nothing that bad will happen even in the wildest of cases beyond LLVM not respecting your rounding, and I suspect a compiler_fence would be enough to force the issue.

Being a formal methods person I have to note that this is not fully correct. I could totally write some code that does some FP operations, casts the result to raw bits, inspects them for matching exactly what IEEE754 says, and derefs a NULL pointer if they do not match. This is a correct UB-free program, until changing the FP mode (or running on x86_32 or arm32) messes it up.

No sane person would write such code, of course. But to my knowledge there is unfortunately no principled way in which "misbehavior due to unexpected rounding" is bounded; it can cause arbitrary changes in program behavior.
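A minimal sketch of such a conforming-but-fragile program (the particular constant and the null dereference are illustrative, not a real-world pattern):

```rust
// Sketch of the kind of UB-free-but-fragile program described above:
// do an FP operation, inspect the exact bit pattern, and dereference a
// null pointer if the result deviates from IEEE 754 round-to-nearest.
fn fragile() {
    let x = 0.1f64 + 0.2f64;
    // Under default IEEE 754 binary64 semantics, 0.1 + 0.2 rounds to
    // exactly 0.30000000000000004, so this branch is unreachable and the
    // program is UB-free. A changed rounding mode (or x87-style double
    // rounding) could make it reachable, and thus make the program UB.
    if x.to_bits() != 0.30000000000000004f64.to_bits() {
        unsafe { core::ptr::null_mut::<u8>().write(0) };
    }
}

fn main() {
    fragile(); // returns normally on a conforming platform
}
```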

These being UB seems like a bad outcome, since it either means that code has a serious, serious bug, or we're stretching the definition of UB very broadly. Either way — I strongly feel that even if it is UB, in practice it shouldn't behave exceptionally poorly, otherwise we've introduced a very surprising way for rust programs to have security issues that other languages don't.

I am happy for any proposal that makes them not UB. :) But I think that's very non-trivial. For example, only very recently has there been the first formal work trying to define precisely what the semantics of fast-math are.

Ehh... x86_32 gives you your result at a higher precision than you asked for, which in some sense is great, although it comes with a lot of side effects that are not so great, and in general the x87 stack is a really idiosyncratic beast. That said, for the most part binary80 itself is a pretty natural extension to IEEE754.

I don't think there is any guarantee that overall precision is higher, is there? Precision of some individual operations might be higher, but that does not imply overall precision is higher. These things are not monotone. (Also see some of the discussion in https://github.com/rust-lang/rfcs/pull/2686, specifically this.)

And even if it is true, "higher precision" does not imply "less UB". This can still make conforming programs cause UB, even if only contrived programs like the one I mentioned above.

That doesn't answer what to do for const fn though.

If the spec says "unpredictably get non-IEEE754-conforming results", i.e. some operations are conforming and some are not, then const fn can just use IEEE754. Basically what this would boil down to is that on these platforms, the affected operations are non-deterministic. This is not sound of course if the optimizer at the same time assumes them to be deterministic. But well, I'd be willing to note that down as a platform bug and move on... that's what everyone else seems to do and it seems to work for them^^ At least x86_32 is "almost dead" as a platform. arm32 less so I guess.

workingjubilee commented 3 years ago

There is also an option of choosing to only const eval floats using subnormal number support, on the thesis that reading the resulting number at zero at runtime is of minimal consequence. This produces a slight compilation vs. runtime deviation, unfortunately, but also is actually pretty simple. —@workingjubilee

accept that the result will be different, and compute it using proper IEEE754 semantics. (This is also arguably the closest to the spirit of IEEE754 which is that operations compute things at infinite precision and you get the rounded result). [..] In a case like that, I'd prefer number 3. —@thomcc

If the spec says "unpredictably get non-IEEE754-conforming results", i.e. some operations are conforming and some or not, then const fn can just use IEEE754. Basically what this would boil down to is that on these platforms, the affected operations are non-deterministic. This is not sound of course if the optimizer at the same time assumes them to be deterministic. But well, I'd be willing to note that down as a platform bug and move on... that's what everyone else seems to do and it seems to work for them^^ —@RalfJung

So it seems like there is at least some perceived wisdom in something like this (including myself, granted), regarding subnormal handling.

If I were making all the decisions re: const fn and floats, I would say we should

  1. specify CTFE as using IEEE754 floats with default rounding mode
  2. only const-eval floats in explicit const contexts, so no implicit const folding of floats
  3. specify that we accept weirdness in older platforms' float behavior
  4. introduce lints and diagnostics as needed to steer people correctly around this

This means that if the platform has what I might call a "cursed" implementation of floating point (contra "buggy", because really, it's not a bug, it's a feature), then Rust currently makes no effort to change that, but avoids introducing unintended deviation that isn't specified by the programmer... or the LLVM optimizer, which is a barrier but which we can, at least to some extent, say "that's LLVM's problem, we're trying our best!"

I believe this is the least work for the most correctness and the least limitations on future choices in Rust. Also, while setting FZ during runtime might invite programmer meddling, we can at least confidently set floating point environment flags to enable subnormal handling during compilation without worrying if they might be changed at runtime, so that barring other bugs we can even try to compile correctly on armv7 hosts.

RalfJung commented 3 years ago

only const-eval floats in explicit const contexts, so no implicit const folding of floats

So you are saying even on "well-behaved" platforms we shouldn't do any constant propagation even if we fully know what result the platform would produce? Or are you suggesting this only for "ill-behaved"/"cursed" platforms or operations where IEEE754 leaves wiggle-room?

workingjubilee commented 3 years ago

I am not entirely sure, honestly. I would be inclined to say that we would want implicit folding of floating point operations to be as invisible as possible to the programmer, and that given that many arches explicitly support executing code compiled for different targets, we should reasonably question our assumptions about what the target "really is".

RalfJung commented 3 years ago

Optimizations, by definition, must not change program behavior. That makes them "invisible to the programmer", in a sense.

But when there is non-determinism in the spec, optimizations can still lead to different non-deterministic choices being made, which is in some sense "visible".

workingjubilee commented 3 years ago

What I would personally prefer to do would be to disable implicit const folding of floats by default on all platforms and then gradually re-enable it for all verified platforms. This would give us a moment of certainty that we are behaving Correctly, and then from that point forward we can at least hope that if we make an optimization, the test suite says it's OK, and no one screams? We've probably not busted anyone.

RalfJung commented 3 years ago

That assumes a very good test suite (which first needs to be written), but sure. :)

RalfJung commented 3 years ago

One interesting point came up in the LLVM discussion -- a reason that NaN payloads are non-deterministic in practice: some platforms choose the payload of the left/right operand for things like multiplication, which means when the compiler commutes the operands, the NaN payload changes. IOW, on those platforms, FP addition and multiplication are not commutative, but LLVM pretends they are -- and the only way I can think of to explain this (without making FP addition/multiplication, or observing the NaN bits, unsafe) is non-determinism.
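The payload issue can be observed from safe Rust via `to_bits()`; a sketch (which operand's payload survives the multiplication is exactly the platform-dependent part):

```rust
// Build quiet NaNs with distinct payloads and multiply them in both
// orders. Both results are NaN, but whether their bit patterns agree is
// platform-dependent, which is why LLVM commuting the operands forces
// the payload to be treated as non-deterministic.
fn nan_with_payload(payload: u64) -> f64 {
    // Quiet NaN: exponent all ones, quiet bit set, plus a payload.
    f64::from_bits(0x7FF8_0000_0000_0000 | (payload & 0x0007_FFFF_FFFF_FFFF))
}

fn main() {
    let a = nan_with_payload(1);
    let b = nan_with_payload(2);
    let ab = (a * b).to_bits();
    let ba = (b * a).to_bits();
    assert!(f64::from_bits(ab).is_nan() && f64::from_bits(ba).is_nan());
    // Deliberately no assertion on ab == ba: that is the
    // platform-dependent part.
    println!("a*b bits: {ab:#018x}, b*a bits: {ba:#018x}");
}
```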

tavianator commented 3 years ago

IEEE 754-2008 and -2019 have their last section (§11) dedicated to "reproducible results" (i.e. deterministic) which along with the implementation requirements impose these requirements on the programmer:

...

  • Do not use fusedMultiplyAdd(0, ∞, c) or fusedMultiplyAdd(∞, 0, c) where c is a quiet NaN.
  • Do not use signaling NaNs. ...
  • Do not depend on quiet NaN propagation, payloads, or sign bits.
  • Do not depend on the underflow and inexact exceptions and flags. ...

So I think at least these things have to be specified as nondeterministic. @ecstatic-morse's point that transcendental functions may not be rounded deterministically (due to the table-maker's dilemma) is also valid. While IEEE 754 recommends that the transcendental functions it lists be correctly rounded (and thus deterministic between platforms) within their domains, many (most?) platforms don't make that guarantee.

I think it should be feasible for CTFE to perform deterministic floating-point computations, even involving NaNs, by canonicalizing. The results may not match ones computed at runtime, but that's okay. IEEE 754 is not really so picky as to require bit-exact results in most cases anyway (see §10.4). The "literal meaning" of a floating-point expression is always an allowed method of evaluation.
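A sketch of what NaN canonicalization could look like (illustrative only; this is not the actual CTFE implementation):

```rust
// Map every NaN to one canonical quiet NaN so that results are
// deterministic regardless of which NaN the operation "really" produced.
fn canonicalize(x: f32) -> f32 {
    if x.is_nan() {
        // Canonical quiet NaN: positive sign, quiet bit set, zero payload.
        f32::from_bits(0x7FC0_0000)
    } else {
        x
    }
}

fn main() {
    let weird_nan = f32::from_bits(0xFFC0_1234); // negative NaN with a payload
    assert!(weird_nan.is_nan());
    // After canonicalization the bit pattern is fully determined...
    assert_eq!(canonicalize(weird_nan).to_bits(), 0x7FC0_0000);
    // ...and non-NaN values pass through unchanged.
    assert_eq!(canonicalize(1.5).to_bits(), 1.5f32.to_bits());
}
```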

If CTFE is to support floating point calculations, I don't think it should error out on NaNs (at least, not by default). There are useful algorithms that produce NaNs in intermediate calculations and handle them correctly (example).

Side note: I was going to mention that all this complexity is why C++ doesn't support floats as template parameters, but that seems not to be true any more as of C++20: https://wg21.link/p1714r1 https://wg21.link/p1907r1.html. They at least don't have constexpr transcendental functions though. I couldn't find much discussion of the impact of NaN payloads on those proposals.

workingjubilee commented 3 years ago

@RalfJung Yeah, we absolutely need a totally ludicrous amount of tests for floating point assumptions. I am going to be trying to develop such as part of developing portable SIMD.

@tavianator Hmm. Once we agree that we're OK with const fn explicitly const-compiling to a form that may not perfectly match runtime, but does match IEEE754, and has minimal unexpected deviation, we are essentially agreeing that we're theoretically OK with const evaluation working that way to handle NaN issues, and it's mostly hashing out the details. It does seem potentially soluble, and that NaN canonicalization is in fact likely the way to go here.

However, I think we're still best off with erroring/linting against const ops on NaNs as our "step 0", and gradually unwinding such a restriction as we go along, on the same thesis that we want to get some experience with having a reliable behavior in a well-known space first, and then move it forward into other spaces.

And yeah, I doubt transcendentals will ever be const for us. It feels like the prerequisites for doing so in a way that makes sense across multiple platforms are really "unlock a new revolution in mathematics and computer science right here in rust-lang". Which, you know, I like that we're ambitious as a language, but that might be expecting a bit much of ourselves.

RalfJung commented 3 years ago

I think it should be feasible for CTFE to perform deterministic floating-point computations, even involving NaNs, by canonicalizing. The results may not match ones computed at runtime, but that's okay.

It's not okay for some of the uses of const fn in unsafe code that Centril envisioned -- which would rely on the fact that const fn are deterministic at runtime. This is true as of right now, so breaking it should only be done deliberately.
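A hypothetical example of the pattern being referred to (names and logic invented for illustration):

```rust
// Unsafe code relying on a const fn being deterministic: the same index
// is computed once at compile time and once at runtime, and the bounds
// check is skipped on the strength of the two results agreeing.
const fn bucket(x: u32) -> usize {
    (x % 4) as usize
}

// Compile-time evaluation; known to be < 4, i.e. in bounds for `TABLE`.
const IDX: usize = bucket(10);

static TABLE: [u8; 4] = [10, 20, 30, 40];

fn main() {
    // Runtime evaluation of the same const fn with the same argument.
    let idx = bucket(10);
    // Sound only because `bucket` behaves identically at compile time and
    // at runtime. If const fn could use floats whose CTFE results differ
    // from runtime results, this style of reasoning would break.
    let elem = unsafe { *TABLE.get_unchecked(idx) };
    assert_eq!(elem, TABLE[IDX]);
}
```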

oli-obk commented 3 years ago

Even if we forbid floats forever in const fn, I think that const fn foo<T: Add>(a: T, b: T) -> T::Output { a + b } will become legal at some point, and then you can invoke foo(5f32, non_canonical_float) at runtime, even if floats are never allowed in const fn. So I don't think we'll be able to guarantee runtime determinism of const fn anyway.

RalfJung commented 3 years ago

@oli-obk to exploit const-determinism for generic functions, of course you have to ensure that all the impls involved are themselves const.

workingjubilee commented 3 years ago

Can we vary a given Trait impl for two types based on constness? I think I saw an RFC for that, but I do not believe it had achieved acceptance.

RalfJung commented 3 years ago

I don't understand what you mean by "vary a given Trait impl for two types based on constness", but you might be referring to https://github.com/rust-lang/rfcs/pull/2632.

Lokathor commented 3 years ago

I think they mean "have a trait impl that's const (and used in const contexts) and also the same impl as non-const (and used in non-const contexts)."

RalfJung commented 3 years ago

Ah okay. Yes that's the plan, similar to how const fn can be called both from const- and non-const contexts.

oli-obk commented 3 years ago

Yes, my worry was that even without impl const Add for f32, we'll be able to call this generic const fn at runtime with float arguments, and then (even if the same call isn't possible at compile time) we can have multiple runtime calls that differ, which I understood to also be an issue from the discussion above.

workingjubilee commented 3 years ago

Are there any problems with const fn foo<T: Add>(a: T, b: T) -> T::Output { a + b } implying T: const Add that would prevent making that a blocker? Ergonomics, I suppose?

RalfJung commented 3 years ago

The proposed behavior of const fn foo<T: Add>(a: T, b: T) in https://github.com/rust-lang/rfcs/pull/2632 is to say that this only requires T: const Add when called in const context. This will be what most people want most of the time: enabling more const callers without restricting runtime callers.

dlight commented 3 years ago

@RalfJung

We do not need such a heavy hammer for floating-point operations because their non-deterministic choice is much more local, confined to each individual operation.

But it in principle we could run abstract operations on floats, right? How expensive really is that?

oli-obk commented 3 years ago

The expense in CPU and memory is likely irrelevant. The heaviness of that approach is in the complexity of such an implementation. Everything in const eval needs to support abstract floats in some manner from that point on. While a lot of infrastructure could be shared with abstract pointers/relocations I still think that the complexity would be too high for such a "niche" (from my perspective) feature.

dlight commented 3 years ago

Maybe it would be feasible if there were a high quality library for abstract floats?

oli-obk commented 3 years ago

it's not the computations themselves that are the complexity problem, it is the fact that every operation that doesn't even care about floats can now encounter a float. Without abstract floats, it would just get a bit pattern and process that, no matter what created the bit pattern. Yes, we can just bail out everywhere, but we have had to do a lot of work to get abstract pointers to the point they are now. Introducing another abstract-thing-layer is not isolated to just the operations on that layer.
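A toy sketch of the problem (illustrative types, not rustc's actual interpreter values):

```rust
// Every value the interpreter handles can currently be treated as plain
// bits, except for symbolic pointers ("relocations"). Adding symbolic
// NaNs would introduce a second abstract variant that every bit-level
// operation (memcpy, transmute, byte comparisons, ...) must now handle.
enum Value {
    Bits(u128),                             // concrete bytes
    Pointer { alloc_id: u64, offset: u64 }, // symbolic pointer
    SymbolicNan,                            // hypothetical new variant
}

// An operation that "doesn't care about floats" still has to bail out
// (or grow special handling) when it meets an abstract value.
fn to_concrete_bits(v: &Value) -> Option<u128> {
    match v {
        Value::Bits(b) => Some(*b),
        Value::Pointer { .. } | Value::SymbolicNan => None,
    }
}

fn main() {
    assert_eq!(to_concrete_bits(&Value::Bits(42)), Some(42));
    assert_eq!(to_concrete_bits(&Value::SymbolicNan), None);
    let p = Value::Pointer { alloc_id: 0, offset: 8 };
    assert_eq!(to_concrete_bits(&p), None);
}
```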

workingjubilee commented 3 years ago

IEEE754-2019 has some recommendations for reproducible floating point results and one of them is that you work with an "arithmetic" float that is also an "interchange" float. That means working with binary{16,32,64,128}, which means no abstract floats. Given that we already have a tough row to hoe with const flops, I think we should take IEEE754 seriously about that, and I don't think that would get better if we used a different format for our reals, either.

RalfJung commented 3 years ago

IEEE754-2019 has some recommendations for reproducible floating point results and one of them is that you work with an "arithmetic" float that is also an "interchange" float. That means working with binary{16,32,64,128}, which means no abstract floats.

I have no idea what this means.

But I agree with @oli-obk, adding support for what I'd call "symbolic floats" to the CTFE engine would indeed be very invasive. It is possible in principle, but many things are possible in principle. ;) It would be the pain we are having with symbolic pointers right now all over again. We do already have a high-quality (or at least sufficient quality) softfloat library in rustc, that is not the problem; the problem is threading through and supporting those symbolic values everywhere.

Moreover, it would still not achieve feature parity: converting a float to its underlying bytes would not be possible any more (at least not if it is a NaN).

tema3210 commented 3 years ago

@RalfJung

We do not need such a heavy hammer for floating-point operations because their non-deterministic choice is much more local, confined to each individual operation.

But it in principle we could run abstract operations on floats, right? How expensive really is that?

This might block const-ness of to_bits() and co. conversions: what if one does pure computation using the bits of a float? How do we allow it? Guarantees on float repr are not such a good decision: there are platforms which use unusual float representations, so it might break user logic. Maybe: #[cfg(??)]?

workingjubilee commented 3 years ago

Pardon? I have evaluated many platforms and have a good understanding of the nuances of float repr. All of them are well-defined by the IEEE754 floating point standard nowadays, because a lot of work was put into making that the standard, and those that deviate have well-known deviations that still place the representation somewhere within the standard's expectations. Some have decimal floats, but that does not say much because that is still a well-defined float repr that the standard accounts for and conversions between binary and decimal floats are well-defined.

workingjubilee commented 3 years ago

I have done some more thought about the very nature of the question presented regarding symbolic math, and the only thing that would make sense for abstract computations on floats would be to perform a series of intermediate calculations on floats of an infinite length, which is an explicitly permitted action in the standard, and then to round to a defined definite length. However, Rust has deliberately chosen another path of preferring to calculate in and round to explicitly user-defined float lengths, and a motion for more implicit conversions did not succeed. There is no real need for extending the language here, as a result, merely for someone to write a library. I would be surprised if one does not already exist, in fact, given arbitrary precision float libraries are not unknown.

workingjubilee commented 3 years ago

IEEE754-2019 has some recommendations for reproducible floating point results and one of them is that you work with an "arithmetic" float that is also an "interchange" float. That means working with binary{16,32,64,128}, which means no abstract floats.

I have no idea what this means.

Apologies for not explaining this more fully:

The standard recommends that for reproducible results we should use floating point numbers that satisfy both of these criteria, and that means we should use binary32 and binary64, AKA f32 and f64, which satisfy both definitions. A symbolic float would not. I believe the reproducibility criteria of IEEE754 do not represent "exactly what we should do for const float compilation", because these are largely just recommendations and we have other complications and needs not foreseen in the standard, but they do represent what I would call a moderate-strength advisory.