rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.08k stars 12.54k forks source link

Should std::{f32,f64}::NAN be a QNAN or SNAN ? #52897

Open gnzlbg opened 6 years ago

gnzlbg commented 6 years ago

Right now, whether they are a QNAN or an SNAN depends on the architecture. Currently, they are an SNAN on MIPS (EDIT: r5 and older) at least, and a QNAN on most others.

It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.

cc @rkruppe @draganmladjenovic

est31 commented 6 years ago

Note that "on MIPS at least" is imprecise: which bit pattern is considered quiet and which pattern is considered signaling depends on the used MIPS version and configuration. See #46246 for more info.

gnzlbg commented 6 years ago

@est31 indeed. #52666 suggests leaving the current mips targets for the "older" mips versions (r5 and older), and to add a new targets (e.g. mipsisa64r6-unknown-linux-gnuabi64) for the newer versions (r6 and newer).

MIPS in the issue refers to r5 and older since currently we don't have any targets for r6. I've added an edit to clarify that.

hanna-kruppe commented 6 years ago

It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.

Why? Considering how LLVM behaves (and the differences in how hardware handle payloads, too!), I can't think of any (edit: portable) use case that can reliably use any particular NAN constant for a bitwise equality check. It seems to me you'd always want to test with is_nan().

varkor commented 5 years ago

The guarantees provided by std::{f32,f64}::NAN should be documented. At the moment, there is no information, but it might be helpful to be explicit about the fact that nothing is guaranteed other than being a NaN.

silwol commented 4 years ago

In Debian, we hit what I believe to be the issue at hand. I bisected the rustup versions of compilers, and it seems the behavior changed from rustc 1.36 to 1.37. Was this intentional, or maybe triggered by an update of a library used by rustc, e.g. llvm?

The issue surfaced in https://github.com/rust-num/num-traits/issues/151 where the tests worked before and don't work any longer now in mipsel and mips64el.

nikic commented 4 years ago

@silwol I'd expect that minnum/maxnum use libcall legalization on mips, so you're likely seeing libm behavior here. Can you check whether that's the case?

Per the C standard, I believe fmin/fmax should be treating SNaN the same as QNaN (contrary to IEEE754).

silwol commented 4 years ago

@nikic I hope I checked this the right way. If not, please give me a hint how / where to try next. The important information here is that it's not the num-traits code that fails, but instead there is a doctest inside the project which falls back to std's min and max implementation if their std feature is enabled, and that is what fails.

Here is what I tried:

$ cat example.c 
#include <math.h>
#include <stdio.h>

int main() {
    double a = 1;
    double b = NAN;
    printf("fmin(%f, %f) = %f\n", a, b, fmin(a, b));
    return 0;
}
$ clang -Wall -Werror -std=c99 -lm -fno-builtin -o example example.c 
$ ./example
fmin(1.000000, nan) = 1.000000
$

This behaves exactly the same on Debian mipsel and amd64.

In contrast, the following behaved the same on these architectures with rustc up to v1.36.0 (installed through rustup.rs), and fails on mipsel since rustc v1.37.0.

General information on my mipsel VM:

$ uname -a
Linux debian 5.4.0-3-4kc-malta #1 SMP Debian 5.4.13-1 (2020-01-19) mips GNU/Linux
$ cat src/lib.rs 
#[cfg(test)]
mod tests {
    #[test]
    fn min() {
        let nan = 0f64 / 0f64;
        assert_eq!(1f64.min(nan), 1f64);
    }

    #[test]
    fn max() {
        let nan = 0f64 / 0f64;
        assert_eq!(1f64.max(nan), 1f64);
    }
}

Trying with latest stable rustc installed with rustup, behaves the same with the rustc 1.40.0 installed from the Debian repository:

$ rustc --version
rustc 1.41.0 (5e1a79984 2020-01-27)
$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.57s
     Running target/debug/deps/floatextremestest-23756e6931b17aa8

running 2 tests
test tests::max ... FAILED
test tests::min ... FAILED

failures:

---- tests::max stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `NaN`,
 right: `1.0`', src/lib.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

---- tests::min stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `NaN`,
 right: `1.0`', src/lib.rs:6:9

failures:
    tests::max
    tests::min

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

Trying with latest known-to-work stable rustc:

$ rustc --version
rustc 1.36.0 (a53f9df32 2019-07-03)
$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 9.38s
     Running target/debug/deps/floatextremestest-3c9ba83b13a3b688

running 2 tests
test tests::max ... ok
test tests::min ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests floatextremestest

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

I'm not very familiar with compiler internals, so please let me know if that might be a different problem from this one, then I'll move it to a new issue. For the record, this is tracked in Debian at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950583

est31 commented 4 years ago

@silwol which mips version does your cpu have? E.g. what does running cat /proc/cpuinfo | grep "model name" show?

silwol commented 4 years ago

For checking if the error is reproducible, I started a qemu mipsel VM with qemu-system-mipsel -M malta …. /proc/cpuinfo doesn't contain a "model name" entry here, but instead has this line:

cpu model       : MIPS 24Kc V0.0  FPU V0.0

Regarding the Debian infrastructure, I don't have direct access to these machines. The failing test logs can be found on https://buildd.debian.org/status/package.php?p=rust-num-traits from where you can follow the link to the machine information in the Buildd column. There you have a "Build machine info" entry somewhere in the head, showing a list of links to the details of each machine. The affected num-traits crate has been built multiple times on different machines. The details are on https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mips64el and https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mipsel (corresponding the "all" link in the "Logs" column).

silwol commented 4 years ago

I just looked up the information about the physical machines that built the project. They're all listed with Processor: Rhino Labs UTM8. There are machines available with Processor: LS3A-RS780-1w (Quad Core Loongson 3A) and Processor: Lemote 3A ITX-A1101 (Quad Core Loongson 3A), but I don't know if there is any way to assign builds explicitly to a specific machine type.

est31 commented 4 years ago

The f32::max docs say:

If one of the arguments is NaN, then the other argument is returned.

Which is clearly not the case here.

f32::max itself is only a wrapper of the maxnumf32 intrinsic, which maps to llvm.maxnum.f32.

The docs of that LLVM intrinsic say:

Follows the IEEE-754 semantics for maxNum except for the handling of signaling NaNs. This matches the behavior of libm’s fmax.

If either operand is a NaN, returns the other non-NaN operand.

So this promise seems to be violated as well. However, this does not imply a bug in llvm just yet, as we provide some intrinsics to llvm. It might be the intrinsics that are wrong. The Rust based libm implementation has this fmaxf code. So maybe the bug lurks there? The code looks alright though.

silwol commented 4 years ago

Thanks @est31 for the hints. I found f64::from_bits docs mentioning a difference between MIPS and the other platforms in the signaling bit. Also the fmaxf code you referenced has a comment telling that sNaN is not handled due to not being supported yet. Interestingly enough, the implementation that the num_traits crate uses for no-std environments succeeds in the test. The question remains what changed from rustc 1.36 to 1.37 to reveal these circumstances.

est31 commented 4 years ago

@silwol yeah there is a fundamental difference between older MIPS targets and newer MIPS targets in that the signaling bit is opposite to IEEE 754-2008 vs follows the standard. But even older MIPS targets follow IEEE 754-1985 which clearly specifies the bit patterns of NaNs (it only doesn't specify the bit pattern distinction between qNaN and sNaN).

@silwol if you have access to a MIPS machine you could try finding the nightly that caused the regression using cargo-bisect-rustc.

Either way, I think you should file a proper issue for your bug report.

vinc17fr commented 3 years ago

IEEE 754-2008 is obsolete. You should follow IEEE 754-2019, where the poorly designed minNum and maxNum operations (as being non-associative) have been replaced by other operations, which handle qNaN and sNaN in the same way (except for the signaled invalid operation exception in case of sNaN).

est31 commented 3 years ago

@vinc17fr there is a discussion about which IEEE 754 version to use in #83984 .