Open gnzlbg opened 6 years ago
Note that "on MIPS at least" is imprecise: which bit pattern is considered quiet and which pattern is considered signaling depends on the used MIPS version and configuration. See #46246 for more info.
@est31 indeed. #52666 suggests leaving the current mips targets for the "older" mips versions (r5 and older), and to add a new targets (e.g. mipsisa64r6-unknown-linux-gnuabi64
) for the newer versions (r6 and newer).
MIPS in the issue refers to r5
and older since currently we don't have any targets for r6
. I've added an edit to clarify that.
It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.
Why? Considering how LLVM behaves (and the differences in how hardware handle payloads, too!), I can't think of any (edit: portable) use case that can reliably use any particular NAN
constant for a bitwise equality check. It seems to me you'd always want to test with is_nan()
.
The guarantees provided by std::{f32,f64}::NAN
should be documented. At the moment, there is no information, but it might be helpful to be explicit about the fact that nothing is guaranteed other than being a NaN.
In Debian, we hit what I believe to be the issue at hand. I bisected the rustup versions of compilers, and it seems the behavior changed from rustc 1.36 to 1.37. Was this intentional, or maybe triggered by an update of a library used by rustc, e.g. llvm?
The issue surfaced in https://github.com/rust-num/num-traits/issues/151 where the tests worked before and don't work any longer now in mipsel and mips64el.
@silwol I'd expect that minnum/maxnum use libcall legalization on mips, so you're likely seeing libm behavior here. Can you check whether that's the case?
Per the C standard, I believe fmin/fmax should be treating SNaN the same as QNaN (contrary to IEEE754).
@nikic I hope I checked this the right way. If not, please give me a hint how / where to try next. The important information here is that it's not the num-traits
code that fails, but instead there is a doctest inside the project which falls back to std
's min
and max
implementation if their std
feature is enabled, and that is what fails.
Here is what I tried:
$ cat example.c
#include <math.h>
#include <stdio.h>
int main() {
double a = 1;
double b = NAN;
printf("fmin(%f, %f) = %f\n", a, b, fmin(a, b));
return 0;
}
$ clang -Wall -Werror -std=c99 -lm -fno-builtin -o example example.c
$ ./example
fmin(1.000000, nan) = 1.000000
$
This behaves exactly the same on Debian mipsel
and amd64
.
In contrast, the following behaved the same on these architectures with rustc
up to v1.36.0 (installed through rustup.rs
), and fails on mipsel
since rustc
v1.37.0.
General information on my mipsel VM:
$ uname -a
Linux debian 5.4.0-3-4kc-malta #1 SMP Debian 5.4.13-1 (2020-01-19) mips GNU/Linux
$ cat src/lib.rs
#[cfg(test)]
mod tests {
#[test]
fn min() {
let nan = 0f64 / 0f64;
assert_eq!(1f64.min(nan), 1f64);
}
#[test]
fn max() {
let nan = 0f64 / 0f64;
assert_eq!(1f64.max(nan), 1f64);
}
}
Trying with latest stable rustc
installed with rustup
, behaves the same with the rustc
1.40.0 installed from the Debian repository:
$ rustc --version
rustc 1.41.0 (5e1a79984 2020-01-27)
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.57s
Running target/debug/deps/floatextremestest-23756e6931b17aa8
running 2 tests
test tests::max ... FAILED
test tests::min ... FAILED
failures:
---- tests::max stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `NaN`,
right: `1.0`', src/lib.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
---- tests::min stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `NaN`,
right: `1.0`', src/lib.rs:6:9
failures:
tests::max
tests::min
test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out
Trying with latest known-to-work stable rustc:
$ rustc --version
rustc 1.36.0 (a53f9df32 2019-07-03)
$ cargo test
Finished dev [unoptimized + debuginfo] target(s) in 9.38s
Running target/debug/deps/floatextremestest-3c9ba83b13a3b688
running 2 tests
test tests::max ... ok
test tests::min ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests floatextremestest
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
I'm not very familiar with compiler internals, so please let me know if that might be a different problem from this one, then I'll move it to a new issue. For the record, this is tracked in Debian at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950583
@silwol which mips version does your cpu have? E.g. what does running cat /proc/cpuinfo | grep "model name"
show?
For checking if the error is reproducible, I started a qemu mipsel VM with qemu-system-mipsel -M malta …
. /proc/cpuinfo
doesn't contain a "model name"
entry here, but instead has this line:
cpu model : MIPS 24Kc V0.0 FPU V0.0
Regarding the Debian infrastructure, I don't have direct access to these machines. The failing test logs can be found on https://buildd.debian.org/status/package.php?p=rust-num-traits from where you can follow the link to the machine information in the Buildd column. There you have a "Build machine info" entry somewhere in the head, showing a list of links to the details of each machine.
The affected num-traits
crate has been built multiple times on different machines. The details are on https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mips64el and https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mipsel (corresponding the "all" link in the "Logs" column).
I just looked up the information about the physical machines that built the project. They're all listed with Processor: Rhino Labs UTM8. There are machines available with Processor: LS3A-RS780-1w (Quad Core Loongson 3A) and Processor: Lemote 3A ITX-A1101 (Quad Core Loongson 3A), but I don't know if there is any way to assign builds explicitly to a specific machine type.
The f32::max
docs say:
If one of the arguments is NaN, then the other argument is returned.
Which is clearly not the case here.
f32::max itself is only a wrapper of the maxnumf32
intrinsic, which maps to llvm.maxnum.f32
.
The docs of that LLVM intrinsic say:
Follows the IEEE-754 semantics for maxNum except for the handling of signaling NaNs. This matches the behavior of libm’s fmax.
If either operand is a NaN, returns the other non-NaN operand.
So this promise seems to be violated as well. However, this does not imply a bug in llvm just yet, as we provide some intrinsics to llvm. It might be the intrinsics that are wrong. The Rust based libm implementation has this fmaxf code. So maybe the bug lurks there? The code looks alright though.
Thanks @est31 for the hints.
I found f64::from_bits docs mentioning a difference between MIPS and the other platforms in the signaling bit. Also the fmaxf code you referenced has a comment telling that sNaN is not handled due to not being supported yet. Interestingly enough, the implementation that the num_traits
crate uses for no-std environments succeeds in the test.
The question remains what changed from rustc 1.36 to 1.37 to reveal these circumstances.
@silwol yeah there is a fundamental difference between older MIPS targets and newer MIPS targets in that the signaling bit is opposite to IEEE 754-2008 vs follows the standard. But even older MIPS targets follow IEEE 754-1985 which clearly specifies the bit patterns of NaNs (it only doesn't specify the bit pattern distinction between qNaN and sNaN).
@silwol if you have access to a MIPS machine you could try finding the nightly that caused the regression using cargo-bisect-rustc.
Either way, I think you should file a proper issue for your bug report.
IEEE 754-2008 is obsolete. You should follow IEEE 754-2019, where the poorly designed minNum and maxNum operations (as being non-associative) have been replaced by other operations, which handle qNaN and sNaN in the same way (except for the signaled invalid operation exception in case of sNaN).
@vinc17fr there is a discussion about which IEEE 754 version to use in #83984 .
Right now, whether they are a QNAN or an SNAN depends on the architecture. Currently, they are an SNAN on MIPS (EDIT: r5 and older) at least, and a QNAN on most others.
It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.
cc @rkruppe @draganmladjenovic