rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.94k stars 12.68k forks source link

Miscompilation in _mm512_reduce_add_pd, assumes values not NaN #120720

Closed orlp closed 8 months ago

orlp commented 8 months ago

The following example should obviously compile to a panic, as the documentation of _mm512_reduce_add_pd mentions nothing about NaNs not being respected:

#![feature(stdsimd)]
use core::arch::x86_64::*;

#[no_mangle]
unsafe fn test() -> f64 {
    let ret = _mm512_reduce_add_pd(_mm512_set_pd(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, f64::NAN));
    if ret.is_nan() {
        panic!("hi")
    }
    ret
}

However, when compiled with -C opt-level=3 -C target-cpu=cannonlake on the latest nightly there is no panic generated, despite the output being NaN regardless:

.LCPI0_0:
        .quad   0x7ff8000000000000
test:
        vmovsd  xmm0, qword ptr [rip + .LCPI0_0]
        ret
apiraino commented 8 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium +T-libs +requires-nightly

quaternic commented 8 months ago

Currently the intrinsic is implemented as simd_reduce_add_unordered, which is lowered to LLVM with all the fast-math flags set. That is fixed in PR #120718 .

However, for these _mm512_reduce_{add,mul,min,max}_* intrinsics, the behaviour should match how they are defined in https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=float*reduce*ps&expand=4560&ig_expand=5298,5343,5302,5396,5377,5302,5303

All of them follow the same pattern:

  1. (For the masked variants, substitute masked values with an operation-specific identity value.)
  2. Split in half, [lower,upper] = tmp;
  3. Do the operation elementwise as tmp[i] = OP(lower[i], upper[i]);
  4. Repeat from 1. until only one element remains.