shibatch / sleef

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
https://sleef.org
Boost Software License 1.0

Incorrect Builds on macOS for Debug #301

Open miakramer opened 4 years ago

miakramer commented 4 years ago

On macOS (I haven't tried other operating systems), a Debug build made with the system clang fails some of the AVX and AVX2 tests. The SSE tests pass, and my CPU doesn't have AVX-512. A RelWithDebInfo build also seems to work correctly. The tests that fail are:

The AVX/2 tests that pass are:

Please let me know if there's any more information I can provide.

Output from setting -DSLEEF_SHOW_CONFIG:

-- Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the system variable OPENSSL_ROOT_DIR (missing: OPENSSL_INCLUDE_DIR)
-- Configuring build for SLEEF-v3.4.0
   Target system: Darwin-19.3.0
   Target processor: x86_64
   Host system: Darwin-19.3.0
   Host processor: x86_64
   Detected C compiler: AppleClang @ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Using option `-Wall -Wno-unused -Wno-attributes -Wno-unused-result -ffp-contract=off -fno-math-errno -fno-trapping-math` to compile libsleef
-- Building shared libs : ON
-- MPFR : /usr/local/lib/libmpfr.dylib
-- MPFR header file in /usr/local/include
-- GMP : /usr/local/lib/libgmp.dylib
-- RT :
-- FFTW3 : /usr/local/lib/libfftw3.dylib
-- OPENSSL :
-- SDE : SDE_COMMAND-NOTFOUND
-- RUNNING_ON_TRAVIS : 0
-- COMPILER_SUPPORTS_OPENMP :
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Mia/git/sleef/build

miakramer commented 4 years ago

Oh, I forgot to mention. I noticed in my ray tracer on a debug build: atan2_8f seems correct for the first four lanes, but always returns 0 in the last four. acos_8f is similar, returning 1.5707964 (presumably pi/2) in the last four.
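
The symptom described above (lower four lanes correct, upper four wrong) can be checked mechanically rather than by eye. The following is a minimal sketch in plain C, not part of SLEEF: `first_bad_lane` is a hypothetical helper, and the two arrays are illustrative values chosen to match the acos symptom (pi/2 in the last four lanes), not output captured from a real run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a SLEEF function): return the index of the first
   lane where got[] and want[] differ by more than tol, or -1 if all n lanes
   agree within tol. */
int first_bad_lane(const float *got, const float *want, int n, float tol) {
  assert(got != NULL && want != NULL && n > 0);
  for (int i = 0; i < n; i++) {
    float diff = got[i] - want[i];
    if (diff < 0.0f) diff = -diff;   /* absolute difference without libm */
    if (!(diff <= tol)) return i;    /* written this way so NaN is also flagged */
  }
  return -1;
}

/* Illustrative values only: expected acos results for eight evenly spaced
   angles, and a result vector showing pi/2 in the upper four lanes. */
const float lane_want[8] = {0.00f, 0.45f, 0.90f, 1.35f, 1.80f, 2.24f, 2.69f, 3.14f};
const float lane_got[8]  = {0.00f, 0.45f, 0.90f, 1.35f, 1.57f, 1.57f, 1.57f, 1.57f};
```

Calling `first_bad_lane(lane_got, lane_want, 8, 0.01f)` on these arrays returns 4, the first lane of the upper 128-bit half of a 256-bit register. That is consistent with a problem in how the upper half is handled, but it does not prove one.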

shibatch commented 4 years ago

The problem with Debug build should be now fixed.

https://github.com/shibatch/sleef/tree/Fix_build_error_with_Debug_mode

As for atan2_8f, please post code that demonstrates the bug.

miakramer commented 4 years ago

Sorry, the fix doesn't seem to be working. I haven't finished all of the tests again yet, but so far at least iutavx2 and iutyavx2 have failed.

For the functions I mentioned, I meant that this is the broken behaviour I observed. I'll see if I can make a small sample.

shibatch commented 4 years ago

Please tell me the model of your mac and the exact commands you gave.

shibatch commented 4 years ago

It should be easy to modify the sample source code at sleef.org.

https://sleef.org/hellox86.c

miakramer commented 4 years ago

It's a 2014 MacBook Pro, the CPU is an Intel i7-4870HQ. CPU features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MDCLEAR IBRS STIBP L1DF SSBD

The commands I used:

git clone https://github.com/shibatch/sleef.git
cd sleef
git checkout origin/Fix_build_error_with_Debug_mode
mkdir build && cd build
cmake -DSLEEF_SHOW_CONFIG=TRUE -DCMAKE_BUILD_TYPE=Debug -DBUILD_DFT=FALSE ..
make -j 8
make test

shibatch commented 4 years ago

It does not reproduce in my environment.

shibatch commented 4 years ago

Please try once again from the beginning, and paste the output of "make test".

miakramer commented 4 years ago

Ok, I will try again.

Using the debug build that fails the test, I get this output:

Should be: 0.00;  0.45;  0.90;  1.35;  1.80;  2.24;  2.69;  3.14
Is:        0.00;  0.45;  0.90;  1.35;  1.57;  1.57;  1.57;  1.57

From this program:

#include <stdio.h>
#include <x86intrin.h>
#include <sleef.h>

/* Build with AVX2 enabled (e.g. -mavx2) and link against libsleef. */
int main(int argc, char **argv) {
  /* Inputs: cos(k*pi/7) for k = 0..7 */
  float in[] = {
    1.0,
    0.9009688679024191,
    0.6234898018587336,
    0.22252093395631445,
   -0.22252093395631434,
   -0.6234898018587335,
   -0.900968867902419,
   -1.0
  };
  /* Expected results: acos(in[k]) = k*pi/7 */
  float out[] = {
    0.0,
    0.4487989505128276,
    0.8975979010256552,
    1.3463968515384828,
    1.7951958020513104,
    2.243994752564138,
    2.6927937030769655,
    3.141592653589793
  };

  __m256 vin, vout;

  /* Unaligned load of all eight inputs into one 256-bit register */
  vin  = _mm256_loadu_ps(in);

  /* Eight-lane single-precision acos, AVX2 variant */
  vout = Sleef_acosf8_u10avx2(vin);

  float res[8];

  _mm256_storeu_ps(res, vout);

  printf("Should be: %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f\n", out[0], out[1], out[2], out[3], out[4], out[5], out[6], out[7]);
  printf("Is:        %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f;  %.2f\n", res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);
  return 0;
}

shibatch commented 4 years ago

I think this is not a problem in sleef, and it does not reproduce in my environment.

miakramer commented 4 years ago

I tried again from scratch, and it seems to be Apple clang-specific. Apple clang Debug fails the same tests as above, but regular LLVM clang Debug passed them all. Apple clang Release does work, though. I'm assuming SLEEF doesn't do anything special for Apple vs. regular clang, so I guess a compiler bug is possible?
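
Since the failure appears to depend on both the compiler vendor and the build type, it can help to have a test binary report which compiler built it and whether optimization was on. The sketch below relies only on predefined macros: `__apple_build_version__` (defined by Apple clang), `__clang__`, `__GNUC__`, and `__OPTIMIZE__` (defined by GCC and clang when optimizing). The function names are hypothetical and not part of SLEEF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: classify the compiler from predefined macros.
   Apple clang is checked first because it also defines __clang__. */
const char *compiler_family(void) {
#if defined(__apple_build_version__)
  return "Apple clang";
#elif defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "other";
#endif
}

/* Returns 1 if this translation unit was compiled with optimization enabled
   (as reported by __OPTIMIZE__), 0 otherwise. */
int built_with_optimization(void) {
#if defined(__OPTIMIZE__)
  return 1;
#else
  return 0;
#endif
}

/* Print one line describing how this file was built. */
void print_build_info(void) {
  const char *family = compiler_family();
  assert(family != NULL);
  printf("compiler: %s, optimized: %d\n", family, built_with_optimization());
}
```

Calling `print_build_info()` at the start of a reproduction program makes it easier to confirm that a failing binary really came from an unoptimized Apple clang build and a passing one from upstream clang.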

shibatch commented 4 years ago

It is hard to identify the cause. I also tried Apple clang.

-- The C compiler identification is AppleClang 9.0.0.9000039
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - works

miakramer commented 4 years ago

I'll take a look in a few days when I have more time. In the meantime, I've posted the LLVM IR generated with -DSLEEF_ENABLE_LLVM_BITCODE in case it's useful. I can also upload the .dylib file somewhere if needed. These are .ll files; I just renamed them to .txt so GitHub would let me upload them.