kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Various complex<double> unit test fails with XL/16.1 OpenMP build #344

Open ndellingwood opened 5 years ago

ndellingwood commented 5 years ago

Output from Jenkins:

Failed Test 1

06:46:10 [ RUN      ] openmp.gemm_complex_double
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/blas/Test_Blas3_gemm.hpp:147: Failure
06:46:10 Value of: (diff_C_average < 1.05*diff_C_expected )
06:46:10   Actual: false
06:46:10 Expected: true
etc.

Failed Test 2

06:46:10 [ RUN      ] openmp.batched_scalar_team_trsm_l_u_nt_n_dcomplex_dcomplex
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/../test_common/KokkosKernels_TestUtils.hpp:87: Failure
06:46:10 The difference between double(AT1::abs(val1)) and double(AT2::abs(val2)) is 0.99556106126897947, which exceeds double(AT3::abs(tol)), where
06:46:10 double(AT1::abs(val1)) evaluates to 0.99556106126897947,
06:46:10 double(AT2::abs(val2)) evaluates to 0, and
06:46:10 double(AT3::abs(tol)) evaluates to 2.2204460492503131e-13.

Failed Test 3

06:46:10 [ RUN      ] openmp.batched_scalar_team_trsm_l_u_nt_n_dcomplex_double
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/../test_common/KokkosKernels_TestUtils.hpp:87: Failure
06:46:10 The difference between double(AT1::abs(val1)) and double(AT2::abs(val2)) is 0.99580002746667939, which exceeds double(AT3::abs(tol)), where
06:46:10 double(AT1::abs(val1)) evaluates to 0.99580002746667939,
06:46:10 double(AT2::abs(val2)) evaluates to 0, and
06:46:10 double(AT3::abs(tol)) evaluates to 2.2204460492503131e-13.
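
Judging by the log text, the trsm failures come from a check that compares the magnitudes |val1| and |val2| against |tol|. The logged tolerance appears to equal 1000 times DBL_EPSILON. A minimal sketch of that kind of check follows. `near_by_magnitude` and `default_tol` are hypothetical names, not the actual KokkosKernels_TestUtils.hpp code. Because the check compares magnitudes, two complex values with the same modulus but a different phase would pass.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <complex>

// Hypothetical helper mirroring the *shape* of the failing check in the log:
// compare |val1| and |val2| (real or complex) against a tolerance.
// This is an illustration, not the KokkosKernels test utility itself.
template <class T1, class T2>
bool near_by_magnitude(const T1& val1, const T2& val2, double tol) {
  assert(tol >= 0.0);
  const double a = static_cast<double>(std::abs(val1));  // modulus if complex
  const double b = static_cast<double>(std::abs(val2));
  return std::fabs(a - b) <= tol;
}

// Assumption: the logged tolerance 2.2204460492503131e-13 is 1000 * DBL_EPSILON.
inline double default_tol() { return 1000.0 * DBL_EPSILON; }
```

With the logged values, a magnitude of about 0.9956 compared against 0 fails by roughly twelve orders of magnitude. That is far larger than a rounding-level discrepancy.
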

ndellingwood commented 5 years ago

@kyungjoo-kim would you have time to look at the gemm test? @vqd8a would you have time to look at the trsm tests?

kyungjoo-kim commented 5 years ago

Because the XL compiler's compile times are prohibitive, debugging with it is very difficult (almost impossible). Since the other compilers and platforms are okay, the listed failures are probably related to the compiler's super-scalar instruction ordering. I am not sure that spending our time on this is worthwhile. I suggest disabling all complex testing with XL. Since we test with Kokkos::complex, it would be interesting to see whether these failures are reproduced with std::complex. However, I also think that investigating the difference between Kokkos::complex and std::complex is not worthwhile.

ndellingwood commented 5 years ago

@kyungjoo-kim sounds good, compile times are very long with XL, thanks for looking into it.

mhoemmen commented 5 years ago

@kyungjoo-kim Are there any Kokkos::parallel_reduce or atomic updates on Kokkos::complex<double>? If so, it's possible this is a Kokkos bug, due to POWER's different memory model.
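
For readers unfamiliar with the pattern being asked about, here is a portable std::atomic sketch of accumulating a complex<double> sum from several threads. It is not how Kokkos implements atomics on Kokkos::complex. `atomic_add`, `ComplexAccumulator`, and `parallel_complex_sum` are hypothetical names. Each component is updated with its own compare-and-swap loop. That is enough for a sum, because complex addition is componentwise, but it does not make the (re, im) pair a single atomic snapshot.

```cpp
#include <atomic>
#include <cassert>
#include <complex>
#include <thread>
#include <vector>

// Atomically add x to a with a CAS loop (works for std::atomic<double> in C++11+).
inline void atomic_add(std::atomic<double>& a, double x) {
  double expected = a.load(std::memory_order_relaxed);
  while (!a.compare_exchange_weak(expected, expected + x,
                                  std::memory_order_relaxed)) {
    // On failure, `expected` now holds the current value; retry with it.
  }
}

// Hypothetical accumulator: each component is updated atomically on its own.
struct ComplexAccumulator {
  std::atomic<double> re{0.0};
  std::atomic<double> im{0.0};
  void add(std::complex<double> z) {
    atomic_add(re, z.real());
    atomic_add(im, z.imag());
  }
  std::complex<double> value() const { return {re.load(), im.load()}; }
};

// Each of `nthreads` threads adds `per_thread` copies of z to one accumulator.
inline std::complex<double> parallel_complex_sum(std::complex<double> z,
                                                 int nthreads, int per_thread) {
  assert(nthreads > 0 && per_thread >= 0);
  ComplexAccumulator acc;
  std::vector<std::thread> pool;
  for (int t = 0; t < nthreads; ++t)
    pool.emplace_back([&acc, z, per_thread] {
      for (int i = 0; i < per_thread; ++i) acc.add(z);
    });
  for (auto& th : pool) th.join();  // join() orders all updates before the read
  return acc.value();
}
```

Relaxed ordering is sufficient here because the totals are read only after every thread has been joined, and the join supplies the needed synchronization.
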

kyungjoo-kim commented 5 years ago

@mhoemmen I do not use reduce, but I suspect that Kokkos complex is problematic. When I populate random numbers, I use value_type(1.0) as the maximum of the range, where value_type is Kokkos complex. I recently found that complex(1.0) produces random numbers with a zero imaginary part, because the imaginary range is zero. The same test fails when it tests complex values with zero imaginary parts, but it passes with double. That is why I think we need to test with std::complex, to tell whether the issue comes from the different memory model or from the overloaded complex arithmetic. Testing this takes too much time.

crtrott commented 5 years ago

@kyungjoo-kim As discussed previously if you ask for a range of (1.0,0.0) (which you do since you implicitly construct from a real value) your range on the imaginary part is zero as you asked for it ;-).
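
The range behavior described above can be sketched with a hypothetical fill routine. It assumes, as the discussion implies, that the real and imaginary parts are drawn independently, each from [0, corresponding component of max). `fill_random_complex` is an illustrative name, not the Kokkos random API. Under that assumption, a maximum of complex(1.0) == (1.0, 0.0) gives every entry a zero imaginary part.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical fill routine: real and imaginary parts are drawn independently,
// each scaled by the matching component of `max`. A zero imaginary component
// in `max` therefore forces every imaginary part to zero.
inline std::vector<std::complex<double>> fill_random_complex(
    std::size_t n, std::complex<double> max, unsigned seed) {
  assert(max.real() >= 0.0 && max.imag() >= 0.0);
  std::mt19937_64 gen(seed);
  std::uniform_real_distribution<double> unit(0.0, 1.0);  // draws in [0, 1)
  std::vector<std::complex<double>> v(n);
  for (auto& z : v) {
    // Braced init evaluates left to right: real draw first, then imaginary.
    z = {max.real() * unit(gen), max.imag() * unit(gen)};
  }
  return v;
}
```

In the test's terms, this suggests passing value_type(1.0, 1.0) rather than value_type(1.0) when a nonzero imaginary range is wanted.
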

@mhoemmen Reductions and atomics should just work (and they should do the right thing, i.e., use proper atomics). I might look into this and figure out what's going on with XL here.

brian-kelley commented 4 years ago

@kyungjoo-kim @mhoemmen I was able to replicate the failures in openmp.gemm_complex_double and serial.gemm_complex_double with GCC 6.4.0 on white using the CMake test script. I also replicated them with several compilers (Intel, GCC, IBM) on white and bowman using the Makefile test script. So I think this one might be an actual bug, not an IBM compiler bug.

mhoemmen commented 4 years ago

@brian-kelley Is that code calling the system BLAS or is it calling a hand-written matrix-matrix multiply?

brian-kelley commented 4 years ago

@mhoemmen It's the hand-written KokkosBlas::gemm.
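
One way to separate a hand-written gemm bug from a compiler issue is to compare against a plain triple-loop reference and look at the average entrywise error, which is the kind of statistic the failing gemm check (diff_C_average against diff_C_expected) appears to use. The sketch below is illustrative only. `Mat`, `ref_gemm`, and `avg_abs_diff` are hypothetical names, and the exact formula in Test_Blas3_gemm.hpp may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Minimal row-major dense matrix (hypothetical helper for this sketch only).
struct Mat {
  std::size_t rows, cols;
  std::vector<cplx> a;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c) {}
  cplx& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
  const cplx& operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Reference C = alpha*A*B + beta*C with no transposes, plain triple loop.
inline void ref_gemm(cplx alpha, const Mat& A, const Mat& B, cplx beta, Mat& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  for (std::size_t i = 0; i < C.rows; ++i)
    for (std::size_t j = 0; j < C.cols; ++j) {
      cplx sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = alpha * sum + beta * C(i, j);
    }
}

// Average entrywise modulus of X - Y.
inline double avg_abs_diff(const Mat& X, const Mat& Y) {
  assert(X.rows == Y.rows && X.cols == Y.cols && !X.a.empty());
  double total = 0.0;
  for (std::size_t i = 0; i < X.a.size(); ++i) total += std::abs(X.a[i] - Y.a[i]);
  return total / static_cast<double>(X.a.size());
}
```

A failure that reproduces against such a reference on several compilers, as reported above, points toward the kernel or the test rather than toward one compiler.
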

brian-kelley commented 4 years ago

Hmm, today I'm not seeing gemm_complex_double fail except with IBM compilers. I started these spot checks this morning, so they don't include the fixes from #550. I'm not sure what else changed, but I'm not going to worry about it.

mhoemmen commented 4 years ago

@brian-kelley Are we perhaps assuming things about alignment of Kokkos::complex and std::complex that only cause issues with IBM compilers? That should only be an issue with CUDA, where std::complex<double> only needs 8-byte alignment (per the Standard) but CUDA's equivalent complex type needs 16-byte alignment.

brian-kelley commented 4 years ago

@mhoemmen I doubt it's an alignment issue, for that reason. Even if IBM handles std::complex in a special way, I think this test only involves Kokkos::complex which is just a plain old struct, with member-wise alignment requirements.
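
The alignment distinction in this exchange can be shown with two hypothetical structs. A plain struct of two doubles takes its alignment from its members. A 16-byte requirement, like the one described for CUDA's complex type, only appears when a type is explicitly over-aligned. `PlainComplex` and `AlignedComplex` are illustrative names, not Kokkos or CUDA types. alignof(std::complex<double>) is implementation-defined, so the sketch only records it rather than asserting a value.

```cpp
#include <complex>
#include <cstddef>

// Plain struct of two doubles: alignment comes from its members.
struct PlainComplex {  // hypothetical, for illustration
  double re, im;
};

// Same members, explicitly over-aligned to 16 bytes.
struct alignas(16) AlignedComplex {  // hypothetical, for illustration
  double re, im;
};

static_assert(alignof(AlignedComplex) == 16, "explicit over-alignment");
static_assert(alignof(PlainComplex) < alignof(AlignedComplex),
              "member-wise alignment is weaker than alignas(16)");

// Implementation-defined; commonly alignof(double) on 64-bit targets.
constexpr std::size_t std_complex_align = alignof(std::complex<double>);
```
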

srajama1 commented 4 years ago

@crtrott: This is showing up in spot checks and is preventing us from getting the clean spot checks we need before pushing. We need to resolve this to make progress on other things.