ndellingwood opened 5 years ago
@kyungjoo-kim would you have time to look at the gemm test? @vqd8a would you have time to look at the trsm tests?
Because of the XL compiler's prohibitive compile times, it is very difficult (almost impossible) to debug with it. Since the other compilers and platforms are fine, the listed failures are probably related to how the compiler orders instructions for super-scalar execution. I am not sure that spending our time on this is worthwhile. I suggest disabling all complex testing with XL. Since we test with Kokkos::complex, it would be interesting to see whether these failures also reproduce with std::complex. However, I also think that investigating the difference between Kokkos::complex and std::complex is not worthwhile.
@kyungjoo-kim Sounds good; compile times are very long with XL. Thanks for looking into it.
@kyungjoo-kim Are there any Kokkos::parallel_reduce or atomic updates on Kokkos::complex<double>? If so, it's possible this is a Kokkos bug, due to POWER's different memory model.
@mhoemmen I do not use reductions, but I suspect that Kokkos::complex is problematic. When I populate random numbers, I use value_type(1.0) as the maximum of the range, where value_type is Kokkos::complex. I recently found that complex(1.0) produces random numbers with a zero imaginary part, because the imaginary range is zero. The same test fails when it uses complex values with a zero imaginary part, but it passes with double. That is why I think it is necessary to test with std::complex to tell whether the issue comes from the different memory model or from the overloaded complex arithmetic. Testing these takes too much time.
@kyungjoo-kim As discussed previously if you ask for a range of (1.0,0.0) (which you do since you implicitly construct from a real value) your range on the imaginary part is zero as you asked for it ;-).
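To make the range issue concrete: constructing a complex number from a single real value sets the imaginary part to zero, so a maximum of value_type(1.0) is really (1.0, 0.0). The minimal sketch below uses std::complex<double> as a stand-in for Kokkos::complex. fill_random_complex is a hypothetical helper, not the Kokkos random API, and it assumes each component is drawn independently from [0, max component), which matches the behavior described above.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (NOT the Kokkos API). Assumption: the real and imaginary
// parts are drawn independently from [0, max.real()) and [0, max.imag()).
std::vector<std::complex<double>> fill_random_complex(std::size_t n,
                                                      std::complex<double> max,
                                                      unsigned seed) {
  assert(max.real() >= 0.0 && max.imag() >= 0.0);  // ranges must be non-negative
  std::mt19937 gen(seed);
  // A zero-width range always yields 0, so that component stays exactly zero.
  auto draw = [&gen](double hi) {
    return hi > 0.0 ? std::uniform_real_distribution<double>(0.0, hi)(gen) : 0.0;
  };
  std::vector<std::complex<double>> out(n);
  for (auto& z : out) {
    double re = draw(max.real());  // drawn first, so the order is well defined
    double im = draw(max.imag());
    z = std::complex<double>(re, im);
  }
  return out;
}
```

Passing std::complex<double>(1.0, 1.0) instead of std::complex<double>(1.0) gives both components a nonzero range.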
@mhoemmen Reductions and atomics should just work (and they should do the right thing, i.e. use proper atomics). I might look into this and figure out what's going on with XL here.
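For context on what "proper atomics" involves for a 16-byte complex value: a complex<double> generally cannot be updated by a single hardware atomic instruction, so one option is a lock whose acquire/release ordering makes the update visible to other threads, which matters on weakly ordered hardware such as POWER. The following is a minimal sketch using standard C++ only; LockedComplexSum is a hypothetical name, and this is not how Kokkos implements its atomics.

```cpp
#include <atomic>
#include <cassert>
#include <complex>

// Hypothetical lock-protected complex accumulator (not Kokkos's implementation).
class LockedComplexSum {
 public:
  void add(std::complex<double> x) {
    // acquire: this thread sees every write made before the previous unlock
    while (lock_.test_and_set(std::memory_order_acquire)) {
      // spin until the current holder releases the lock
    }
    value_ += x;  // the whole 16-byte read-modify-write happens under the lock
    // release: publishes value_ to the next thread that acquires the lock
    lock_.clear(std::memory_order_release);
  }

  std::complex<double> load() {
    while (lock_.test_and_set(std::memory_order_acquire)) {
    }
    std::complex<double> v = value_;  // read both parts consistently
    lock_.clear(std::memory_order_release);
    return v;
  }

 private:
  std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
  std::complex<double> value_{0.0, 0.0};
};
```

Without the acquire/release pair, another thread on a weakly ordered machine could observe a torn value (for example, an updated real part with a stale imaginary part).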
@kyungjoo-kim @mhoemmen I was able to replicate the failures in openmp.gemm_complex_double and serial.gemm_complex_double with GCC 6.4.0 on white using the CMake test script, and also with several compilers (Intel, GCC, IBM) on white and bowman using the Makefile test script. So I think that one might be an actual bug, not an IBM compiler bug.
@brian-kelley Is that code calling the system BLAS or is it calling a hand-written matrix-matrix multiply?
@mhoemmen It's the hand-written KokkosBlas::gemm.
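When chasing a failure like gemm_complex_double, it can help to compare the kernel's output against a naive triple loop. The following is a minimal sketch, assuming row-major storage and no transposes; reference_gemm and nearly_equal are hypothetical names, not part of KokkosBlas. Because both are templated on the scalar type, the same check can run with double and with std::complex<double>, which is one way to separate complex-arithmetic problems from problems in the kernel itself.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel (not KokkosBlas::gemm), row-major, no transposes:
// C = alpha * A * B + beta * C, with A (m x k), B (k x n), C (m x n).
template <class Scalar>
void reference_gemm(std::size_t m, std::size_t n, std::size_t k, Scalar alpha,
                    const std::vector<Scalar>& A, const std::vector<Scalar>& B,
                    Scalar beta, std::vector<Scalar>& C) {
  assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      Scalar sum(0.0);
      for (std::size_t p = 0; p < k; ++p) sum += A[i * k + p] * B[p * n + j];
      C[i * n + j] = alpha * sum + beta * C[i * n + j];
    }
  }
}

// Entrywise comparison with a tolerance scaled by the magnitudes involved.
template <class Scalar>
bool nearly_equal(const std::vector<Scalar>& x, const std::vector<Scalar>& y,
                  double tol) {
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i) {
    double scale = std::abs(x[i]) + std::abs(y[i]) + 1.0;
    if (std::abs(x[i] - y[i]) > tol * scale) return false;
  }
  return true;
}
```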
Hmm, today I'm not seeing gemm_complex_double fail except with IBM compilers. I started these spot checks this morning so they don't have the fixes from #550. Not sure what else changed, but I'm not going to worry about it.
@brian-kelley Are we perhaps assuming things about alignment of Kokkos::complex and std::complex that only cause issues with IBM compilers? That should only be an issue with CUDA, where std::complex<double> only needs 8-byte alignment (per the Standard) but CUDA's equivalent complex type needs 16-byte alignment.
@mhoemmen I doubt it's an alignment issue, for that reason. Even if IBM handles std::complex in a special way, I think this test only involves Kokkos::complex which is just a plain old struct, with member-wise alignment requirements.
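To illustrate the alignment concern: a plain struct of two doubles gets its alignment from its members (typically 8 bytes), so such data can legally sit at addresses that a 16-byte-aligned complex type cannot be read from. The minimal sketch below uses two hypothetical types, PlainComplex (mirroring a plain old struct) and Aligned16Complex (mimicking a type that requires 16-byte alignment); neither is the actual Kokkos or CUDA definition, and is_aligned is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plain struct: alignment comes from its double members.
struct PlainComplex {
  double re, im;
};

// Hypothetical over-aligned type that requires 16-byte alignment.
struct alignas(16) Aligned16Complex {
  double re, im;
};

static_assert(sizeof(PlainComplex) == sizeof(Aligned16Complex), "same size");
static_assert(alignof(Aligned16Complex) == 16, "over-aligned type");

// True if p satisfies the given power-of-two alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Checking is_aligned before reinterpreting a buffer of 8-byte-aligned complex data as a 16-byte-aligned type catches the mismatch up front instead of relying on the platform to tolerate misaligned loads.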
@crtrott: This is showing up in spot-checks and keeping us from getting clean spot-checks so we can push. We need to resolve this to make progress on other things.
Output from Jenkins:
Failed Test 1
Failed Test 2
Failed Test 3