Closed. huttered40 closed this issue 5 years ago
@huttered40 I edited your post just to avoid undesired Markdown formatting in the compiler output. In the future, please enclose verbatim text in triple backticks. Thanks!
@huttered40 could you post more info about your configuration and build? I was able to build kokkos-kernels on the pascal queue (rhel7G) on White. Here is my setup:
Tested with VOTD develop branch of kokkos and kokkos-kernels:
kokkos SHA: kokkos/kokkos@b18689e41716b7cb8d3f30e637f3ac500756f4cc
kokkos-kernels SHA: b26f4461655bd64b374827c1858b0ee2d9aa7219
Modules:
module load devpack/20180521/openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88
Configuration:
I have kokkos and kokkos-kernels cloned into my $HOME directory.
KOKKOS_PATH=${HOME}/kokkos #path to kokkos source
KOKKOSKERNELS_SCALARS=double #the scalar types to instantiate =double,float...
KOKKOSKERNELS_LAYOUTS=LayoutLeft,LayoutRight #the layout types to instantiate.
KOKKOSKERNELS_ORDINALS=int #ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int #offset types to instantiate
KOKKOSKERNELS_PATH=../.. #path to kokkos-kernels top directory.
CXX=${KOKKOS_PATH}/bin/nvcc_wrapper #icpc #
KOKKOSKERNELS_OPTIONS=eti-only #options for kokkoskernels
KOKKOS_DEVICES="Cuda,Serial"
KOKKOS_ARCHS="Power8,Pascal60"
CXXFLAGS="-pedantic -O3 -g -Wshadow -Wsign-compare -Wtype-limits -Wuninitialized"
../../scripts/generate_makefile.bash --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars=${KOKKOSKERNELS_SCALARS} --with-ordinals=${KOKKOSKERNELS_ORDINALS} --with-offsets=${KOKKOSKERNELS_OFFSETS} --kokkos-path=${KOKKOS_PATH} --with-devices=${KOKKOS_DEVICES} --arch=${KOKKOS_ARCHS} --compiler=${CXX} --with-options=${KOKKOSKERNELS_OPTIONS} --cxxflags="${CXXFLAGS}"
Interactive node session:
bsub -Is -n 1 -q rhel7G bash
Build library then tests:
make install-lib -j16
cd unit_test
make -j
Interactive node session:
bsub -Is -q rhel7G -n 32 bash
Module:
devpack/20180521/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88
Kokkos:
branch: develop
most recent commit hash: b18689e
Kokkos-kernels:
branch: develop
most recent commit hash: b26f446
Relevant part of Makefile:
CXXFLAGS = -O3 --expt-extended-lambda --expt-relaxed-constexpr# -std=c++14
KOKKOS_CXX_STANDARD=c++14 # Currently only works when using the develop branch of kokkos
LINK = ${CXX}
LDFLAGS =
EXE = test.cuda
KOKKOS_DEVICES = "Cuda"
KOKKOS_ARCH = "Power8,Pascal60" # For rhel-7G queue on White
KOKKOS_CUDA_OPTIONS += "enable_lambda"
My application is calling KokkosBatched::Experimental::TeamGemm<TransposeAType,TransposeBType,GemmAlgType>::invoke(...)
The error again is:
../kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:136:27: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1705
const int i = ij/nq*mb, j = ij%nq*nb;
@huttered40 if you change the way you generate your makefile as shown below, it should work (it worked for me): use the --with-cuda-options argument to set enable_lambda (this takes care of --expt-extended-lambda) and set KOKKOS_CXXFLAGS="--expt-relaxed-constexpr"
KOKKOS_PATH=${HOME}/kokkos #path to kokkos source
KOKKOSKERNELS_SCALARS=double #the scalar types to instantiate =double,float...
KOKKOSKERNELS_LAYOUTS=LayoutLeft,LayoutRight #the layout types to instantiate.
KOKKOSKERNELS_ORDINALS=int #ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int #offset types to instantiate
KOKKOSKERNELS_PATH=../.. #path to kokkos-kernels top directory.
CXX=${KOKKOS_PATH}/bin/nvcc_wrapper #icpc #
KOKKOS_CXX_STANDARD=c++14
KOKKOS_CXXFLAGS="--expt-relaxed-constexpr"
KOKKOSKERNELS_OPTIONS=eti-only #options for kokkoskernels
KOKKOS_DEVICES="Cuda,Serial"
KOKKOS_ARCHS="Power8,Pascal60"
KOKKOS_CUDA_OPTION="enable_lambda" #"enable_lambda,force_uvm,rdc"
CXXFLAGS="-pedantic -O3 -g -Wshadow -Wsign-compare -Wtype-limits -Wuninitialized"
../../scripts/generate_makefile.bash --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars=${KOKKOSKERNELS_SCALARS} --with-ordinals=${KOKKOSKERNELS_ORDINALS} --with-offsets=${KOKKOSKERNELS_OFFSETS} --kokkos-path=${KOKKOS_PATH} --with-devices=${KOKKOS_DEVICES} --arch=${KOKKOS_ARCHS} --compiler=${CXX} --with-options=${KOKKOSKERNELS_OPTIONS} --cxxflags="${CXXFLAGS}" --with-cuda-options=${KOKKOS_CUDA_OPTION}
@huttered40 oops, I didn't properly set KOKKOS_CXX_STANDARD=c++14; when I did that I saw your failure. Cross-referencing your PR with the fix here: #350
I don't think that we can handle the compiler error. The code is header-only and is compiled within your code. It is a very unlucky case, but I don't think that we can give much help for this compiler error.
Probably have to grind this down to a reproducer for Nvidia since c++14 should be supported...
@ndellingwood Does kokkos officially support c++14 ?
@kyungjoo-kim good point, there isn't nightly testing with c++14 enabled so we shouldn't claim it is officially supported. I put in PR kokkos/kokkos#1913 so we can enable c++14 through generated makefiles and begin testing.
I am reopening this. We have multiple requests to support C++14. It doesn't have to be every version of every compiler with C++14 support as this is evolving. However, we do have to support gcc 7.2. Trilinos is moving the PR testing to gcc 7.2 very soon.
Adding @crtrott he said he'd also help look into this.
Apparently fixed in GCC 7.3: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882855
At least the last 4 entries of the call stack are the same:
Debian Bug:
0x102ebfe3 maybe_undo_parenthesized_ref(tree_node*)
../.././gcc/cp/semantics.c:1704
0x1034eacf cp_fold
../.././gcc/cp/cp-gimplify.c:2141
0x1034f8b7 cp_fold_maybe_rvalue
../.././gcc/cp/cp-gimplify.c:2003
0x1034e5b7 cp_fold
../.././gcc/cp/cp-gimplify.c:2110
0x1022385b store_init_value(tree_node*, tree_node*, vec<tree_node*, va_gc, vl_embed>**, int)
../.././gcc/cp/typeck2.c:841
KokkosKernels:
0x102ebfe3 maybe_undo_parenthesized_ref(tree_node*)
../.././gcc/cp/semantics.c:1704
0x1034eacf cp_fold
../.././gcc/cp/cp-gimplify.c:2141
0x1034f8b7 cp_fold_maybe_rvalue
../.././gcc/cp/cp-gimplify.c:2003
0x1034e5b7 cp_fold
../.././gcc/cp/cp-gimplify.c:2110
0x102ba7db cp_build_binary_op(unsigned int, tree_code, tree_node*, tree_node*, int)
../.././gcc/cp/typeck.c:5243
What's the chance for work-around in GCC 7.2?
Looking into it. My guess is that the chances are pretty good that we can work around this. The compiler gets tripped up while figuring out whether something is an rvalue. So adding some parentheses, explicit casts, using a temporary instead of computing the value inline, etc. may avoid the trigger.
OK, found two options for this original code. The offending construct is capturing idx_j by reference in the innermost lambda, where part of idx_j comes from the argument of another inlined lambda.
Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
  const int idx_j = offset_j+j;
  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
    const int idx_i = offset_i+i;
    A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
  });
});
Option 1: Capture by value in the innermost lambda:
Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
  const int idx_j = offset_j+j;
  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [=] (const int i) {
    const int idx_i = offset_i+i;
    A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
  });
});
Option 2: Move the offset calculation into the innermost loop:
Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
    const int idx_j = offset_j+j;
    const int idx_i = offset_i+i;
    A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
  });
});
My guess is that the second option is better. In any case we can ifdef this with C++ standard and GCC version.
Btw. this applies to all similar places in the code: 91, 118, 145, ...
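For readers without a Kokkos build at hand, the second option and the proposed ifdef can be sketched in plain C++. This is only an illustration under stated assumptions: for_each_index is a hypothetical serial stand-in for Kokkos::parallel_for, load_tile is a made-up helper, and EXAMPLE_GCC_CXX14_WORKAROUND is a placeholder guard name, not a macro from kokkos-kernels.

```cpp
#include <cassert>
#include <vector>

// Placeholder guard (assumption): real code would derive it from the
// Kokkos C++ standard and compiler version macros instead.
// #define EXAMPLE_GCC_CXX14_WORKAROUND

// Hypothetical serial stand-in for Kokkos::parallel_for over [0, n).
template <class F>
void for_each_index(int n, const F& f) {
  for (int i = 0; i < n; ++i) f(i);
}

// Copies a rows x cols tile starting at (offset_i, offset_j) out of a
// row-major extent0 x extent1 matrix, padding with zero outside the bounds.
std::vector<double> load_tile(const std::vector<double>& A, int extent0,
                              int extent1, int offset_i, int offset_j,
                              int rows, int cols) {
  std::vector<double> tile(rows * cols, 0.0);
  for_each_index(cols, [&](const int j) {
#ifndef EXAMPLE_GCC_CXX14_WORKAROUND
    // Original pattern: computed in the outer lambda and captured by
    // reference in the inner one.
    const int idx_j = offset_j + j;
#endif
    for_each_index(rows, [&](const int i) {
#ifdef EXAMPLE_GCC_CXX14_WORKAROUND
      // Option 2: computed inside the innermost lambda instead.
      const int idx_j = offset_j + j;
#endif
      const int idx_i = offset_i + i;
      tile[i * cols + j] = (idx_i < extent0 && idx_j < extent1)
                               ? A[idx_i * extent1 + idx_j]
                               : 0.0;
    });
  });
  return tile;
}
```

Both branches of the guard compute the same tile; only the place where idx_j is formed differs, which is why the workaround can be switched on per compiler without changing results.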
@crtrott Is the original code still legal under the C++ standard (nesting two lambdas and capturing values by reference)? I have many places that follow the same pattern.
This is legal C++ (depending on what you do it might not be legal Kokkos though: remember the code must be valid when capturing by value, but capturing by reference may get better performance).
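A small plain C++ sketch of what "valid when capturing by value" can mean in practice. It assumes a hypothetical serial_for helper in place of a Kokkos dispatch and uses a raw pointer as a stand-in for anything with reference semantics, such as a View handle; none of these names come from Kokkos.

```cpp
#include <cassert>

// Hypothetical serial stand-in for a Kokkos parallel dispatch.
template <class F>
void serial_for(int n, const F& f) {
  for (int i = 0; i < n; ++i) f(i);
}

// Works with [=] as well as [&]: the lambda copies the pointer, and the
// copy still refers to the same storage, so the writes remain visible.
int sum_via_pointer(int n) {
  int total = 0;
  int* out = &total;
  serial_for(n, [=](const int i) { *out += i; });
  return total;
}

// By contrast, a body such as `[=](const int i) { total += i; }` would not
// compile, because a by-value capture of `total` is const inside the lambda.
// Code that only works with [&] in this way is what the remark warns about.
```

A call such as sum_via_pointer(4) accumulates 0+1+2+3 through the copied pointer.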
I also prefer the second option. Anyway, you are a magician. How did you know that the compiler problem is due to capturing values by reference?
If you look at the call stack, the function names indicate that the compiler is trying to optimize away expressions (fold), then tries to figure out whether something is an rvalue, and then crashes when it tries to optimize a reference access inside parentheses. These are all just educated guesses, but it looks like I guessed right ;-).
Ah I am working on the proper fix and will issue a pull request.
thanks.
Found a couple more places which could be resolved by making temporaries non-const ... I didn't ifdef those but put a comment in.
If somebody can run all the testing that would be great. Gotta get some other stuff done now.
Cross-reference #357 PR by @crtrott
The fix is in develop now.
Just wanted to let you know that I still got an internal compiler error with gcc-7.4.0 on Summit:
A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
The following fixed it for me:
diff --git a/packages/kokkos-kernels/src/blas/impl/KokkosBlas3_gemm_impl.hpp b/packages/kokkos-kernels/src/blas/impl/KokkosBlas3_gemm_impl.hpp
index e68d031..da8a6a6 100644
--- a/packages/kokkos-kernels/src/blas/impl/KokkosBlas3_gemm_impl.hpp
+++ b/packages/kokkos-kernels/src/blas/impl/KokkosBlas3_gemm_impl.hpp
@@ -48,7 +48,7 @@
#ifdef KOKKOS_ENABLE_CXX14
#ifdef KOKKOS_COMPILER_GNU
-#if KOKKOS_COMPILER_GNU<=720
+#if KOKKOS_COMPILER_GNU<=740
#define KOKKOS_IMPL_BATCHED_GEMM_GCC_CXX14_WORKAROUND
#endif
#endif
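To make the 720 and 740 thresholds easier to read, here is a minimal sketch of the version encoding they appear to follow. It assumes KOKKOS_COMPILER_GNU is formed as major*100 + minor*10 + patch, which matches the values in the diff; encode_gnu_version and needs_workaround are hypothetical helpers written for illustration only.

```cpp
#include <cassert>

// Assumed encoding of a GCC version as a single integer:
// major*100 + minor*10 + patch, e.g. 7.2.0 -> 720 and 7.4.0 -> 740.
constexpr int encode_gnu_version(int major, int minor, int patch) {
  return major * 100 + minor * 10 + patch;
}

// Mirrors the shape of the preprocessor test in the diff: the workaround
// is enabled when the compiler version is at or below the threshold.
constexpr bool needs_workaround(int compiler_version, int max_affected) {
  return compiler_version <= max_affected;
}

// With the old threshold, gcc 7.4.0 is not covered; with the new one it is.
static_assert(!needs_workaround(encode_gnu_version(7, 4, 0), 720),
              "7.4.0 is above the old 720 threshold");
static_assert(needs_workaround(encode_gnu_version(7, 4, 0), 740),
              "7.4.0 is covered by the new 740 threshold");
```

Under this reading, the one-line change in the diff simply widens the range of GCC versions that take the workaround path.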
Strange ... Did someone actually report the error? @ndellingwood can we put a patch in?
@srajama1 yeah, let me test this more carefully to confirm this also works with gcc/7.3; white has gcc/7.4 available but was not yet added to test_all_sandia which is how this slipped through, I'll make sure to get this coverage added as well.
I wasn't able to reproduce the issue testing with gcc/7.4 in a serial build on White with c++14 support enabled, here is my generated makefile options:
Generating Makefiles with options CXX=/home/projects/ppc64le/gcc/7.4.0/bin/g++ KOKKOS_DEVICES=Serial KOKKOS_ARCH=Power8 CXXFLAGS="-O3 -Werror -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized " KOKKOS_CXX_STANDARD="c++14" LDFLAGS="-O3 " GTEST_PATH=/ascldap/users/ndellin/kokkos/tpls/gtest KOKKOSKERNELS_OPTIONS=eti-only,blas-mangle_ KOKKOS_PATH=/ascldap/users/ndellin/kokkos KOKKOSKERNELS_PATH=/ascldap/users/ndellin/kokkos-kernels
I'll test with the changes suggested by @aprokop next.
The same test config passed with the suggested change. I'll test more completely and then put in the PR with the change and updated scripts to make sure gcc/7.4 is also tested.
Of note, this was part of a Trilinos configuration; I used -DCMAKE_CXX_STANDARD=14 and did not specify any other C++11-related flags (like -DTrilinos_CXX11_FLAGS).
I am getting an internal compiler error when running KokkosBatched::Experimental::TeamGemm on White machine - rhel 7G queue. The GCC compiler version is 7.2.0 and I tried 6.4.0 as well, both with same issue. This error does not occur when running on Bowman with GCC 4.9.3. Most of the stack trace is posted below: