trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Teuchos stacked timer build failure in all ATDM builds #4145

Closed fryeguy52 closed 5 years ago

fryeguy52 commented 5 years ago

CC: @trilinos/teuchos, @jwillenbring (Trilinos Framework Product Lead), @bartlettroscoe, @fryeguy52

Next Action Status

Merging the fix in PR #4146 to 'develop' on 1/7/2019 resulted in 100% clean ATDM CUDA builds on 1/9/2019.

Description

As shown in this query, the test TeuchosComm_stacked_timer_MPI_2 is not being run in any ATDM build because of a build failure:

Error building packages/teuchos/comm/test/StackedTimer/CMakeFiles/TeuchosComm_stacked_timer.dir/stacked_timer.cpp.o

/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/teuchos/comm/test/StackedTimer/stacked_timer.cpp:357:3: error: use of undeclared identifier 'Kokkos'
  Kokkos::initialize(argc,argv);
  ^
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/teuchos/comm/test/StackedTimer/stacked_timer.cpp:365:7: error: use of undeclared identifier 'Kokkos'
  if (Kokkos::is_initialized())
      ^
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/teuchos/comm/test/StackedTimer/stacked_timer.cpp:366:5: error: use of undeclared identifier 'Kokkos'
    Kokkos::finalize_all();
    ^
3 errors generated.

The new commits on the day this started failing are:

65c3205:  Teuchos: Fix StackedTimer reporting
Author: Roger Pawlowski <rppawlo@sandia.gov>
Date:   Thu Jan 3 19:16:17 2019 -0700

M   packages/teuchos/comm/src/Teuchos_StackedTimer.hpp
M   packages/teuchos/comm/test/StackedTimer/stacked_timer.cpp

2dbd8c5:  Teuchos: StackedTimer now supports kokkos profiling space_time_stack
Author: Roger Pawlowski <rppawlo@sandia.gov>
Date:   Thu Jan 3 18:10:29 2019 -0700

M   packages/panzer/adapters-stk/example/MixedPoissonExample/main.cpp
M   packages/teuchos/comm/src/Teuchos_StackedTimer.hpp
M   packages/teuchos/comm/src/Teuchos_TimeMonitor.cpp
M   packages/teuchos/comm/test/StackedTimer/CMakeLists.txt
M   packages/teuchos/comm/test/StackedTimer/stacked_timer.cpp

@rppawlo can you look into this?

Steps to Reproduce

One should be able to reproduce this failure on a machine with a SEMS RHEL6 environment as described in:

More specifically, the commands for a SEMS RHEL6 environment are provided at:

The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6-gnu-opt-openmp
$ cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Teuchos=ON \
 $TRILINOS_DIR
$ make NP=16
$ ctest -j8

rppawlo commented 5 years ago

@fryeguy52 - the commands to reproduce don't seem to work:

[rppawlo@gge BUILD4]$ source /ascldap/users/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6
users/
[rppawlo@gge BUILD4]$ source /ascldap/users/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6
users/
[rppawlo@gge BUILD4]$ source /ascldap/users/rppawlo/Trilinos/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6
Hostname 'gge.srn.sandia.gov' matches known ATDM host 'sems-rhel6' and system 'sems-rhel6'
Setting compiler and build options for buld name 'Trilinos-atdm-sems-rhel6'

***
*** ERROR: A compiler was not specified in 'Trilinos-atdm-sems-rhel6'!
***
Using SEMS RHEL6 compiler stack GNU to build DEBUG code with Kokkos node type SERIAL
[rppawlo@gge BUILD4]$ 

fryeguy52 commented 5 years ago

@rppawlo - sorry about that, Roger. I updated the instructions above. It should say:

source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6-gnu-opt-openmp

bartlettroscoe commented 5 years ago

CC: @rppawlo, @fryeguy52

@trilinos/framework, given that this is failing in every ATDM Trilinos build, how did this make it through the Trilinos PR testing? What is different about the PR test builds from all of the ATDM Trilinos builds that allowed this build failure to slip through PR testing? We need to figure that out so that at least one of the Trilinos PR builds would catch a failure like this.

rppawlo commented 5 years ago

It looks like this build disables either Kokkos or the Kokkos profiling define (my bet is the latter).

rppawlo commented 5 years ago

The fix is trivial. I will have a PR up in a minute.

fryeguy52 commented 5 years ago

Thanks @rppawlo

bartlettroscoe commented 5 years ago

@rppawlo said:

It looks like this build disables either Kokkos or the Kokkos profiling define (my bet is the latter).

Yup, that is it. The ATDM Trilinos configuration sets Kokkos_ENABLE_Profiling=OFF. That came from the EMPIRE configuration for Trilinos.

Is there a good reason why we should not just remove that disable from the ATDM Trilinos configuration? It seems best to change as few options as we need from a basic configuration of Trilinos.
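The errors in the build log mean the Kokkos calls in stacked_timer.cpp were compiled in a configuration where the Kokkos declarations were not visible. A common way to keep such a test compiling is to guard every Kokkos-dependent line with the same preprocessor condition. The sketch below is not the actual change in PR #4146: the macro HYPOTHETICAL_HAVE_KOKKOS_PROFILING and the helpers init_runtime, finalize_runtime, and run_guarded_test are placeholders; only the Kokkos calls themselves are taken from the log.

```cpp
#include <cassert>

// HYPOTHETICAL_HAVE_KOKKOS_PROFILING is a placeholder macro, not a real
// Trilinos/Kokkos name; the build system would define it only when Kokkos
// (with profiling support) is enabled.
#ifdef HYPOTHETICAL_HAVE_KOKKOS_PROFILING
#include <Kokkos_Core.hpp>
#endif

// Reports whether the Kokkos-dependent code path was compiled in.
inline bool kokkos_path_compiled_in() {
#ifdef HYPOTHETICAL_HAVE_KOKKOS_PROFILING
  return true;
#else
  return false;
#endif
}

// Initializes Kokkos only when the guarded path is compiled in;
// otherwise it is a no-op, so the file still compiles without Kokkos.
inline void init_runtime(int& argc, char* argv[]) {
#ifdef HYPOTHETICAL_HAVE_KOKKOS_PROFILING
  Kokkos::initialize(argc, argv);
#else
  (void)argc;
  (void)argv;
#endif
}

// Finalizes Kokkos only when the guarded path is compiled in.
inline void finalize_runtime() {
#ifdef HYPOTHETICAL_HAVE_KOKKOS_PROFILING
  if (Kokkos::is_initialized())
    Kokkos::finalize_all();
#endif
}

// Hypothetical driver mirroring the shape of a test main():
// set up, run the checks, tear down.
inline int run_guarded_test(int argc, char* argv[]) {
  assert(argv != nullptr);
  init_runtime(argc, argv);
  // ... unit tests would run here ...
  finalize_runtime();
  return 0;
}
```

The key design point is that the #include and every use of the Kokkos namespace sit behind one condition, so a configuration with profiling turned off compiles the no-op branch instead of failing with "use of undeclared identifier 'Kokkos'".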

mhoemmen commented 5 years ago

We should enable Kokkos profiling by default. The only cost of Kokkos profiling is that entering and exiting a parallel region does an if-check on a function pointer to see whether it is null.
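To make that cost concrete, here is a simplified model of the pattern, not Kokkos' actual implementation: profiling callbacks are stored as function pointers that stay null until a tool registers them, so a region run with no tool loaded pays only a null check on entry and on exit. The names BeginHook, EndHook, g_begin_hook, g_end_hook, and parallel_region are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hook types; real Kokkos tool callbacks use different names
// and signatures.
using BeginHook = void (*)(const char* name, std::uint64_t* id);
using EndHook = void (*)(std::uint64_t id);

// Both hooks stay null unless a profiling tool registers callbacks.
BeginHook g_begin_hook = nullptr;
EndHook g_end_hook = nullptr;

// Runs body(i) for i in [0, n). The loop is serial here as a stand-in for
// a real parallel dispatch; the point is the null checks around it.
template <class Body>
void parallel_region(const char* name, int n, Body body) {
  assert(n >= 0);  // a negative count is a caller error
  std::uint64_t id = 0;
  if (g_begin_hook != nullptr)  // cost with no tool loaded: one comparison
    g_begin_hook(name, &id);    // on entry...
  for (int i = 0; i < n; ++i)
    body(i);
  if (g_end_hook != nullptr)    // ...and one on exit
    g_end_hook(id);
}
```

When a tool assigns g_begin_hook and g_end_hook, the same region reports its entry and exit; when nothing is registered, the two branches are the entire overhead.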

bartlettroscoe commented 5 years ago

All of our CUDA builds are clean today, as shown here, after the merge of PR #4146.

Closing as complete.

Thanks @rppawlo !