trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Did CMake 3.10.0 requirement break the check-in test script? #3628

Closed mhoemmen closed 3 years ago

mhoemmen commented 6 years ago

@trilinos/framework @bartlettroscoe

The latest changes that require CMake 3.10.0 seem to have broken the check-in test script. I invoked the script like this:

.../Trilinos/checkin-test.py --ctest-timeout=400 --disable-packages=PyTrilinos,Claps,TriKota,Domi,STKSearch,Moertel,Shards --skip-case-no-email --allow-no-pull --enable-all-packages=off --default-builds= --extra-builds=MPI_DEBUG_EX --enable-packages=TpetraCore,Zoltan2,Amesos2 --configure

with the following modules loaded:

  1) sems-env                         3) sems-cmake/3.12.2                5) sems-openmpi/1.10.1              7) sems-boost/1.59.0/base           9) sems-hdf5/1.8.12/parallel       11) sems-zlib/1.2.8/base
  2) kokkos-env                       4) sems-gcc/4.9.3                   6) sems-python/2.7.9                8) sems-superlu/4.3/base           10) sems-netcdf/4.4.1/exo_parallel  12) sems-parmetis/4.0.3/parallel

I get the following output:

...
B) Do the configuration with CMake (MPI_DEBUG_EX) ...

Running: rm CMakeCache.txt

Running: rm -rf CMakeFiles

Running: ./do-configure

  Writing console output to file configure.out ...

  Runtime for command = 0.302733 minutes

Configure failed returning 1!

Traceback (most recent call last):
  File "/scratch/prj/Trilinos/Trilinos/cmake/tribits/ci_support/CheckinTest.py", line 1563, in runBuildTestCase
    raise Exception("Configure failed!")
Exception: Configure failed!

E) Analyze the overall results and send email notification (MPI_DEBUG_EX) ...

E.1) Determine what passed and failed ...

The pull step was not performed!

The configure FAILED!

Should I consider the check-in test script dead? It was a useful tool & I'm sad to see it go.

bartlettroscoe commented 6 years ago

@mhoemmen, the checkin-test.py script is not dead. I use it every day to test ATDM Trilinos builds.

Can you attach the generated file MPI_DEBUG_EX/configure.out?

mhoemmen commented 6 years ago

Thanks @bartlettroscoe ! Here are the relevant bits of that file. This may be a CMake options issue.

-- Found TPL 'Boost' include dirs '/projects/sems/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/4.9.3/base/include'
-- TPL_Boost_INCLUDE_DIRS='/projects/sems/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/4.9.3/base/include'
Processing enabled TPL: ParMETIS (enabled explicitly, disable with -DTPL_ENABLE_ParMETIS=OFF)
-- ParMETIS_LIBRARY_NAMES='parmetis;metis'
-- Searching for libs in ParMETIS_LIBRARY_DIRS='/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.9.3/openmpi/1.10.1/lib'
-- Searching for a lib in the set "parmetis":
--   Searching for lib 'parmetis' ...
-- NOTE: Did not find a lib in the lib set "parmetis" for the TPL 'ParMETIS'!
-- ERROR: Could not find the libraries for the TPL 'ParMETIS'!
-- TIP: If the TPL 'ParMETIS' is on your system then you can set:
     -DParMETIS_LIBRARY_DIRS='<dir0>;<dir1>;...'
   to point to the directories where these libraries may be found.
   Or, just set:
     -DTPL_ParMETIS_LIBRARIES='<path-to-libs0>;<path-to-libs1>;...'
   to point to the full paths for the libraries which will
   bypass any search for libraries and these libraries will be used without
   question in the build.  (But this will result in a build-time error
   if not all of the necessary symbols are found.)
-- ERROR: Failed finding all of the parts of TPL 'ParMETIS' (see above), Aborting!

-- Performing Test HAVE_PARMETIS_VERSION_4_0_3
-- Performing Test HAVE_PARMETIS_VERSION_4_0_3 - Success
-- NOTE: The find module file for this failed TPL 'ParMETIS' is:
     /scratch/prj/Trilinos/Trilinos/cmake/TPLs/FindTPLParMETIS.cmake
   which is pointed to in the file:
     /scratch/prj/Trilinos/Trilinos/TPLsList.cmake

TIP: Even though the TPL 'ParMETIS' was explicitly enabled in input,
it can be disabled with:
  -DTPL_ENABLE_ParMETIS=OFF
which will disable it and will recursively disable all of the
downstream packages that have required dependencies on it.
When you reconfigure, just grep the cmake stdout for 'ParMETIS'
and then follow the disables that occur as a result to see what impact
this TPL disable has on the configuration of Trilinos.

CMake Error at cmake/tribits/core/package_arch/TribitsProcessEnabledTpl.cmake:144 (MESSAGE):
  ERROR: TPL_ParMETIS_NOT_FOUND=TRUE, aborting!
Call Stack (most recent call first):
  cmake/tribits/core/package_arch/TribitsGlobalMacros.cmake:1711 (TRIBITS_PROCESS_ENABLED_TPL)
  cmake/tribits/core/package_arch/TribitsProjectImpl.cmake:202 (TRIBITS_PROCESS_ENABLED_TPLS)
  cmake/tribits/core/package_arch/TribitsProject.cmake:93 (TRIBITS_PROJECT_IMPL)
  CMakeLists.txt:90 (TRIBITS_PROJECT)

-- Configuring incomplete, errors occurred!

bartlettroscoe commented 6 years ago

@mhoemmen can you please provide exact instructions to reproduce this error?

Using the standard checkin-test-sems.sh script, I was not able to reproduce this problem. For Trilinos 'develop' version 72985ec:

72985ec "Merge Pull Request #3621 from bartlettroscoe/Trilinos/3611-remove-sundance-inserted-package"
Author: trilinos-autotester <trilinos-autotester@trilinos.org>
Date:   Sun Oct 14 19:46:11 2018 -0600 (22 hours ago)

I ran:

$ ./checkin-test-sems.sh --enable-packages=Teuchos --no-enable-fwd-packages --local-do-all

and the configure output showed:

Processing enabled TPL: ParMETIS (enabled explicitly, disable with -DTPL_ENABLE_ParMETIS=OFF)
-- ParMETIS_LIBRARY_NAMES='parmetis;metis'
-- Searching for libs in ParMETIS_LIBRARY_DIRS='/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/lib'
-- Searching for a lib in the set "parmetis":
--   Searching for lib 'parmetis' ...
--     Found lib '/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/lib/libparmetis.a'
-- Searching for a lib in the set "metis":
--   Searching for lib 'metis' ...
--     Found lib '/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/lib/libmetis.a'
-- TPL_ParMETIS_LIBRARIES='/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/lib/libparmetis.a;/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/lib/libmetis.a'
-- TPL_ParMETIS_INCLUDE_DIRS='/projects/sems/install/rhel6-x86_64/sems/tpl/parmetis/4.0.3/gcc/4.8.4/openmpi/1.10.1/parallel/include'
-- Performing Test HAVE_PARMETIS_VERSION_4_0_3
-- Performing Test HAVE_PARMETIS_VERSION_4_0_3 - Success
Processing enabled TPL: Zlib (enabled explicitly, disable with -DTPL_ENABLE_Zlib=OFF)

...

Finished configuring Trilinos!

-- Configuring done
-- Generating done
-- Build files have been written to: /home/rabartl/Trilinos.base/BUILDS/CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT_OPENMP

mhoemmen commented 6 years ago

@bartlettroscoe Let me see if I need to fix my CMake options -- thanks!

bartlettroscoe commented 6 years ago

@mhoemmen, note that there is no automated testing that I know of with sems-cmake/3.12.2, so there might be a defect in that version of CMake (or a behavior change that is causing this problem). I'm not sure, but I could try to reproduce it if you give me exact instructions to reproduce it.

dridzal commented 6 years ago

I can confirm that the latest changes have broken the script, although this may have to do with the default sems environments and not just cmake. I get a bunch of errors like:

sems-openmpi/1.10.1(34):ERROR:102: Tcl command execution failed: if {[module-info mode switch]} {
  set local_compiler_version $env(SEMS_OPENMPI_LOCAL_COMPILER_VERSION)
} elseif {[module-info mode remove]} {
  set local_compiler_version $env(SEMS_OPENMPI_LOCAL_COMPILER_VERSION)
  unsetenv SEMS_OPENMPI_LOCAL_COMPILER_VERSION
} else {
  set local_compiler_version [semsModuleSupport::getCurrentVersion gcc]
  setenv SEMS_OPENMPI_LOCAL_COMPILER_VERSION $local_compiler_version
}

for every loaded sems module. Have the requirements changed in terms of how to load modules prior to running these scripts? It all used to be automatic. Is there a module purge needed somewhere?

dridzal commented 6 years ago

@bartlettroscoe @mhoemmen @trilinos/framework OK, this is actually pretty bad. After running the script, 'module purge' in my terminal window fails. Is there a new requirement on how a bashrc file must be set up if we want to use the checkin script? Mine simply contains

module load sems-devpack-gcc/6.1.0
module load sems-gdb
module load sems-doxygen
module load sems-git
module load sems-cmake
module load sems-tex
module load sems-subversion

After I run the script, and run module purge, I get a bunch of errors. Then, module list returns:

Currently Loaded Modulefiles:
  1) /projects/sems/modulefiles/rhel6-x86_64/sems/compiler/sems-gcc/6.1.0
  2) /projects/sems/modulefiles/rhel6-x86_64/sems/compiler/sems-openmpi/1.10.1
  3) /projects/sems/modulefiles/rhel6-x86_64/sems/compiler/sems-python/2.7.9
  4) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-boost/1.63.0/base
  5) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-hdf5/1.8.12/parallel
  6) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-netcdf/4.4.1/exo_parallel
  7) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-parmetis/4.0.3/64bit_parallel
  8) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-scotch/6.0.3/nopthread_64bit_parallel
  9) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-superlu/5.2.1/base
 10) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-yaml_cpp/0.5.3/base
 11) /projects/sems/modulefiles/rhel6-x86_64/sems/tpl/sems-zlib/1.2.8/base

Contrast that with running module list in the terminal before the script is run:

Currently Loaded Modulefiles:
  1) sems-env                                    12) sems-gcc/6.1.0
  2) sems-devpack-gcc/6.1.0                      13) sems-openmpi/1.10.1
  3) sems-gdb/7.9.1                              14) sems-python/2.7.9
  4) sems-doxygen/1.8.8                          15) sems-boost/1.63.0/base
  5) sems-git/2.10.1                             16) sems-hdf5/1.8.12/parallel
  6) sems-cmake/3.10.3                           17) sems-netcdf/4.4.1/exo_parallel
  7) sems-tex/2015                               18) sems-parmetis/4.0.3/64bit_parallel
  8) sems-apr/1.5.2                              19) sems-scotch/6.0.3/nopthread_64bit_parallel
  9) sems-apr_util/1.5.4                         20) sems-superlu/5.2.1/base
 10) sems-serf/1.3.8                             21) sems-yaml_cpp/0.5.3/base
 11) sems-subversion/1.7.19                      22) sems-zlib/1.2.8/base

Note the differences in the presence/absence of directory prefixes. Something doesn't add up ...

dridzal commented 6 years ago

Another piece of information: module purge fails in the terminal window regardless of the checkin script, with a bunch of messages of the type

Tcl command execution failed

Is this a SEMS issue?

bartlettroscoe commented 6 years ago

@dridzal, let me take a look and see what is happening by running the checkin-test-sems.sh script. Stay tuned.

dridzal commented 6 years ago

@bartlettroscoe , before you do that, just try running module purge in your terminal window. I get a bunch of errors. I load modules through bashrc. This started happening after I logged out and logged back into the system (so you may have to do the same). My modules are:

module load sems-devpack-gcc/6.1.0
module load sems-gdb
module load sems-doxygen
module load sems-git
module load sems-cmake
module load sems-tex
module load sems-subversion

I have filed an issue with SEMS.

bartlettroscoe commented 6 years ago

@dridzal,

On my CEE LAN RHEL6 machine 'ceerws1113' that loads the SEMS NFS env, I just ran:

$ module purge
[rabartl@ceerws1113 Trilinos (develop)]$ . cmake/load_sems_dev_env.sh 
[rabartl@ceerws1113 Trilinos (develop)]$ module list
Currently Loaded Modulefiles:
  1) sems-env                                     6) atdm-ninja_fortran/1.7.2                    11) sems-hdf5/1.8.12/parallel
  2) atdm-env                                     7) sems-gcc/4.8.4                              12) sems-netcdf/4.4.1/exo_parallel
  3) sems-python/2.7.9                            8) sems-openmpi/1.10.1                         13) sems-parmetis/4.0.3/parallel
  4) atdm-cmake/3.11.1                            9) sems-boost/1.63.0/base                      14) sems-scotch/6.0.3/nopthread_64bit_parallel
  5) sems-git/2.10.1                             10) sems-zlib/1.2.8/base                        15) sems-superlu/4.3/base

And then I did:

. cmake/load_sems_dev_env.sh sems-gcc/6.1.0
[rabartl@ceerws1113 Trilinos (develop)]$ module list
Currently Loaded Modulefiles:
  1) sems-env                                     6) atdm-ninja_fortran/1.7.2                    11) sems-hdf5/1.8.12/parallel
  2) atdm-env                                     7) sems-gcc/6.1.0                              12) sems-netcdf/4.4.1/exo_parallel
  3) sems-python/2.7.9                            8) sems-openmpi/1.10.1                         13) sems-parmetis/4.0.3/parallel
  4) atdm-cmake/3.11.1                            9) sems-boost/1.63.0/base                      14) sems-scotch/6.0.3/nopthread_64bit_parallel
  5) sems-git/2.10.1                             10) sems-zlib/1.2.8/base                        15) sems-superlu/4.3/base

dridzal commented 6 years ago

@bartlettroscoe I think the issue may be in loading the modules automatically through bashrc. Do you do this, or do you always load them manually? If I comment out the modules in bashrc, log out, and log back in, and then manually load the modules, it all seems to work. I don't know why this behavior has changed in the last few weeks.

bartlettroscoe commented 6 years ago

Hmm, it looks like something is wrong with the SEMS modules on the CEE LAN. For the standard GCC 4.8.4 build of Trilinos with:

$ ./checkin-test-sems.sh --enable-packages=Kokkos --no-enable-fwd-packages \
  --local-do-all --wipe-clean

I get all failing tests:

  Configure: Passed (0.25 min)
  Build: Passed (2.45 min)
  Test: FAILED (0.01 min)

  0% tests passed, 27 tests failed out of 27

  Subproject Time Summary:
  Kokkos    =   3.96 sec*proc (27 tests)

  Total Test time (real) =   0.46 sec

  The following tests FAILED:
      1 - KokkosCore_UnitTest_Serial_MPI_1 (Failed)
      2 - KokkosCore_UnitTest_OpenMP_MPI_1 (Failed)
      3 - KokkosCore_UnitTest_OpenMPInterOp_MPI_1 (Failed)
      4 - KokkosCore_UnitTest_Default_MPI_1 (Failed)
      5 - KokkosCore_UnitTest_PushFinalizeHook_MPI_1 (Failed)
      6 - KokkosCore_UnitTest_PushFinalizeHook_terminate (Failed)
      7 - KokkosCore_UnitTest_DefaultInit_1_MPI_1 (Failed)
      8 - KokkosCore_UnitTest_DefaultInit_2_MPI_1 (Failed)
      9 - KokkosCore_UnitTest_DefaultInit_3_MPI_1 (Failed)
     10 - KokkosCore_UnitTest_DefaultInit_4_MPI_1 (Failed)
     11 - KokkosCore_UnitTest_DefaultInit_5_MPI_1 (Failed)
     12 - KokkosCore_UnitTest_DefaultInit_6_MPI_1 (Failed)
     13 - KokkosCore_UnitTest_DefaultInit_7_MPI_1 (Failed)
     14 - KokkosCore_UnitTest_DefaultInit_8_MPI_1 (Failed)
     15 - KokkosCore_UnitTest_DefaultInit_9_MPI_1 (Failed)
     16 - KokkosCore_UnitTest_DefaultInit_10_MPI_1 (Failed)
     17 - KokkosCore_UnitTest_DefaultInit_11_MPI_1 (Failed)
     18 - KokkosCore_UnitTest_DefaultInit_12_MPI_1 (Failed)
     19 - KokkosCore_UnitTest_DefaultInit_13_MPI_1 (Failed)
     20 - KokkosCore_UnitTest_DefaultInit_14_MPI_1 (Failed)
     21 - KokkosCore_UnitTest_DefaultInit_15_MPI_1 (Failed)
     22 - KokkosCore_UnitTest_DefaultInit_16_MPI_1 (Failed)
     23 - KokkosCore_UnitTest_HWLOC_MPI_1 (Failed)
     24 - KokkosCore_UnitTest_HostBarrier_MPI_1 (Failed)
     25 - KokkosContainers_UnitTest_Serial_MPI_1 (Failed)
     26 - KokkosContainers_UnitTest_OpenMP_MPI_1 (Failed)
     27 - KokkosAlgorithms_UnitTest_MPI_1 (Failed)
  Errors while running CTest

The runtime error shows:

1: /scratch/rabartl/Trilinos.base/BUILDS/CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT_OPENMP/packages/kokkos/core/unit_test/KokkosCore_UnitTest_Serial.exe: /projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.8.4/base/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /scratch/rabartl/Trilinos.base/BUILDS/CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT_OPENMP/packages/kokkos/core/unit_test/KokkosCore_UnitTest_Serial.exe)

I am guessing that they updated the CEE LAN RHEL6 machines and now some of the SEMS envs are broken on these machines?

How is it that the Trilinos PR builds are not showing problems like this?

bartlettroscoe commented 6 years ago

@dridzal, on my CEE LAN machine 'ceerws1113', I don't do anything to get the SEMS modules defined in my .bashrc file or my .bash_profile file. As soon as I log into that machine I have:

$ module list
No Modulefiles Currently Loaded.

$ module load sems-env
$ module load sems-git/2.10.1

$ module list
Currently Loaded Modulefiles:
  1) sems-env          2) sems-git/2.10.1

What happens for you?

bartlettroscoe commented 6 years ago

FYI: The sems-gcc/7.3.0 env seems to be fine on CEE RHEL6 machines. I just did:

$ . /scratch/rabartl/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh sems-gcc/7.3.0

$ module list
Currently Loaded Modulefiles:
  1) sems-env                                     6) atdm-ninja_fortran/1.7.2                    11) sems-hdf5/1.8.12/parallel
  2) atdm-env                                     7) sems-gcc/7.3.0                              12) sems-netcdf/4.4.1/exo_parallel
  3) sems-python/2.7.9                            8) sems-openmpi/1.10.1                         13) sems-parmetis/4.0.3/parallel
  4) atdm-cmake/3.11.1                            9) sems-boost/1.63.0/base                      14) sems-scotch/6.0.3/nopthread_64bit_parallel
  5) sems-git/2.10.1                             10) sems-zlib/1.2.8/base                        15) sems-superlu/4.3/base

$  env TRILINOS_CHECKIN_TEST_SEMS_SKIP_MODULE_LOAD=1 \
  ./checkin-test-sems.sh --enable-packages=Kokkos --no-enable-fwd-packages --local-do-all \
 --wipe-clean

and it returned:

  Configure: Passed (0.22 min)
  Build: Passed (2.71 min)
  Test: Passed (0.61 min)

  100% tests passed, 0 tests failed out of 27

  Subproject Time Summary:
  Kokkos    = 106.76 sec*proc (27 tests)

  Total Test time (real) =  36.86 sec

I don't understand how the Trilinos GCC 4.8.4 PR build is not also broken like this.

dridzal commented 6 years ago

@bartlettroscoe Well, if I don't put anything into bashrc then I can load the modules manually like you did and things seem to work. But that means that every time I open a terminal I would have to load the modules. Even if I'm not working on code development, I need tex or git or subversion, etc. What is the proper way to automatically load modules?

mhoemmen commented 6 years ago

It's somewhat to be expected that you might have to purge & reload modules now and then, no?

bartlettroscoe commented 6 years ago

I logged out and logged back in again and this time when I ran:

$ ./checkin-test-sems.sh --enable-packages=Kokkos --no-enable-fwd-packages \
  --local-do-all --wipe-clean

which returned:

  Configure: Passed (0.18 min)
  Build: Passed (2.22 min)
  Test: Passed (0.71 min)

  100% tests passed, 0 tests failed out of 27

  Subproject Time Summary:
  Kokkos    = 117.39 sec*proc (27 tests)

  Total Test time (real) =  42.38 sec

So everything seems to be okay.

dridzal commented 6 years ago

@mhoemmen the issue is that if I load the modules in bashrc, then 'module purge' fails. I believe that the checkin script will attempt a purge when loading the SEMS environment -- right, @bartlettroscoe ? I guess I could write an alias to load the modules I need and then run that alias every time I open a terminal window, but that's still inconvenient. 'module clear' will work, but it prompts for confirmation, so we can't replace purge with clear in the checkin script.

bartlettroscoe commented 6 years ago

the checkin script will attempt a purge when loading the sems environment -- right

Correct. It has to.

My CEE LAN setup loads the modules from .bash_profile by default and I have no problem. I can send you my .bashrc and .bash_profile files from my CEE RHEL6 machine offline for you to view and perhaps try out.

bartlettroscoe commented 6 years ago

FYI:

After I re-synced the SEMS env to my CSRI RHEL6 machine, I logged back in again and did:

$  ./checkin-test-sems.sh --enable-packages=Kokkos --no-enable-fwd-packages \
   --local-do-all --wipe-clean

and it passed 100% with:

  Configure: Passed (0.11 min)
  Build: Passed (0.89 min)
  Test: Passed (0.67 min)

  100% tests passed, 0 tests failed out of 27

  Subproject Time Summary:
  Kokkos    = 108.49 sec*proc (27 tests)

  Total Test time (real) =  40.31 sec

So that is it. You just need to make sure your SEMS env is up to date and you need to log out and log back in again and everything should work on a CEE RHEL6 machine and a CSRI (COE) RHEL6 machine using the SEMS modules.

dridzal commented 6 years ago

@bartlettroscoe thanks for your input; following your example, I moved the module loads to the .bash_profile script (from the .bashrc script). After logging out, this seems to have done the trick, i.e., 'module purge' works again and the checkin script can proceed. The question remains why this behavior changed on my RHEL6 machine, and why it coincided with the move to cmake 3.10.0. In any case, this is resolved.
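For anyone making the same move, a minimal sketch of the resulting `.bash_profile`, assuming the same module list as the `.bashrc` shown earlier in this thread. The `command -v module` guard is an addition of this sketch, not something from the thread: it just skips the loads on hosts where the `module` command is not defined. Why loading from `.bashrc` broke `module purge` was not established here, so treat this as the workaround that was reported to work, not an explanation.

```shell
# ~/.bash_profile (sketch): bash reads this for login shells only,
# so the SEMS modules are loaded once per login rather than per shell.
if command -v module >/dev/null 2>&1; then
  module load sems-devpack-gcc/6.1.0
  module load sems-gdb
  module load sems-doxygen
  module load sems-git
  module load sems-cmake
  module load sems-tex
  module load sems-subversion
fi
```

After editing it, log out and log back in, as described above.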

bartlettroscoe commented 6 years ago

@dridzal, glad to hear this is resolved for you. Is this issue ready to close?

dridzal commented 6 years ago

Yes, closed.

bartlettroscoe commented 6 years ago

@dridzal, are there problems related to this that are not yet resolved?

dridzal commented 6 years ago

@bartlettroscoe I'm reopening this issue and adding @fryeguy52 . There seems to be a fundamental difference in how the environment variables are processed by cmake or FindTPL after the cmake upgrade. The issue has been confirmed under RHEL6 by me and under RHEL7 by @fryeguy52 . For example, before this upgrade, I would load the modules, and cmake (and/or FindTPL) would find all TPLs. Now, I have to add the following lines to my configure script or my bash_profile:

export NetCDF_ROOT=$SEMS_NETCDF_ROOT
export PNetCDF_ROOT=$SEMS_NETCDF_ROOT
export HDF5_ROOT=$SEMS_HDF5_ROOT
export Boost_ROOT=$SEMS_BOOST_ROOT
export BOOST_ROOT=$SEMS_BOOST_ROOT

Obviously, doing this for every TPL and maintaining the script is error-prone. Just figuring out the capitalization of the variables is a huge pain. The question here is why NetCDF_ROOT, etc., were found prior to the cmake upgrade, and why they now must be manually set to the corresponding SEMS variable (SEMS_NETCDF_ROOT, etc.). On a related note, this doesn't seem to be an issue for check-in testing (!!), so I wonder how you got around manually setting the expected TPL variables to their SEMS equivalents. Further, when you load the SEMS modules, does

echo $NetCDF_ROOT

give you anything?
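To answer that question for several TPLs at once, a small shell check like the sketch below can be run right after loading the modules. The variable names are the ones mentioned in this comment; the function name `report_root_vars` is hypothetical.

```shell
# Hypothetical helper: report whether each named variable is set.
# Usage: report_root_vars VAR1 VAR2 ...
report_root_vars() {
  for var in "$@"; do
    # Indirect lookup that also works in POSIX sh (no bash-only ${!var}).
    eval "val=\${$var:-}"
    if [ -n "$val" ]; then
      echo "$var=$val"
    else
      echo "$var is NOT set"
    fi
  done
}

report_root_vars NetCDF_ROOT SEMS_NETCDF_ROOT HDF5_ROOT SEMS_HDF5_ROOT \
  Boost_ROOT SEMS_BOOST_ROOT
```

If only the `SEMS_*` names print a value, the modules are providing the SEMS variables but not the `<Tpl>_ROOT` names.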

bartlettroscoe commented 6 years ago

@dridzal, my advice is to follow the example of the Trilinos/cmake/std/sems/SEMSDevEnv.cmake file and just explicitly set the include directories and libraries. That removes any dependence on changes in find behavior across different versions of CMake or of the find modules. Changes in auto-find behavior are the number-one cause of porting problems when configuring and building software.

Can't you just use the SEMS env modules for your work?

jhux2 commented 6 years ago

Now, I have to add the following lines to my configure script or my bash_profile:

export NetCDF_ROOT=$SEMS_NETCDF_ROOT
export PNetCDF_ROOT=$SEMS_NETCDF_ROOT
export HDF5_ROOT=$SEMS_HDF5_ROOT
export Boost_ROOT=$SEMS_BOOST_ROOT
export BOOST_ROOT=$SEMS_BOOST_ROOT

@dridzal Fwiw, I've always specified TPLs locations in a base configure script:

  -D SuperLU_LIBRARY_DIRS:PATH="${SEMS_SUPERLU_LIBRARY_PATH}"
  -D SuperLU_INCLUDE_DIRS:PATH="${SEMS_SUPERLU_INCLUDE_PATH}"

All my other configure scripts call this base script. Whether a TPL is actually enabled is handled in the upper scripts, e.g., -D TPL_ENABLE_SuperLU:BOOL=ON.
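A minimal sketch of that layering, written as a shell function so that upper-level scripts can source it. The function name `base_configure` and the `CMAKE` override variable are hypothetical; the `-D` options are the ones shown above.

```shell
# Hypothetical base layer: only tells CMake *where* TPLs live.
# Whether a TPL is enabled is left to the calling script via "$@".
base_configure() {
  # ${CMAKE:-cmake} lets a caller substitute another command (e.g. echo)
  # for a dry run that just prints the arguments.
  "${CMAKE:-cmake}" \
    -D SuperLU_LIBRARY_DIRS:PATH="${SEMS_SUPERLU_LIBRARY_PATH}" \
    -D SuperLU_INCLUDE_DIRS:PATH="${SEMS_SUPERLU_INCLUDE_PATH}" \
    "$@"
}
```

An upper-level script would then source the base file and call, for example, `base_configure -D TPL_ENABLE_SuperLU:BOOL=ON ../Trilinos`. Keeping locations and enables in separate layers means a TPL path only has to be updated in one place.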

dridzal commented 6 years ago

my advice is to follow the example of the Trilinos/cmake/std/sems/SEMSDevEnv.cmake file and just explicitly set the include directories and libraries.

@bartlettroscoe Looking at this file, you set, for example,

SEMS_SELECT_TPL_ROOT_DIR(NETCDF Netcdf_ROOT
  PARALLEL_EXT "exo_parallel" SERIAL_EXT "exo")
#PRINT_VAR(Netcdf_ROOT)
SET(TPL_Netcdf_INCLUDE_DIRS "${Netcdf_ROOT}/include;${TPL_HDF5_INCLUDE_DIRS}"
  CACHE PATH "Set in SEMSDevEnv.cmake")
SET(Netcdf_LIBRARY_DIRS "${Netcdf_ROOT}/lib;${HDF5_LIBRARY_DIRS}"
  CACHE PATH "Set in SEMSDevEnv.cmake")

Who gives you Netcdf_ROOT? This is not available when you load the sems modules (it may have been at some point, but it definitely no longer is). Currently only SEMS_NETCDF_ROOT is available to query.

@jhux2 it seems like this (trivial) mapping of tpl_path to SEMS_tpl_path should be automated. This used to work; I never had to specify the mapping explicitly. So, somehow, in the recent upgrades we lost some of this functionality.
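As a rough illustration of what automating that mapping could look like in the shell, here is a sketch with a hypothetical helper `map_sems_root`; the pairs are exactly the manual exports listed above. It only sets environment variables: whether a given find module actually reads `<Tpl>_ROOT` still depends on the CMake version and the find module, which is the fragility being discussed in this thread.

```shell
# Hypothetical helper: export <Tpl>_ROOT from SEMS_<TPL>_ROOT when the latter is set.
map_sems_root() {
  tpl_var=$1
  sems_var=$2
  # Indirect lookup that also works in POSIX sh.
  eval "sems_val=\${$sems_var:-}"
  if [ -n "$sems_val" ]; then
    export "$tpl_var=$sems_val"
  else
    echo "WARNING: $sems_var is not set; $tpl_var left unchanged" >&2
  fi
}

# Same pairs as the manual exports above, including the Boost spelling variants.
map_sems_root NetCDF_ROOT  SEMS_NETCDF_ROOT
map_sems_root PNetCDF_ROOT SEMS_NETCDF_ROOT
map_sems_root HDF5_ROOT    SEMS_HDF5_ROOT
map_sems_root Boost_ROOT   SEMS_BOOST_ROOT
map_sems_root BOOST_ROOT   SEMS_BOOST_ROOT
```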

bartlettroscoe commented 6 years ago

Who gives you Netcdf_ROOT? This is not available when you load the sems modules (it may have been at some point, but it definitely no longer is). Currently only SEMS_NETCDF_ROOT is available to query.

@dridzal, if you dig into SEMS_SELECT_TPL_ROOT_DIR(), you will see it grabs the location of Netcdf from the env var SEMS_NETCDF_ROOT. The var Netcdf_ROOT is the returned CMake var, not an env var. That function SEMS_SELECT_TPL_ROOT_DIR() works for MPI and non-MPI builds. This approach is a little more complex but it gives you both MPI and non-MPI builds with one *.cmake file.

dridzal commented 6 years ago

@bartlettroscoe , I see, thanks. So, how would I load the full SEMS env without having to worry about setting the paths manually? I basically want the full SEMS checkin test environment.

bartlettroscoe commented 6 years ago

So, how would I load the full SEMS env without having to worry about setting the paths manually? I basically want the full SEMS checkin test environment.

@dridzal, everything you should need to know about the SEMS env should be explained in:

The SEMS_<TPLNAME>_ROOT vars are automatically set by the SEMS modules. So if you just source Trilinos/cmake/load_sems_dev_env.sh with the right arguments and then pass in -C $TRILINOS_DIR/cmake/std/SEMSDevEnv.cmake or (better in my opinion) -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/SEMSDevEnv.cmake then that should find all of the SEMS-supported TPLs correctly.

NOTE: We will be adding SuperLUDist and other TPLs as they are supported by SEMS (or installed in the 'trilinos' project space).

dridzal commented 6 years ago

@bartlettroscoe , to recap, adding

-D Trilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/SEMSDevEnv.cmake \

to my config script should do the trick? And I wouldn't have to add any TPLs explicitly?

bartlettroscoe commented 6 years ago

to recap, adding

-D Trilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/SEMSDevEnv.cmake \

to my config script should do the trick? And I wouldn't have to add any TPLs explicitly?

@dridzal, if you load the right SEMS modules, then yes. But note that you still have to enable the TPLs that you want to use such as with:

  ...
  -D TPL_ENABLE_HDF5=ON \
  -D TPL_ENABLE_Netcdf=ON \
  ...

If you use Trilinos/cmake/load_sems_dev_env.sh to load the SEMS modules, that should load all of the correct modules. The post-push CI build that I run, which posts to:

uses these files, so if they should ever break we would know it right away.

That is all there is to it. Let me know if that does not work.

dridzal commented 6 years ago

This no longer works. I have included the configure options file line as you suggested. Netcdf is enabled because SEACAS is enabled. I get the error:

Processing enabled TPL: Netcdf (enabled by SEACASExodus, disable with -DTPL_ENABLE_Netcdf=OFF)
-- Using FIND_PACKAGE(Netcdf ...) ...
CMake Error at cmake/tribits/common_tpls/find_modules/FindNetCDF.cmake:163 (message):
  Can not locate NetCDF include directory
Call Stack (most recent call first):
  cmake/tribits/common_tpls/FindTPLNetcdf.cmake:66 (find_package)
  cmake/tribits/core/package_arch/TribitsProcessEnabledTpl.cmake:106 (INCLUDE)
  cmake/tribits/core/package_arch/TribitsGlobalMacros.cmake:1711 (TRIBITS_PROCESS_ENABLED_TPL)
  cmake/tribits/core/package_arch/TribitsProjectImpl.cmake:202 (TRIBITS_PROCESS_ENABLED_TPLS)
  cmake/tribits/core/package_arch/TribitsProject.cmake:93 (TRIBITS_PROJECT_IMPL)
  CMakeLists.txt:90 (TRIBITS_PROJECT)

CMake Error at cmake/tribits/common_tpls/find_modules/FindNetCDF.cmake:274 (message):
  Can not locate NetCDF C library
Call Stack (most recent call first):
  cmake/tribits/common_tpls/FindTPLNetcdf.cmake:66 (find_package)
  cmake/tribits/core/package_arch/TribitsProcessEnabledTpl.cmake:106 (INCLUDE)
  cmake/tribits/core/package_arch/TribitsGlobalMacros.cmake:1711 (TRIBITS_PROCESS_ENABLED_TPL)
  cmake/tribits/core/package_arch/TribitsProjectImpl.cmake:202 (TRIBITS_PROCESS_ENABLED_TPLS)
  cmake/tribits/core/package_arch/TribitsProject.cmake:93 (TRIBITS_PROJECT_IMPL)
  CMakeLists.txt:90 (TRIBITS_PROJECT)

-- NetCDF does not require HDF5
-- NetCDF does not require PNetCDF
-- Could NOT find NetCDF (missing: NetCDF_LIBRARIES NetCDF_INCLUDE_DIRS) 
-- NetCDF Version: 
--  NetCDF_NEEDS_HDF5        = 
--  NetCDF_NEEDS_PNetCDF     = 
--  NetCDF_PARALLEL          = 
--  NetCDF_INCLUDE_DIRS      = NetCDF_INCLUDE_DIR-NOTFOUND
--  NetCDF_LIBRARIES         = NetCDF_C_LIBRARY-NOTFOUND
--  NetCDF_BINARIES          = ncdump;ncgen;nccopy
-- Netcdf_LIBRARY_NAMES='netcdf'
-- Searching for libs in Netcdf_LIBRARY_DIRS=''
-- Searching for a lib in the set "netcdf":
--   Searching for lib 'netcdf' ...
-- NOTE: Did not find a lib in the lib set "netcdf" for the TPL 'Netcdf'!
-- ERROR: Could not find the libraries for the TPL 'Netcdf'!

So, it seems like this script doesn't work.

dridzal commented 6 years ago

I've now tried:

source ../../cmake/load_sems_dev_env.sh sems-gcc/6.1.0

from my build directory, and I get the same error as above (missing Netcdf). My guess is that the checkin test script will no longer work either, or that if it does, it's a matter of auto-find luck.

dridzal commented 6 years ago

Here is my config script:

EXTRA_ARGS=$@

cmake \
-D Trilinos_ENABLE_Fortran=OFF \
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
-D TPL_ENABLE_MPI:BOOL=ON \
-D BUILD_SHARED_LIBS:BOOL=ON \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D TPL_ENABLE_Boost=ON \
-D Trilinos_ENABLE_Panzer:BOOL=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
-D Trilinos_ENABLE_TESTS:BOOL=OFF \
-D Trilinos_ENABLE_EXAMPLES:BOOL=OFF \
-D Panzer_ENABLE_TESTS:BOOL=ON \
-D Panzer_ENABLE_EXAMPLES:BOOL=ON \
-D TPL_BLAS_LIBRARIES:STRING=/usr/lib64/libblas.so.3 \
-D TPL_LAPACK_LIBRARIES:STRING=/usr/lib64/liblapack.so.3 \
$EXTRA_ARGS \
../../../Trilinos

Just try it. Load the SEMS env with

source ../../cmake/load_sems_dev_env.sh sems-gcc/6.1.0

then run the config script, and you'll immediately run into an issue with boost not being found. If you specify boost includes explicitly, and you get past them, then Netcdf won't be found. Etc.

dridzal commented 6 years ago

The problem is that after running load_sems_dev_env.sh the variables Boost_INCLUDE_DIRS, NetCDF_INCLUDE_DIRS, etc., are not present in the shell. I'll try this on a CEE machine now.

bartlettroscoe commented 6 years ago

@dridzal,

Sorry, there was a path typo at:

(I fixed it.)

You want to add:

-D Trilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/sems/SEMSDevEnv.cmake \

to your cmake configure line. So that would give the configure script:

#!/bin/bash
cmake \
-D Trilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/sems/SEMSDevEnv.cmake \
-D Trilinos_ENABLE_Fortran=OFF \
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
-D TPL_ENABLE_MPI:BOOL=ON \
-D BUILD_SHARED_LIBS:BOOL=ON \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D TPL_ENABLE_Boost=ON \
-D Trilinos_ENABLE_Panzer:BOOL=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
-D Trilinos_ENABLE_TESTS:BOOL=OFF \
-D Trilinos_ENABLE_EXAMPLES:BOOL=OFF \
-D Panzer_ENABLE_TESTS:BOOL=ON \
-D Panzer_ENABLE_EXAMPLES:BOOL=ON \
"$@" \
../../../Trilinos

I put that in the executable script dridzal-configure and did:

$ cd <build-dir>/

$ source ../../../Trilinos/cmake/load_sems_dev_env.sh sems-gcc/6.1.0

$  time ./dridzal-configure &> configure.out

real    0m32.138s
user    0m18.816s
sys     0m12.103s

That configured successfully for me. I am running the build now but I think that has it.

Let me know if this does not work for you.

Sorry again for the typo.

dridzal commented 6 years ago

@bartlettroscoe that worked, thanks! I'm still amazed that the previous builds, with cmake-3.5.2, worked and that they found all the right SEMS libraries. I see two long-term solutions for these issues. One, we provide a mapping of TPL variables to SEMS variables like @jhux2 suggested, with all possible options relevant to Trilinos, and then have one cmake script load another "base" script. Two, we just use load_sems_dev_env, with the caveat that the selection of modules would have to be expanded.

Suggestions?
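Option one could look roughly like the following sketch. A single table maps TriBITS TPL variables to environment variables exported by the SEMS modules, and the table is turned into -D arguments for cmake. The right-hand SEMS_* names and the function name sems_cmake_args are placeholders, not verified SEMS or Trilinos names. The left-hand names follow the <TPL>_INCLUDE_DIRS pattern discussed above, using the TPL name 'Netcdf' from the earlier log. The associative array requires bash 4.

```bash
#!/bin/bash
# Sketch of "option one": map TriBITS TPL variables to environment variables
# assumed to be exported by SEMS modules. The SEMS_* names are placeholders
# for illustration only, not verified SEMS variable names. Needs bash 4+.
declare -A TPL_TO_ENV=(
  [Boost_INCLUDE_DIRS]=SEMS_BOOST_INCLUDE_PATH
  [Netcdf_INCLUDE_DIRS]=SEMS_NETCDF_INCLUDE_PATH
)

# Hypothetical helper: print one "-D<CMakeVar>:PATH=<value>" argument per line
# for every mapped environment variable that is set and non-empty, and warn on
# stderr about the ones that are not. Output order is unspecified.
sems_cmake_args() {
  local cmake_var env_var value
  for cmake_var in "${!TPL_TO_ENV[@]}"; do
    env_var="${TPL_TO_ENV[$cmake_var]}"
    value="${!env_var}"   # indirect expansion: the value of the named variable
    if [ -n "$value" ]; then
      printf -- '-D%s:PATH=%s\n' "$cmake_var" "$value"
    else
      echo "warning: $env_var is not set; skipping $cmake_var" >&2
    fi
  done
}
```

A configure script could collect the arguments with mapfile -t sems_args < <(sems_cmake_args) and pass "${sems_args[@]}" to cmake. The drawback is that someone has to keep the table complete for every SEMS TPL that Trilinos uses. Option two avoids that for individual users.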

bartlettroscoe commented 6 years ago

FYI: The checkin-test-sems.sh script should be pretty well protected by the post-push CI build, which uses these core files. Any errors in checkin-test-sems.sh that are not covered by that post-push CI build would be trivial to fix (as was the case previously).

bartlettroscoe commented 6 years ago

Suggestions?

If you are using SEMS modules, always include cmake/std/sems/SEMSDevEnv.cmake to pull in TPLs, period. As stated at:

at the bottom in the "NOTES" section:

So just load the modules any way you like. But honestly, if people cannot figure out how to type module list after sourcing load_sems_dev_env.sh (which that page shows you explicitly), then I don't know if we can help them.

mhoemmen commented 6 years ago

@bartlettroscoe btw, how do we get the latest SEMS modules on a workstation or blade? Is there an FAQ somewhere? Thanks! :-)

dridzal commented 6 years ago

@bartlettroscoe this is clear; what I meant by "the selection of modules would have to be expanded" is that the load_sems_dev_env.sh script only loads a few TPLs currently. Can we expand it to load additional TPLs? There shouldn't be any harm in loading more than needed for any particular build. Is this planned already?

dridzal commented 6 years ago

@mhoemmen does your question go beyond

module load sems-env

and then module-loading what you need? For a generic Trilinos build, you can bypass the manual module-loading as described above, through load_sems_dev_env.sh and by including -D Trilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/sems/SEMSDevEnv.cmake in your config script.
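The "module list" check can also be scripted, so that a missing TPL module is caught before configuring. This is a minimal sketch. It assumes module list output in the numbered format shown at the top of this issue. module list commonly writes to stderr, which is why the usage line below has 2>&1. The helper name missing_modules is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `module list` output on stdin and print each
# required module (given by base name, without a version) that is not loaded.
missing_modules() {
  local loaded required
  # Split on spaces/tabs, drop the "1)", "10)" index tokens and empty lines,
  # then strip any "/version" suffix (e.g. "sems-cmake/3.12.2" -> "sems-cmake").
  loaded="$(tr -s ' \t' '\n\n' | grep -v '^[0-9]*)$' | grep . | sed 's|/.*||')"
  for required in "$@"; do
    # -x: whole-line match, -F: fixed string, so "sems-hdf5" != "sems-hdf".
    if ! grep -qxF -- "$required" <<< "$loaded"; then
      echo "$required"
    fi
  done
}
```

For example, module list 2>&1 | missing_modules sems-boost sems-netcdf sems-superlu prints the base names of any of those modules that are not loaded.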

dridzal commented 6 years ago

@mhoemmen there is also this

https://sems.sandia.gov/confluence/display/SEMSKB/How+to+mount+NFS+TPL+server+on+Linux+workstation?src=contextnavpagetreemode

and a bunch of related articles.

bartlettroscoe commented 6 years ago

Can we expand it to load additional TPLs? There shouldn't be any harm in loading more than needed for any particular build. Is this planned already?

@dridzal, we should update load_sems_dev_env.sh to load every TPL that SEMS provides that works with Trilinos, and then update SEMSDevEnv.cmake to pull in the info for all of those TPLs. The only TPL that SEMS provides that is not already handled is SuperLUDist. Are there others as well? We need to expand the set of TPLs that we test Trilinos with in PR, CI and Nightly testing to include as many of the TPLs that Trilinos customers are using as we can.

bartlettroscoe commented 6 years ago

btw, how do we get the latest SEMS modules on a workstation or blade? Is there an FAQ somewhere? Thanks! :-)

@mhoemmen, you mean a CEE LAN blade? It already has them. I use this SEMS Dev Env stuff all the time on my CEE LAN RHEL6 machine 'ceerws1113'.

mhoemmen commented 6 years ago

I'll try the sems article above; thanks!

dridzal commented 6 years ago

The only TPL that SEMS provides that is not already handled is SuperLUDist. Are there others as well?

I think that's it.