trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

TriBits: undefined variable breaks nightly builds #4796

Closed lucbv closed 5 years ago

lucbv commented 5 years ago

@bartlettroscoe (there is no good team to tag for this type of issue...)

Expectations

TriBITS changes should not break current nightly builds.

Current Behavior

The MueLu nightly builds are all failing at configure time because of an undefined variable, ${${PROJECT_NAME}_TRIBITS_DIR}, in file cmake/tribits/core/utils/MessageWrapper.cmake at line 45.
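The empty prefix in the failing include path is what CMake's nested variable references produce when the inner variable is unset. The stand-alone script below is a minimal sketch (the file name and the variables tribits_dir and include_path are hypothetical, and it does not reproduce MessageWrapper.cmake itself): with PROJECT_NAME unset, ${PROJECT_NAME} expands to an empty string, the outer reference becomes ${_TRIBITS_DIR}, which is also undefined, and the path collapses to /core/utils/GlobalSet.cmake, the same path reported in the configure log.

```cmake
# nested_expansion_demo.cmake (hypothetical name); run with:
#   cmake -P nested_expansion_demo.cmake
# Sketch only: it mimics the variable lookup, not the real TriBITS include.
cmake_minimum_required(VERSION 3.10)

# Simulate a context in which PROJECT_NAME was never set.
unset(PROJECT_NAME)

# Inner ${PROJECT_NAME} expands to "", so the outer reference becomes
# ${_TRIBITS_DIR}; that variable is undefined too, so the result is "".
set(tribits_dir "${${PROJECT_NAME}_TRIBITS_DIR}")
set(include_path "${tribits_dir}/core/utils/GlobalSet.cmake")

message(STATUS "tribits_dir  = '${tribits_dir}'")
message(STATUS "include_path = '${include_path}'")
```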

Motivation and Context

This has taken down all the MueLu nightly builds, which means that we cannot detect bugs in our specialized and experimental tracks that are usually not tested by the ATDM or Continuous builds.

Possible Solution

It seems that changes made last week in TriBITS are to blame; see commit 2283e955.

Steps to Reproduce

Running any build with the drivers in cmake/ctest/drivers/{enigma,geminga,rocketman,trappist} fails.

bartlettroscoe commented 5 years ago

@lucbv, can you give more context? What is the actual error message with the stack trace?

lucbv commented 5 years ago

@bartlettroscoe here is the log we get from the nightly builds (I only posted the relevant part, but I could email you the full log if it helps). As explained above, the issue is related to:

CMake Error at /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/utils/MessageWrapper.cmake:45 (INCLUDE):
  INCLUDE could not find load file:

The part of the log where the error occurs:

C) Configure /storage/lberge/nightlyTests/TDD_BUILD ...
SetCTestConfiguration:BuildDirectory:/storage/lberge/nightlyTests/TDD_BUILD
SetCTestConfiguration:SourceDirectory:/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers
SetCTestConfiguration:ConfigureCommand:"/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
Configure project
Configure with command: "/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
Run command: "/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
-- TDD_FORCE_INNER_CMAKE_INSTALL='1'
-- ENV_TRIBITS_TDD_USE_SYSTEM_CTEST='1'
-- TRIBITS_TDD_USE_SYSTEM_CTEST='1'
CMake Error at /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/utils/MessageWrapper.cmake:45 (INCLUDE):
  INCLUDE could not find load file:

    /core/utils/GlobalSet.cmake
Call Stack (most recent call first):
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/package_arch/TribitsGeneralMacros.cmake:42 (INCLUDE)
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/package_arch/TribitsConfigureCTestCustom.cmake:40 (INCLUDE)
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake:76 (include)
  CMakeLists.txt:19 (include)

site='trappist.sandia.gov'
site='trappist.sandia.gov' MATCHES directory name dir='trappist'
-- TDD_DRIVER_SUBDIRECTORY='trappist'
TDD_DRIVER_SUBDIRECTORY='trappist'
TRIBITS_DRIVER_ADD_DASHBOARD:  'CLANG_OPENMPI_1.10.0_RELEASE'  'ctest_linux_nightly_mpi_release_muelu_trappist.clang.cmake' [CTEST_INSTALLER_TYPE;release;RUN_SERIAL;TIMEOUT_MINUTES;330]
-- Skipping CMake install tests because TRIBITS_TDD_USE_SYSTEM_CTEST==1
-- Configuring incomplete, errors occurred!
See also "/storage/lberge/nightlyTests/TDD_BUILD/CMakeFiles/CMakeOutput.log".
Command exited with the value: 1
Error(s) when configuring the project
 Add coverage exclude regular expressions.
SetCTestConfiguration:CMakeCommand:/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake
bartlettroscoe commented 5 years ago

@lucbv, let me see if I can figure out what is going on with this.

NOTE: This system was written by a contractor and never had any automated tests, so it has been very hard to support, for this and other reasons. See:

We don't use it for the ATDM Trilinos builds.

jhux2 commented 5 years ago

Btw, setting TRIBITS_PROJECT_ROOT in the crontab environment doesn't resolve this issue. It appears the variable value isn't propagating, although I can see from a log that it is at least initially set correctly:


Starting nightly Trilinos development testing on rocketman: Tue Apr  9 10:58:02 PDT 2019

Configuration = default
SEMS_GCC_LOCAL_PYTHON_VERSION=2.6.6
MANPATH=/projects/sems/install/rhel6-x86_64/sems/compiler/python/2.7.9/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/python/2.7.9/man:/projects/sems/install/rhel6-x86_64/sems/tpl/netcdf/4.4.1/gcc/5.3.0/openmpi/1.10.1/exo_parallel/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/openmpi/1.10.1/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.4.7/openmpi/1.10.1/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/base/share/man:/projects/sems/install/rhel6-x86_64/sems/utility/cmake/3.10.3/share/man:/projects/sems/install/rhel6-x86_64/sems/utility/cmake/3.10.3/man:/usr/local/share/man
TDD_HTTP_PROXY=http://sonproxy.sandia.gov:80
TRIBITS_TDD_USE_SYSTEM_CTEST=1
SEMS_NETCDF_LIBRARY_PATH=/projects/sems/install/rhel6-x86_64/sems/tpl/netcdf/4.4.1/gcc/5.3.0/openmpi/1.10.1/exo_parallel/lib
SEMS_MPI_NAME=openmpi
SEMS_SUPERLU_INCLUDE_PATH=/projects/sems/install/rhel6-x86_64/sems/tpl/superlu/4.3/gcc/5.3.0/base/include
SEMS_OPENMPI_INCLUDE_PATH=/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/openmpi/1.10.1/include
MPICC=mpicc
MATLABPATH=/home/jhu/software/matlab/utilities
SHELL=/bin/bash
SEMS_SUPERLU_LOCAL_PYTHON_VERSION=2.6.6
SEMS_OPENMPI_LOCAL_PYTHON_VERSION=2.6.6
CTEST_CONFIGURATION=default
TDD_FORCE_CMAKE_INSTALL=0
Trilinos_TRIBITS_DIR=/home/nightlyTesting/trilinos
bartlettroscoe commented 5 years ago

@jhux2, the problem is that the vars PROJECT_NAME and/or ${PROJECT_NAME}_TRIBITS_DIR are not getting set correctly in TribitsDriverCMakeLists.cmake (they must not have been needed before).

Can you try the patch shown below and see if that fixes this?


diff --git a/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake b/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
index 79fe491..29ca940 100644
--- a/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
+++ b/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
@@ -61,6 +61,9 @@ IF (NOT TRIBITS_ROOT)
 ENDIF()
 get_filename_component(TRIBITS_ROOT "${TRIBITS_ROOT}" ABSOLUTE)

+set(PROJECT_NAME DummyProject)
+set(${PROJECT_NAME}_TRIBITS_DIR "${TRIBITS_ROOT}")
+
 set(CMAKE_MODULE_PATH
   ${CMAKE_CURRENT_LIST_DIR}
   ${TRIBITS_ROOT}/core/utils

I might just have to bite the bullet and start writing some automated tests for this dashboard driver system (written years ago by a contractor who did not write any automated tests for it).
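In isolation, the two lines added by the patch do make the nested reference resolve. The sketch below checks only that CMake-level behavior; TRIBITS_ROOT is a placeholder path and the file name is hypothetical. As the next comment reports, applying the patch did not clear the error in the actual nightly setup, so this demo does not by itself explain the nightly failure.

```cmake
# patch_logic_demo.cmake (hypothetical name); run with:
#   cmake -P patch_logic_demo.cmake
cmake_minimum_required(VERSION 3.10)

# Placeholder for the TriBITS root that TribitsDriverCMakeLists.cmake computes.
set(TRIBITS_ROOT "/path/to/Trilinos/cmake/tribits")
get_filename_component(TRIBITS_ROOT "${TRIBITS_ROOT}" ABSOLUTE)

# The two lines from the proposed patch.
set(PROJECT_NAME DummyProject)
set(${PROJECT_NAME}_TRIBITS_DIR "${TRIBITS_ROOT}")  # defines DummyProject_TRIBITS_DIR

# The nested reference now resolves to the TriBITS root instead of "".
set(include_path "${${PROJECT_NAME}_TRIBITS_DIR}/core/utils/GlobalSet.cmake")
message(STATUS "include_path = '${include_path}'")
```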

jhux2 commented 5 years ago

@bartlettroscoe I tried your suggestion, but get the same error.

bartlettroscoe commented 5 years ago

@jhux2 said:

@bartlettroscoe I tried your suggestion, but get the same error.

Okay, I will revert the changes to those files and see if we can get this working. I will try to set up a manual testing scenario to see if this will fix the problem.

lucbv commented 5 years ago

@jhux2, another option is to use the same logic as ATDM to run our nightly tests. I am attempting to set up such a build for trappist. This is still a work in progress, but if you look at the dashboard you can see that I have a test build that was able to post results to the nightly track. Now I only need to have it actually test something...

bartlettroscoe commented 5 years ago

@lucbv said:

another option is to use the same logic as ATDM to run our nightly tests

If that would not be too much trouble, that would be my advice. It is pretty simple: just clone an "outer" Trilinos, set up SRC_AND_BUILD, and let the ctest -S script.cmake run in there. The big disadvantage is that you will not see results on a CDash site, only in log files on the machine where you run the scripts. Just make sure you update that "outer" Trilinos before running the individual builds.

Now that we have Ninja, and since configuration is pretty fast (unless you are on a mounted disk), there is no advantage to running more than one build at a time, so a simple loop over your builds does the trick.
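The ATDM-style setup described above (update an "outer" Trilinos clone once, then run each build's ctest -S script in turn under SRC_AND_BUILD) can be sketched as a small CMake script. Everything specific here is an assumption: the directory layout, the driver-script names, the run_step helper, and the DRY_RUN switch are hypothetical placeholders, not files or options from the Trilinos repository. Each build's merged ctest output is written to its own log file, since that output only ends up on the local machine.

```cmake
# run_nightly_builds.cmake (hypothetical); run with:
#   cmake -P run_nightly_builds.cmake                # dry run: print commands only
#   cmake -DDRY_RUN=OFF -P run_nightly_builds.cmake  # actually run the builds
cmake_minimum_required(VERSION 3.10)

# Assumed layout; adjust to the local machine.
set(OUTER_TRILINOS_DIR "$ENV{HOME}/nightly/Trilinos")      # the "outer" clone
set(SRC_AND_BUILD      "$ENV{HOME}/nightly/SRC_AND_BUILD") # per-build work area
set(DRIVER_SCRIPT_DIR  "${OUTER_TRILINOS_DIR}/path/to/driver/scripts")
set(NIGHTLY_BUILDS build_a build_b)  # each expects <name>.cmake in DRIVER_SCRIPT_DIR

if(NOT DEFINED DRY_RUN)
  set(DRY_RUN ON)
endif()

# Run (or just print) the command given in ARGN inside workdir;
# merged stdout/stderr is written to log_file.
function(run_step workdir log_file)
  string(REPLACE ";" " " cmd_str "${ARGN}")
  if(DRY_RUN)
    message(STATUS "[dry run] (cd ${workdir} && ${cmd_str}) > ${log_file}")
    return()
  endif()
  get_filename_component(log_dir "${log_file}" DIRECTORY)
  file(MAKE_DIRECTORY "${workdir}" "${log_dir}")
  execute_process(COMMAND ${ARGN}
    WORKING_DIRECTORY "${workdir}"
    OUTPUT_VARIABLE out
    ERROR_VARIABLE out          # same variable: both pipes are merged
    RESULT_VARIABLE rc)
  file(WRITE "${log_file}" "${out}")
  if(NOT rc EQUAL 0)
    message(WARNING "'${cmd_str}' exited with '${rc}'; see ${log_file}")
  endif()
endfunction()

# 1) Update the outer Trilinos clone once, before any build runs.
run_step("${OUTER_TRILINOS_DIR}" "${SRC_AND_BUILD}/update_outer_trilinos.log"
  git pull)

# 2) Run the builds one after another (no parallel dashboards needed).
set(run_count 0)
foreach(build IN LISTS NIGHTLY_BUILDS)
  run_step("${SRC_AND_BUILD}/${build}" "${SRC_AND_BUILD}/${build}.log"
    ctest -S "${DRIVER_SCRIPT_DIR}/${build}.cmake" -VV)
  math(EXPR run_count "${run_count} + 1")
endforeach()
message(STATUS "Processed ${run_count} build(s)")
```

Running the builds sequentially keeps the driver trivial and matches the observation that, with fast configures, parallel dashboards buy little.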

But I will still fix this for other builds out there.

lucbv commented 5 years ago

@bartlettroscoe, I can see results on testing.sandia.gov; as you can see, my builds are posting in the nightly track. I am also pretty sure that we can all see the results in the ATDM track, so could you clarify what you mean here:

The big disadvantage is that you will not see results on a CDash site, only in log files on the machine where you run the scripts.

bartlettroscoe commented 5 years ago

@lucbv asked:

so could you clarify what you mean here?

I mean you can't see the STDOUT output from the ctest -S <script>.cmake invocation on CDash. Usually you don't need to see it when everything is going well, but when things go wrong you will need it to fix the problems.

lucbv commented 5 years ago

@bartlettroscoe that's fine with me. I hope to set things up once and for all and then only have to touch them up occasionally. As long as my nightly build results show up correctly on the dashboard, that will be OK. At the moment it does not seem to build correctly the packages that are enabled, I am not sure why...

bartlettroscoe commented 5 years ago

@lucbv said:

At the moment it does not seem to build correctly the packages that are enabled, I am not sure why...

Would this documentation help:

?

Otherwise, I will be posting a PR with a fix for the old deprecated TriBITS Dashboard Driver system shortly.

bartlettroscoe commented 5 years ago

FYI: PR #4859 should fix this. Please approve the PR.

Sorry it took me so long to get to this. Given the Trilinos PR builds and the ATDM Trilinos builds, hopefully there were not too many holes in testing during this time.

bartlettroscoe commented 5 years ago

Sorry, the commit TriBITSPub/TriBITS@e155f5d closed this issue when it should not have. Re-opening.

bartlettroscoe commented 5 years ago

FYI: The PR #4859 that should fix this was just merged.

Just a little history here. The commit that broke this was merged way back on 3/28/2019 as part of PR #4750. But no one seemed to notice that results on CDash were missing until 5 days later on 4/2/2019 when this Issue was created and someone else did not notice this until 4/4/2019 (7 days after 3/28/2019) when duplicate #4809 was created. That suggests that these results on CDash are not really being looked after very carefully.

If you guys are interested, I can show you how to set up the tool cdash_analyze_and_report.py (still being developed, but working pretty well for ATDM) so that you would know the day after if results go missing on CDash. Let me know.

Putting this in review to see that results show up starting tomorrow.

lucbv commented 5 years ago

@bartlettroscoe said:

That suggests that these results on CDash are not really being looked after very carefully.

sorry for being at a conference while the #4750 was pushed, I would have complained earlier otherwise!

lucbv commented 5 years ago

@bartlettroscoe for what it's worth, unless I am away, these builds are checked every morning, which is how I caught this and filed the issue the day I got back...

bartlettroscoe commented 5 years ago

@lucbv said:

sorry for being at a conference while the #4750 was pushed, I would have complained earlier otherwise!

Understood. Are you the only person who looks at these builds? Are you interested in getting a summary email once a day for the builds you care about?

jhux2 commented 5 years ago

But no one seemed to notice that results on CDash were missing until 5 days later on 4/2/2019 when this Issue was created and someone else did not notice this until 4/4/2019 (7 days after 3/28/2019) when duplicate #4809 was created. That suggests that these results on CDash are not really being looked after very carefully.

As @lucbv noted, this was just bad timing -- lots of MueLu developers were on travel or vacation. Btw, 3/28 was a Thursday. Even in a normal week, this might not have been flagged until the following Monday.

bartlettroscoe commented 5 years ago

@lucbv and @jhux2,

Looking at the full Trilinos dashboard from yesterday and comparing it to the builds from 2019-04-02, it looks like all of the various non-ATDM builds are posting again, so I will assume that my PR #4859 fixed this.

Can we close this?

jhux2 commented 5 years ago

Thanks for fixing this. It's fine with me to close this issue.

bartlettroscoe commented 5 years ago

Sorry this slipped through. I may have to find a way to set up automated tests for that system.