trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Modify existing GCC 4.8.4 CI build to match selected auto PR build #2462

Closed bartlettroscoe closed 6 years ago

bartlettroscoe commented 6 years ago

CC: @trilinos/framework, @mhoemmen, @rppawlo, @ibaned, @crtrott

Next Action Status

The post-push CI build and the checkin-test-sems.sh script have now been updated to use the new GCC 4.8.4 + OpenMPI 1.10.1 + OpenMP build. Use of this build in auto PR testing is being addressed in #2788.

Description

This Issue is to scope out and track efforts to upgrade the existing SEMS-based Trilinos CI build (see #482 and #1304) to match the selected GCC 4.8.4 auto PR build as described in https://github.com/trilinos/Trilinos/issues/2317#issuecomment-376551457. The existing GCC 4.8.4 CI build shown here has been running for 1.5+ years and has been maintained over that time. That build has many but not all of the settings of the selected GCC 4.8.4 auto PR build listed here. The primary changes that need to be made are:

The most difficult change will likely be enabling OpenMP, because of the problem of all of the threads binding to the same core as described in #2422. Therefore, the initial auto PR build may not have OpenMP enabled due to these challenges.
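For reference, below is a minimal sketch of the kind of binding workaround being discussed, assuming OpenMPI 1.10-era `mpirun` options and the standard `OMP_NUM_THREADS` variable; the flags the CI scripts actually end up using are tracked in #2422, and the test executable name is a placeholder:

```
# Sketch only: keep OpenMPI from pinning each rank (and its OpenMP threads)
# to a single core. './some_trilinos_test.exe' is a placeholder executable.
export OMP_NUM_THREADS=2        # threads per MPI rank for the test suite

# Disable OpenMPI's default core binding so the OpenMP threads of a rank
# are free to run on different cores:
mpirun --bind-to none -np 4 ./some_trilinos_test.exe
```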

Tasks:

  1. Set Xpetra_ENABLE_Experimental=ON and MueLu_ENABLE_Experimental=ON in CI build ... Merged in #2467 and was later removed in 7481c760699d8b0c30034782cb2ef0c742ce6657 [DONE]
  2. Switch current CI build from OpenMPI 1.6.5 to 1.10.1 (see build GCC-4.8.4-OpenMPI-1.10.1-MpiReleaseDebugSharedPtOpenMP in #2688) [DONE]
  3. Enable Trilinos_ENABLE_OpenMP=ON and OMP_NUM_THREADS=2 (see build GCC-4.8.4-OpenMPI-1.10.1-MpiReleaseDebugSharedPtOpenMP in #2688; a configure sketch combining the settings from tasks 1-3 is shown after this list) [DONE]
  4. Set up nightly build and clean up tests (see #2691 and #2712) ... IN PROGRESS ...
  5. Switch auto PR tester to use updated GCC 4.8.4 configuration ...
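
For orientation, here is a sketch of how the settings named in tasks 1-3 map onto a Trilinos CMake configure. The option names come from the tasks above; the invocation itself and the `TRILINOS_SRC_DIR` variable are illustrative only, since the real CI build pulls these settings from the *.cmake fragment files discussed below:

```
# Illustrative only; not the actual CI configure invocation.
cmake \
  -D Xpetra_ENABLE_Experimental:BOOL=ON \
  -D MueLu_ENABLE_Experimental:BOOL=ON \
  -D Trilinos_ENABLE_OpenMP:BOOL=ON \
  ${TRILINOS_SRC_DIR}    # placeholder for the Trilinos source tree

# OMP_NUM_THREADS is not a CMake option; it is set in the environment when
# the tests are run:
export OMP_NUM_THREADS=2
```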

Related Issues:

mhoemmen commented 6 years ago

@bartlettroscoe Can we enable OpenMP but force OMP_NUM_THREADS=1? Some of those Xpetra and MueLu "experimental" build options may not have any effect unless OpenMP is enabled.

bartlettroscoe commented 6 years ago

> Can we enable OpenMP but force OMP_NUM_THREADS=1? Some of those Xpetra and MueLu "experimental" build options may not have any effect unless OpenMP is enabled.

I guess I can try that. But I wonder whether, even with OMP_NUM_THREADS=1, all of the threads will still be bound to the same core.
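
One way to answer the binding question empirically would be something like the sketch below, assuming OpenMPI's standard `--report-bindings` option (this was not actually run as part of this issue, and the executable name is a placeholder):

```
# Have mpirun print where each rank is bound. If both ranks report the same
# core, then even OMP_NUM_THREADS=1 runs will pile onto one core when many
# tests run concurrently under 'ctest -j'.
export OMP_NUM_THREADS=1
mpirun --report-bindings -np 2 ./some_trilinos_test.exe 2>&1 | grep -i "bound to"
```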

Also, note that there are ATDM builds of Trilinos that enable experimental MueLu code and that build and run tests with a serial Kokkos node, as shown at:

mhoemmen commented 6 years ago

@csiefer2 would know for sure whether disabling OpenMP is adequate. My guess is no, because some of the sparse matrix-matrix multiply code takes different paths if OpenMP is enabled.

csiefer2 commented 6 years ago

OpenMPNode and SerialNode trigger different code paths in chunks of Tpetra. AFAIK MueLu does not do node type specialization (except for Epetra).

What you choose to test for PR doesn't really matter, but they both need to stay working (more or less).

bartlettroscoe commented 6 years ago

> OpenMPNode and SerialNode trigger different code paths in chunks of Tpetra. AFAIK MueLu does not do node type specialization (except for Epetra).

> What you choose to test for PR doesn't really matter, but they both need to stay working (more or less).

The GCC 4.8.4 PR build will test the OpenMP path and the Intel 17.x build will test the Serial node path. And the ATDM builds of Trilinos have already been testing both paths for many weeks now, as you can see at:

mhoemmen commented 6 years ago

@bartlettroscoe Cool, then I'm OK with this :)

bartlettroscoe commented 6 years ago

I submitted PR #2467 to enable Xpetra and MueLu experimental code in the standard CI build. If someone can quickly review that, then I can merge.

bartlettroscoe commented 6 years ago

I tested the full CI build going from OpenMPI 1.6.5 to 1.8.7 in the branch 2462-openmpi-1.6.5-to-1.8.7 in my fork of Trilinos git@github.com:bartlettroscoe/Trilinos.git and it caused 30 tests to time out (see details below). I can't tell if these are hangs or just MPI communication taking longer. Someone would need to research that. In any case, upgrading from OpenMPI 1.6.5 to 1.8.7 is a no-go.

I will try updating from OpenMPI 1.6.5 to 1.10.1 (which is the only other OpenMPI version that SEMS provides) and see how that goes.

DETAILED NOTES (click to expand) **(3/27/2018)** I created the branch `2462-openmpi-1.6.5-to-1.8.7` in my fork of Trilinos. I added the commit d36479f to change from OpenMPI 1.6.5 to 1.8.7. I tested this with: ``` $ ./checkin-test-sems.sh --enable-all-packages=on --local-do-all ``` and it returned: ``` FAILED: Trilinos/MPI_RELEASE_DEBUG_SHARED_PT: passed=2557,notpassed=30 Tue Mar 27 19:21:00 MDT 2018 Enabled Packages: Disabled Packages: PyTrilinos,Claps,TriKota Enabled all Packages Hostname: crf450.srn.sandia.gov Source Dir: /home/rabartl/Trilinos.base/Trilinos/cmake/tribits/ci_support/../../.. Build Dir: /home/rabartl/Trilinos.base/BUILDS/CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT CMake Cache Varibles: -DTrilinos_TRIBITS_DIR:PATH=/home/rabartl/Trilinos.base/Trilinos/cmake/tribits -DTrilinos_ENABLE_TESTS:BOOL=ON -DTrilinos_TEST_CATEGORIES:STRING=BASIC -DTrilinos_ALLOW_NO_PACKAGES:BOOL=OFF -DDART_TESTING_TIMEOUT:STRING=300.0 -DBUILD_SHARED_LIBS=ON -DTrilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON -DTrilinos_ENABLE_SECONDARY_TESTED_CODE:BOOL=OFF -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/MpiReleaseDebugSharedPtSettings.cmake,cmake/std/BasicCiTestingSettings.cmake,cmake/std/sems/SEMSDevEnv.cmake -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES:BOOL=ON -DTrilinos_ENABLE_PyTrilinos:BOOL=OFF -DTrilinos_ENABLE_Claps:BOOL=OFF -DTrilinos_ENABLE_TriKota:BOOL=OFF Make Options: -j16 CTest Options: -j16 Pull: Not Performed Configure: Passed (3.29 min) Build: Passed (92.01 min) Test: FAILED (26.13 min) 99% tests passed, 30 tests failed out of 2587 Label Time Summary: Amesos = 6.08 sec (14 tests) Amesos2 = 3.33 sec (9 tests) Anasazi = 34.87 sec (71 tests) AztecOO = 5.50 sec (17 tests) Belos = 35.43 sec (72 tests) Domi = 138.39 sec (125 tests) Epetra = 43.01 sec (61 tests) EpetraExt = 4.18 sec (11 tests) FEI = 51.74 sec (43 tests) Galeri = 7.70 sec (9 tests) GlobiPack = 3.08 sec (6 tests) Ifpack = 26.56 sec (53 tests) Ifpack2 = 54.12 sec (35 tests) Intrepid = 1467.90 sec (152 tests) Intrepid2 = 336.96 sec (144 tests) Isorropia = 1.74 sec (6 tests) Kokkos = 379.99 sec (23 tests) KokkosKernels = 701.31 sec (4 tests) ML = 25.33 sec (34 tests) MiniTensor = 0.72 sec (2 tests) MueLu = 1217.24 sec (84 tests) NOX = 413.98 sec (106 tests) OptiPack = 2.69 sec (5 tests) Panzer = 808.72 sec (154 tests) Phalanx = 21.44 sec (27 tests) Pike = 4.37 sec (7 tests) Piro = 25.49 sec (12 tests) ROL = 3183.59 sec (153 tests) RTOp = 11.01 sec (24 tests) Rythmos = 1083.47 sec (83 tests) SEACAS = 50.54 sec (14 tests) STK = 109.42 sec (12 tests) Sacado = 122.29 sec (292 tests) Shards = 1.77 sec (4 tests) ShyLU_Node = 2.29 sec (3 tests) Stokhos = 436.12 sec (75 tests) Stratimikos = 167.50 sec (40 tests) Teko = 362.38 sec (19 tests) Tempus = 7650.88 sec (36 tests) Teuchos = 50.89 sec (137 tests) ThreadPool = 5.48 sec (10 tests) Thyra = 35.63 sec (81 tests) Tpetra = 71.71 sec (162 tests) TrilinosCouplings = 335.97 sec (24 tests) Triutils = 0.38 sec (2 tests) Xpetra = 27.11 sec (18 tests) Zoltan = 26.70 sec (19 tests) Zoltan2 = 60.17 sec (101 tests) Total Test time (real) = 1567.87 sec The following tests FAILED: 173 - KokkosKernels_graph_serial_MPI_1 (Timeout) 1984 - MueLu_UnitTestsTpetra_MPI_1 (Timeout) 1994 - MueLu_ParameterListInterpreterEpetra_MPI_1 (Timeout) 1998 - MueLu_ParameterListInterpreterTpetra_MPI_1 (Timeout) 2099 - Rythmos_BackwardEuler_ConvergenceTest_MPI_1 (Timeout) 2103 - Rythmos_IntegratorBuilder_ConvergenceTest_MPI_1 (Timeout) 2129 - 
Tempus_BackwardEuler_MPI_1 (Timeout) 2131 - Tempus_BackwardEuler_Staggered_FSA_MPI_1 (Timeout) 2133 - Tempus_BackwardEuler_ASA_MPI_1 (Timeout) 2134 - Tempus_BDF2_MPI_1 (Timeout) 2135 - Tempus_BDF2_Combined_FSA_MPI_1 (Timeout) 2136 - Tempus_BDF2_Staggered_FSA_MPI_1 (Timeout) 2138 - Tempus_BDF2_ASA_MPI_1 (Timeout) 2139 - Tempus_ExplicitRK_MPI_1 (Timeout) 2140 - Tempus_ExplicitRK_Combined_FSA_MPI_1 (Timeout) 2141 - Tempus_ExplicitRK_Staggered_FSA_MPI_1 (Timeout) 2143 - Tempus_ExplicitRK_ASA_MPI_1 (Timeout) 2145 - Tempus_DIRK_MPI_1 (Timeout) 2146 - Tempus_DIRK_Combined_FSA_MPI_1 (Timeout) 2147 - Tempus_DIRK_Staggered_FSA_MPI_1 (Timeout) 2149 - Tempus_DIRK_ASA_MPI_1 (Timeout) 2150 - Tempus_HHTAlpha_MPI_1 (Timeout) 2151 - Tempus_Newmark_MPI_1 (Timeout) 2154 - Tempus_IMEX_RK_Combined_FSA_MPI_1 (Timeout) 2155 - Tempus_IMEX_RK_Staggered_FSA_MPI_1 (Timeout) 2157 - Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 (Timeout) 2158 - Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 (Timeout) 2282 - ROL_test_sol_solSROMGenerator_MPI_1 (Timeout) 2288 - ROL_test_sol_checkAlmostSureConstraint_MPI_1 (Timeout) 2320 - ROL_example_burgers-control_example_06_MPI_1 (Timeout) Errors while running CTest Total time for MPI_RELEASE_DEBUG_SHARED_PT = 121.44 min ``` Darn, that is not good. That is a lot of timeouts. Now, I can't tell if these are timeouts because things are taking longer or if these are hangs. Someone would need to research that.
mhoemmen commented 6 years ago

@bartlettroscoe I have heard complaints about OpenMPI 1.8.x bugs. The OpenMPI web page considers it "retired" -- in fact, the oldest "not retired" version is 1.10.

mhoemmen commented 6 years ago

@prwolfe Have you seen issues like this with OpenMPI 1.8.x?

bartlettroscoe commented 6 years ago

I tested the full CI build going from OpenMPI 1.6.5 to 1.10.1 in the branch 2462-openmpi-1.6.5-to-1.10.1 in my fork of Trilinos git@github.com:bartlettroscoe/Trilinos.git and it caused 34 tests to time out (see details below). I can't tell if these are hangs or just that MPI communication is taking longer to complete (which is hard to believe).

I am wondering if there is some problem with the way these tests are using MPI, and whether someone should dig in and try to debug some of these timeouts to see why they are happening. Perhaps there are some real defects in the code that these updated versions of OpenMPI are bringing out?

DETAILED NOTES (click to expand) **(3/28/2018)** I created the branch `2462-openmpi-1.6.5-to-1.10.1` in my fork of Trilinos `git@github.com:bartlettroscoe/Trilinos.git`. I added the commit c9e9097 to change from OpenMPI 1.6.5 to 1.10.1. I tested this with: ``` $ ./checkin-test-sems.sh --enable-all-packages=on --local-do-all ``` and it returned: ``` FAILED: Trilinos/MPI_RELEASE_DEBUG_SHARED_PT: passed=2552,notpassed=34 Wed Mar 28 09:41:16 MDT 2018 Enabled Packages: Disabled Packages: PyTrilinos,Claps,TriKota Enabled all Packages Hostname: crf450.srn.sandia.gov Source Dir: /home/rabartl/Trilinos.base/Trilinos/cmake/tribits/ci_support/../../.. Build Dir: /home/rabartl/Trilinos.base/BUILDS/CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT CMake Cache Varibles: -DTrilinos_TRIBITS_DIR:PATH=/home/rabartl/Trilinos.base/Trilinos/cmake/tribits -DTrilinos_ENABLE_TESTS:BOOL=ON -DTrilinos_TEST_CATEGORIES:STRING=BASIC -DTrilinos_ALLOW_NO_PACKAGES:BOOL=OFF -DDART_TESTING_TIMEOUT:STRING=300.0 -DBUILD_SHARED_LIBS=ON -DTrilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON -DTrilinos_ENABLE_SECONDARY_TESTED_CODE:BOOL=OFF -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/MpiReleaseDebugSharedPtSettings.cmake,cmake/std/BasicCiTestingSettings.cmake,cmake/std/sems/SEMSDevEnv.cmake -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES:BOOL=ON -DTrilinos_ENABLE_PyTrilinos:BOOL=OFF -DTrilinos_ENABLE_Claps:BOOL=OFF -DTrilinos_ENABLE_TriKota:BOOL=OFF Make Options: -j16 CTest Options: -j16 Pull: Not Performed Configure: Passed (3.04 min) Build: Passed (99.75 min) Test: FAILED (26.52 min) 99% tests passed, 34 tests failed out of 2586 Label Time Summary: Amesos = 6.20 sec (14 tests) Amesos2 = 3.17 sec (9 tests) Anasazi = 37.92 sec (71 tests) AztecOO = 5.27 sec (17 tests) Belos = 35.64 sec (72 tests) Domi = 122.76 sec (125 tests) Epetra = 42.48 sec (61 tests) EpetraExt = 5.19 sec (11 tests) FEI = 49.67 sec (43 tests) Galeri = 6.93 sec (9 tests) GlobiPack = 3.39 sec (6 tests) Ifpack = 27.06 sec (53 tests) Ifpack2 = 60.88 sec (35 tests) Intrepid = 1661.11 sec (152 tests) Intrepid2 = 360.30 sec (144 tests) Isorropia = 1.67 sec (6 tests) Kokkos = 491.07 sec (23 tests) KokkosKernels = 653.29 sec (4 tests) ML = 21.03 sec (34 tests) MiniTensor = 1.04 sec (2 tests) MueLu = 1227.70 sec (83 tests) NOX = 415.46 sec (106 tests) OptiPack = 2.10 sec (5 tests) Panzer = 876.56 sec (154 tests) Phalanx = 17.27 sec (27 tests) Pike = 3.69 sec (7 tests) Piro = 20.75 sec (12 tests) ROL = 3246.73 sec (153 tests) RTOp = 11.13 sec (24 tests) Rythmos = 992.68 sec (83 tests) SEACAS = 56.99 sec (14 tests) STK = 127.17 sec (12 tests) Sacado = 115.83 sec (292 tests) Shards = 1.84 sec (4 tests) ShyLU_Node = 1.30 sec (3 tests) Stokhos = 285.61 sec (75 tests) Stratimikos = 178.14 sec (40 tests) Teko = 433.87 sec (19 tests) Tempus = 7758.58 sec (36 tests) Teuchos = 51.55 sec (137 tests) ThreadPool = 5.48 sec (10 tests) Thyra = 35.87 sec (81 tests) Tpetra = 66.82 sec (162 tests) TrilinosCouplings = 386.35 sec (24 tests) Triutils = 0.41 sec (2 tests) Xpetra = 25.48 sec (18 tests) Zoltan = 26.04 sec (19 tests) Zoltan2 = 54.41 sec (101 tests) Total Test time (real) = 1590.85 sec The following tests FAILED: 173 - KokkosKernels_graph_serial_MPI_1 (Timeout) 1506 - Teko_testdriver_tpetra_MPI_1 (Failed) 1983 - MueLu_UnitTestsTpetra_MPI_1 (Timeout) 1993 - MueLu_ParameterListInterpreterEpetra_MPI_1 (Timeout) 1997 - MueLu_ParameterListInterpreterTpetra_MPI_1 (Timeout) 2098 - 
Rythmos_BackwardEuler_ConvergenceTest_MPI_1 (Timeout) 2102 - Rythmos_IntegratorBuilder_ConvergenceTest_MPI_1 (Timeout) 2127 - Tempus_ForwardEuler_MPI_1 (Timeout) 2128 - Tempus_BackwardEuler_MPI_1 (Timeout) 2129 - Tempus_BackwardEuler_Combined_FSA_MPI_1 (Timeout) 2130 - Tempus_BackwardEuler_Staggered_FSA_MPI_1 (Timeout) 2132 - Tempus_BackwardEuler_ASA_MPI_1 (Timeout) 2133 - Tempus_BDF2_MPI_1 (Timeout) 2134 - Tempus_BDF2_Combined_FSA_MPI_1 (Timeout) 2135 - Tempus_BDF2_Staggered_FSA_MPI_1 (Timeout) 2137 - Tempus_BDF2_ASA_MPI_1 (Timeout) 2138 - Tempus_ExplicitRK_MPI_1 (Timeout) 2139 - Tempus_ExplicitRK_Combined_FSA_MPI_1 (Timeout) 2140 - Tempus_ExplicitRK_Staggered_FSA_MPI_1 (Timeout) 2142 - Tempus_ExplicitRK_ASA_MPI_1 (Timeout) 2144 - Tempus_DIRK_MPI_1 (Timeout) 2145 - Tempus_DIRK_Combined_FSA_MPI_1 (Timeout) 2146 - Tempus_DIRK_Staggered_FSA_MPI_1 (Timeout) 2148 - Tempus_DIRK_ASA_MPI_1 (Timeout) 2149 - Tempus_HHTAlpha_MPI_1 (Timeout) 2150 - Tempus_Newmark_MPI_1 (Timeout) 2153 - Tempus_IMEX_RK_Combined_FSA_MPI_1 (Timeout) 2154 - Tempus_IMEX_RK_Staggered_FSA_MPI_1 (Timeout) 2156 - Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 (Timeout) 2157 - Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 (Timeout) 2281 - ROL_test_sol_solSROMGenerator_MPI_1 (Timeout) 2287 - ROL_test_sol_checkAlmostSureConstraint_MPI_1 (Timeout) 2319 - ROL_example_burgers-control_example_06_MPI_1 (Timeout) 2327 - ROL_example_parabolic-control_example_03_MPI_1 (Timeout) Errors while running CTest Total time for MPI_RELEASE_DEBUG_SHARED_PT = 129.32 min ``` Wow, that is even more test timeouts!
bartlettroscoe commented 6 years ago

> I have heard complaints about OpenMPI 1.8.x bugs. The OpenMPI web page considers it "retired" -- in fact, the oldest "not retired" version is 1.10.

Okay, given that OpenMPI 1.10 is the oldest OpenMPI version that is still supported, we should try to debug what is causing these timeouts. I will submit an experimental build to CDash and then we can go from there.

prwolfe commented 6 years ago

We had lots of issues with 1.8 - that's why we abandoned it. Basically, it was slow and would not properly place processes. In fact, we have had some issues with 1.10, but those responded well to placement directives.
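
For anyone following along, "placement directives" here presumably refers to mpirun's mapping/binding options. A sketch using OpenMPI 1.10-style syntax (the exact directives used on those machines are not recorded in this thread, and the executable name is a placeholder):

```
# Explicitly map one rank per core, bind each rank to its core, and print
# the resulting bindings so bad placement is easy to spot:
mpirun --map-by core --bind-to core --report-bindings -np 4 ./some_trilinos_test.exe
```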

mhoemmen commented 6 years ago

I remember the "let's try 1.8 .... oh that was bad let's not" episode :(

bartlettroscoe commented 6 years ago

I merged #2467 which enables experimental code in Xpetra and MueLu in the GCC 4.8.4 CI build.

bartlettroscoe commented 6 years ago

I ran the full Trilinos CI build and test suites with OpenMPI 1.6.5 (the current version used) and OpenMPI 1.10.1 on my machine crf450 and submitted to CDash using an all-at-once configure, build, and test:

The machine was loaded with other builds, so I don't totally trust the timing numbers it showed, but it seems that some tests and package test suites run much faster with OpenMPI 1.10.1 and others run much slower with OpenMPI 1.10.1 vs. OpenMPI 1.6.5. Overall, the tests took:

You can see some of the detailed numbers on the CDash pages above and in the below notes.

I rebooted my machine crf450 and will run these again to see what happens. But if I see numbers similar to these again, I will post a new Trilinos GitHub issue to focus on problems with Trilinos and OpenMPI 1.10.1.

DETAILED NOTES (click to expand) **(3/28/2018)** Doing an experimental submit to CDash so we can see the output from these timing out tests and then start to try to diagnose why they are failing: ``` $ cd MPI_RELEASE_DEBUG_SHARED_PT/ $ rm -r CMake* $ source ~/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh $ ./do-configure -DCTEST_BUILD_FLAGS=-j16 -DCTEST_PARALLEL_LEVEL=16 $ time make dashboard &> make.dashboard.out real 343m56.832s user 1232m32.619s sys 228m59.516s ``` This submitted to: * https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-28&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=site&compare1=63&value1=crf450&field2=buildname&compare2=61&value2=Linux-MPI_RELEASE_DEBUG_SHARED_PT&field3=buildstamp&compare3=61&value3=20180328-2153-Experimental Interestingly, when running the tests package-by-package, there were fewer timeouts (16 total). The only timeouts were in Teko (1) and Tempus (15). **(3/29/2018)** **A) Initial all-at-once configure, build, test and submit with 2462-openmpi-1.6.5-to-1.10.1:** I will then do an all-at-once submmit configure, build, test, and submit and see what happens: ``` $ cd CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT/ $ rm -r CMake* $ source ~/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh $ export PATH=/home/vera_env/common_tools/cmake-3.10.1/bin:$PATH $ which cmake /home/vera_env/common_tools/cmake-3.10.1/bin/cmake $ which mpirun /projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.8.4/openmpi/1.10.1/bin/mpirun $ ./do-configure -DCTEST_BUILD_FLAGS=-j16 -DCTEST_PARALLEL_LEVEL=4 \ -DTrilinos_CTEST_DO_ALL_AT_ONCE=TRUE -DTrilinos_CTEST_USE_NEW_AAO_FEATURES=ON $ time make dashboard &> make.dashboard.out real 159m59.918s user 1657m20.003s sys 132m37.954s ``` This submitted to: * https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-28&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=site&compare1=63&value1=crf450&field2=buildname&compare2=61&value2=Linux-MPI_RELEASE_DEBUG_SHARED_PT&field3=buildstamp&compare3=61&value3=20180329-1225-Experimental This showed 18 timeouts for the packages Tempus (14), MueLu (1), ROL (1), Rythmos (1), and Teko (1). There is a lot of data shown on CDash. **B) Baseline all-at-once configure, build, test and submit with 2462-openmpi-1.6.5-to-1.10.1-base:** Now, for a basis of comparison, I should compare with the OpenMPI 1.6.5 build. 
I can do this by creating another branch that is for the exact same version of Trilinos: ``` $ cd Trilinos/ $ git checkout -b 2462-openmpi-1.6.5-to-1.10.1-base 65c7ac6 $ git push -u rab-github 2462-openmpi-1.6.5-to-1.10.1-base $ git log-short -1 --name-status 65c7ac6 "Merge branch 'develop' of github.com:trilinos/Trilinos into develop" Author: Chris Siefert Date: Tue Mar 27 16:24:26 2018 -0600 (2 days ago) ``` Now run the all-at-once configure, build, test, and submit again: ``` $ cd CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT/ $ source ~/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh $ export PATH=/home/vera_env/common_tools/cmake-3.10.1/bin:$PATH $ which cmake /home/vera_env/common_tools/cmake-3.10.1/bin/cmake $ which mpirun /projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.8.4/openmpi/1.6.5/bin/mpirun $ rm -r CMake* $ time ./do-configure -DCTEST_BUILD_FLAGS=-j16 -DCTEST_PARALLEL_LEVEL=4 \ -DTrilinos_CTEST_DO_ALL_AT_ONCE=TRUE -DTrilinos_CTEST_USE_NEW_AAO_FEATURES=ON \ -DDART_TESTING_TIMEOUT=1200 \ &> configure.2462-openmpi-1.6.5-to-1.10.1-base.out real 2m43.743s user 1m35.215s sys 0m36.769s $ time make dashboard &> make.dashboard.2462-openmpi-1.6.5-to-1.10.1-base.out real 153m14.541s user 1335m57.220s sys 107m48.393s ``` This passed all of the tests and submitted to: * https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-29&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=site&compare1=63&value1=crf450&field2=buildname&compare2=61&value2=Linux-MPI_RELEASE_DEBUG_SHARED_PT&field3=buildstamp&compare3=61&value3=20180329-2143-Experimental And the local ctest -S output showed all passing: ``` 100% tests passed, 0 tests failed out of 2586 Subproject Time Summary: Amesos = 71.86 sec*proc (14 tests) Amesos2 = 36.33 sec*proc (9 tests) Anasazi = 383.07 sec*proc (71 tests) AztecOO = 57.40 sec*proc (17 tests) Belos = 374.60 sec*proc (72 tests) Domi = 417.55 sec*proc (125 tests) Epetra = 120.94 sec*proc (61 tests) EpetraExt = 53.28 sec*proc (11 tests) FEI = 94.68 sec*proc (43 tests) Galeri = 10.34 sec*proc (9 tests) GlobiPack = 0.53 sec*proc (6 tests) Ifpack = 201.11 sec*proc (48 tests) Ifpack2 = 102.01 sec*proc (35 tests) Intrepid = 136.61 sec*proc (152 tests) Intrepid2 = 36.17 sec*proc (144 tests) Isorropia = 27.71 sec*proc (6 tests) Kokkos = 39.54 sec*proc (23 tests) KokkosKernels = 70.40 sec*proc (4 tests) ML = 158.93 sec*proc (34 tests) MiniTensor = 0.12 sec*proc (2 tests) MueLu = 773.40 sec*proc (80 tests) NOX = 364.33 sec*proc (106 tests) OptiPack = 21.45 sec*proc (5 tests) Panzer = 1855.67 sec*proc (154 tests) Phalanx = 3.23 sec*proc (27 tests) Pike = 2.74 sec*proc (7 tests) Piro = 90.63 sec*proc (12 tests) ROL = 1916.41 sec*proc (153 tests) RTOp = 27.06 sec*proc (24 tests) Rythmos = 154.85 sec*proc (83 tests) SEACAS = 7.05 sec*proc (14 tests) STK = 46.83 sec*proc (12 tests) Sacado = 44.83 sec*proc (292 tests) Shards = 0.35 sec*proc (4 tests) ShyLU_Node = 0.18 sec*proc (3 tests) Stokhos = 134.80 sec*proc (75 tests) Stratimikos = 51.75 sec*proc (40 tests) Teko = 196.13 sec*proc (19 tests) Tempus = 2215.52 sec*proc (36 tests) Teuchos = 104.16 sec*proc (137 tests) ThreadPool = 19.33 sec*proc (10 tests) Thyra = 171.53 sec*proc (81 tests) Tpetra = 526.10 sec*proc (162 tests) TrilinosCouplings = 67.80 sec*proc (24 tests) Triutils = 8.64 sec*proc (2 tests) Xpetra = 157.05 sec*proc (18 tests) Zoltan = 813.39 sec*proc (19 tests) Zoltan2 = 479.54 sec*proc (101 tests) Total Test time (real) = 3190.31 sec ``` The most expensive tests were: ``` $ grep " Test " 
make.dashboard.2462-openmpi-1.6.5-to-1.10.1-base.out | grep "sec$" | sort -nr -k 7 | head -n 30 6/2586 Test #2140: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ......................................................... Passed 215.29 sec 5/2586 Test #2157: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................................................ Passed 214.29 sec 4/2586 Test #2145: Tempus_DIRK_Combined_FSA_MPI_1 ................................................................ Passed 188.22 sec 3/2586 Test #2146: Tempus_DIRK_Staggered_FSA_MPI_1 ............................................................... Passed 161.72 sec 9/2586 Test #2156: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 ................................................. Passed 160.28 sec 7/2586 Test #2139: Tempus_ExplicitRK_Combined_FSA_MPI_1 .......................................................... Passed 146.90 sec 10/2586 Test #2148: Tempus_DIRK_ASA_MPI_1 ......................................................................... Passed 125.28 sec 8/2586 Test #2149: Tempus_HHTAlpha_MPI_1 ......................................................................... Passed 124.29 sec 15/2586 Test #2142: Tempus_ExplicitRK_ASA_MPI_1 ................................................................... Passed 117.84 sec 19/2586 Test #2154: Tempus_IMEX_RK_Staggered_FSA_MPI_1 ............................................................ Passed 102.77 sec 17/2586 Test #2153: Tempus_IMEX_RK_Combined_FSA_MPI_1 ............................................................. Passed 97.35 sec 149/2586 Test #564: Zoltan_hg_simple_zoltan_parallel .............................................................. Passed 89.07 sec 36/2586 Test #2150: Tempus_Newmark_MPI_1 .......................................................................... Passed 76.28 sec 147/2586 Test #2368: ROL_example_PDE-OPT_0ld_adv-diff-react_example_02_MPI_4 ....................................... Passed 70.90 sec 151/2586 Test #558: Zoltan_ch_simple_zoltan_parallel .............................................................. Passed 67.63 sec 148/2586 Test #2533: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .................................... Passed 65.29 sec 146/2586 Test #2529: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .................................. Passed 64.84 sec 45/2586 Test #2133: Tempus_BDF2_MPI_1 ............................................................................. Passed 55.00 sec 27/2586 Test #2128: Tempus_BackwardEuler_MPI_1 .................................................................... Passed 51.35 sec 40/2586 Test #2281: ROL_test_sol_solSROMGenerator_MPI_1 ........................................................... Passed 49.37 sec 14/2586 Test #2134: Tempus_BDF2_Combined_FSA_MPI_1 ................................................................ Passed 45.76 sec 46/2586 Test #2102: Rythmos_IntegratorBuilder_ConvergenceTest_MPI_1 ............................................... Passed 45.22 sec 150/2586 Test #2363: ROL_example_PDE-OPT_0ld_poisson_example_01_MPI_4 .............................................. Passed 43.39 sec 12/2586 Test #2130: Tempus_BackwardEuler_Staggered_FSA_MPI_1 ...................................................... Passed 40.56 sec 11/2586 Test #2129: Tempus_BackwardEuler_Combined_FSA_MPI_1 ....................................................... Passed 40.44 sec 13/2586 Test #2135: Tempus_BDF2_Staggered_FSA_MPI_1 ............................................................... 
Passed 39.86 sec 53/2586 Test #1997: MueLu_ParameterListInterpreterTpetra_MPI_1 .................................................... Passed 37.92 sec 20/2586 Test #173: KokkosKernels_graph_serial_MPI_1 .............................................................. Passed 33.98 sec 153/2586 Test #2382: ROL_example_PDE-OPT_topo-opt_elasticity_example_01_MPI_4 ...................................... Passed 33.44 sec 152/2586 Test #2246: ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4 ..................... Passed 33.05 sec ``` Now this is a solid basis of comparison for using OpenMPI 1.10.1. **C) Follow-up all-at-once configure, build, test and submit with 2462-openmpi-1.6.5-to-1.10.1:** That is not a lot of free memory left. It may have been that my machine was swapping to disk when trying to run the tests. I should try running the tests again locally bu this time using less processes and a larger timeout after going back to the branch `2462-openmpi-1.6.5-to-1.10.1`: ``` $ cd CHECKIN/MPI_RELEASE_DEBUG_SHARED_PT/ $ source ~/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh $ export PATH=/home/vera_env/common_tools/cmake-3.10.1/bin:$PATH $ which cmake /home/vera_env/common_tools/cmake-3.10.1/bin/cmake $ which mpirun /projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.8.4/openmpi/1.10.1/bin/mpirun $ rm -r CMake* $ time ./do-configure -DCTEST_BUILD_FLAGS=-j16 -DCTEST_PARALLEL_LEVEL=16 \ -DTrilinos_CTEST_DO_ALL_AT_ONCE=TRUE -DTrilinos_CTEST_USE_NEW_AAO_FEATURES=ON \ &> configure.2462-openmpi-1.6.5-to-1.10.1.out real 2m47.845s user 1m38.024s sys 0m38.445s $ time make dashboard &> make.dashboard.2462-openmpi-1.6.5-to-1.10.1.out real 182m12.103s user 1554m1.530s sys 123m35.152s ``` This posted results to: * https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&parentid=3401996&filtercount=3&showfilters=1&field1=buildstamp&compare1=61&value1=20180330-0025-Experimental&filtercombine=and The test results shown in the ctest -S output were: ``` 99% tests passed, 5 tests failed out of 2586 Subproject Time Summary: Amesos = 18.34 sec*proc (14 tests) Amesos2 = 8.02 sec*proc (9 tests) Anasazi = 110.75 sec*proc (71 tests) AztecOO = 7.67 sec*proc (17 tests) Belos = 105.01 sec*proc (72 tests) Domi = 162.23 sec*proc (125 tests) Epetra = 32.93 sec*proc (61 tests) EpetraExt = 14.38 sec*proc (11 tests) FEI = 38.18 sec*proc (43 tests) Galeri = 3.93 sec*proc (9 tests) GlobiPack = 1.12 sec*proc (6 tests) Ifpack = 52.86 sec*proc (48 tests) Ifpack2 = 57.62 sec*proc (35 tests) Intrepid = 465.90 sec*proc (152 tests) Intrepid2 = 110.21 sec*proc (144 tests) Isorropia = 4.74 sec*proc (6 tests) Kokkos = 119.54 sec*proc (23 tests) KokkosKernels = 219.11 sec*proc (4 tests) ML = 44.09 sec*proc (34 tests) MiniTensor = 0.53 sec*proc (2 tests) MueLu = 878.95 sec*proc (80 tests) NOX = 252.04 sec*proc (106 tests) OptiPack = 6.39 sec*proc (5 tests) Panzer = 1802.58 sec*proc (154 tests) Phalanx = 8.27 sec*proc (27 tests) Pike = 1.42 sec*proc (7 tests) Piro = 60.27 sec*proc (12 tests) ROL = 2447.89 sec*proc (153 tests) RTOp = 5.18 sec*proc (24 tests) Rythmos = 444.69 sec*proc (83 tests) SEACAS = 15.40 sec*proc (14 tests) STK = 53.82 sec*proc (12 tests) Sacado = 43.39 sec*proc (292 tests) Shards = 0.66 sec*proc (4 tests) ShyLU_Node = 0.61 sec*proc (3 tests) Stokhos = 174.51 sec*proc (75 tests) Stratimikos = 67.55 sec*proc (40 tests) Teko = 247.80 sec*proc (19 tests) Tempus = 7564.71 sec*proc (36 tests) Teuchos = 32.62 sec*proc (137 tests) ThreadPool = 3.50 sec*proc (10 tests) Thyra = 37.33 sec*proc 
(81 tests) Tpetra = 125.66 sec*proc (162 tests) TrilinosCouplings = 140.58 sec*proc (24 tests) Triutils = 1.04 sec*proc (2 tests) Xpetra = 92.05 sec*proc (18 tests) Zoltan = 75.40 sec*proc (19 tests) Zoltan2 = 152.14 sec*proc (101 tests) Total Test time (real) = 4116.63 sec The following tests FAILED: 1506 - Teko_testdriver_tpetra_MPI_1 (Failed) 2140 - Tempus_ExplicitRK_Staggered_FSA_MPI_1 (Timeout) 2145 - Tempus_DIRK_Combined_FSA_MPI_1 (Timeout) 2146 - Tempus_DIRK_Staggered_FSA_MPI_1 (Timeout) 2157 - Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 (Timeout) ``` The most expensive tests were: ``` $ grep " Test " make.dashboard.2462-openmpi-1.6.5-to-1.10.1.out | grep "Timeout" 2/2586 Test #2157: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................................................***Timeout 600.01 sec 3/2586 Test #2140: Tempus_ExplicitRK_Staggered_FSA_MPI_1 .........................................................***Timeout 600.05 sec 4/2586 Test #2145: Tempus_DIRK_Combined_FSA_MPI_1 ................................................................***Timeout 600.26 sec 5/2586 Test #2146: Tempus_DIRK_Staggered_FSA_MPI_1 ...............................................................***Timeout 600.14 sec $ grep " Test " make.dashboard.2462-openmpi-1.6.5-to-1.10.1.out | grep "sec$" | sort -nr -k 7 | head -n 30 9/2586 Test #2156: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 ................................................. Passed 579.68 sec 8/2586 Test #2139: Tempus_ExplicitRK_Combined_FSA_MPI_1 .......................................................... Passed 567.91 sec 7/2586 Test #2148: Tempus_DIRK_ASA_MPI_1 ......................................................................... Passed 465.64 sec 6/2586 Test #2149: Tempus_HHTAlpha_MPI_1 ......................................................................... Passed 438.92 sec 13/2586 Test #2142: Tempus_ExplicitRK_ASA_MPI_1 ................................................................... Passed 435.95 sec 15/2586 Test #2154: Tempus_IMEX_RK_Staggered_FSA_MPI_1 ............................................................ Passed 368.65 sec 19/2586 Test #2153: Tempus_IMEX_RK_Combined_FSA_MPI_1 ............................................................. Passed 353.16 sec 33/2586 Test #2150: Tempus_Newmark_MPI_1 .......................................................................... Passed 255.15 sec 40/2586 Test #2281: ROL_test_sol_solSROMGenerator_MPI_1 ........................................................... Passed 199.18 sec 43/2586 Test #2133: Tempus_BDF2_MPI_1 ............................................................................. Passed 193.66 sec 26/2586 Test #2128: Tempus_BackwardEuler_MPI_1 .................................................................... Passed 171.63 sec 14/2586 Test #2134: Tempus_BDF2_Combined_FSA_MPI_1 ................................................................ Passed 167.25 sec 51/2586 Test #1997: MueLu_ParameterListInterpreterTpetra_MPI_1 .................................................... Passed 157.52 sec 42/2586 Test #2102: Rythmos_IntegratorBuilder_ConvergenceTest_MPI_1 ............................................... Passed 156.30 sec 10/2586 Test #2129: Tempus_BackwardEuler_Combined_FSA_MPI_1 ....................................................... Passed 150.13 sec 11/2586 Test #2130: Tempus_BackwardEuler_Staggered_FSA_MPI_1 ...................................................... 
Passed 147.52 sec 12/2586 Test #2135: Tempus_BDF2_Staggered_FSA_MPI_1 ............................................................... Passed 146.82 sec 24/2586 Test #1993: MueLu_ParameterListInterpreterEpetra_MPI_1 .................................................... Passed 125.19 sec 37/2586 Test #1983: MueLu_UnitTestsTpetra_MPI_1 ................................................................... Passed 120.19 sec 29/2586 Test #2287: ROL_test_sol_checkAlmostSureConstraint_MPI_1 .................................................. Passed 115.47 sec 16/2586 Test #2137: Tempus_BDF2_ASA_MPI_1 ......................................................................... Passed 113.17 sec 23/2586 Test #2319: ROL_example_burgers-control_example_06_MPI_1 .................................................. Passed 108.33 sec 21/2586 Test #2144: Tempus_DIRK_MPI_1 ............................................................................. Passed 107.34 sec 28/2586 Test #2098: Rythmos_BackwardEuler_ConvergenceTest_MPI_1 ................................................... Passed 106.76 sec 17/2586 Test #173: KokkosKernels_graph_serial_MPI_1 .............................................................. Passed 106.58 sec 186/2586 Test #2529: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .................................. Passed 102.36 sec 22/2586 Test #2132: Tempus_BackwardEuler_ASA_MPI_1 ................................................................ Passed 93.00 sec 25/2586 Test #2138: Tempus_ExplicitRK_MPI_1 ....................................................................... Passed 90.59 sec 34/2586 Test #2327: ROL_example_parabolic-control_example_03_MPI_1 ................................................ Passed 82.84 sec 188/2586 Test #2533: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .................................... Passed 82.81 sec ``` **D) Compare test runtimes:** Comparing the most expensive tests shown in {{make.dashboard.2462-openmpi-1.6.5-to-1.10.1.out}} vs the baseline {{make.dashboard.2462-openmpi-1.6.5-to-1.10.1-base.out}} we can clearly see that some tests took much longer with OpenMPI 1.10.1 vs. OpenMPI 1.6.5. Let's compare a few tests: | Test Name | OpenMPI 1.6.5 | OpenMPI 1.10.1 | | :-- | --: | --: | | Tempus_ExplicitRK_Staggered_FSA_MPI_1 | 215.29 sec | Timeout 600.05 sec | | Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 | 214.29 sec | Timeout 600.05 sec | | Tempus_DIRK_Combined_FSA_MPI_1 | 188.22 sec | Timeout 600.26 sec | | Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI | 160.28 sec | 579.68 sec | | Tempus_DIRK_ASA_MPI_1 | 125.28 sec | 465.64 sec | | Tempus_HHTAlpha_MPI_1 | 124.29 sec | 438.92 sec | Tempus_ExplicitRK_ASA_MPI_1 | 117.84 sec | 435.95 sec | | Tempus_IMEX_RK_Staggered_FSA_MPI_1 | 102.77 sec | 368.65 sec | | Tempus_IMEX_RK_Combined_FSA_MPI_1 | 97.35 sec | 353.16 sec | | Zoltan_hg_simple_zoltan_parallel | 89.07 sec | 8.07 sec* | | Tempus_Newmark_MPI_1 | 76.28 sec | 255.15 sec | 65.09 sec | | ROL_example_PDE-OPT_0ld_adv-diff-react_example_02_MPI_4 | 70.90 sec | 65.09 sec* | Note: the times with OpenMPI 1.10.1 mared with `*` were not showen in the list of the 30 most expensive tests for that case. Instead, I had to get the values out of the file `make.dashboard.2462-openmpi-1.6.5-to-1.10.1.out` on the machine. I need to run these builds and tests again on a unloaded machine so before I believe these numbers. But it does look like there is a big perforamnce problem with OpenMPI 1.10.1 vs. OpenMPI 1.6.5 for some builds and some packages.
bartlettroscoe commented 6 years ago

PR #2609 provides a single *.cmake file to define this build. The auto PR tester driver bash script just needs to source:

$ source <trilinos-base-dir>/cmake/load_sems_dev_env.sh

and then the ctest -S <script> driver script just needs the argument:

-C <trilinos-base-dir>/cmake/std/MpiReleaseDebugSharedPtSerial.cmake

and that is it.

The most important settings that we don't want to duplicate all over the place are included in this file from the files SEMSDevEnv.cmake and BasicCiTestingSettings.cmake.
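
Putting those two pieces together, a minimal sketch of what a PR driver could do is shown below. TRILINOS_BASE_DIR and BUILD_DIR are placeholder names, and the configure is shown as a plain `cmake -C` initial-cache invocation rather than the project's actual ctest -S driver:

```
#!/bin/bash -e
# Sketch only; paths and variable names are placeholders.
TRILINOS_BASE_DIR=/path/to/Trilinos
BUILD_DIR=/path/to/build

# Load the SEMS-based GCC 4.8.4 + OpenMPI development environment:
source ${TRILINOS_BASE_DIR}/cmake/load_sems_dev_env.sh

# Configure using the single fragment file from PR #2609 as an initial cache,
# then build and run the test suite:
cd ${BUILD_DIR}
cmake -C ${TRILINOS_BASE_DIR}/cmake/std/MpiReleaseDebugSharedPtSerial.cmake \
  ${TRILINOS_BASE_DIR}
make -j16
ctest -j16
```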

bartlettroscoe commented 6 years ago

Now that the GCC-4.8.4-OpenMPI-1.10.1-MpiReleaseDebugSharedPtOpenMP build is 100% clean as described in https://github.com/trilinos/Trilinos/issues/2691#issuecomment-393184370, I will change it over to be the new CI build and the default build for the checkin-test-sems.sh script.

@trilinos/framework,

This build is now ready to be used to replace the existing GCC 4.8.4 auto PR build. The build GCC-4.8.4-OpenMPI-1.10.1-MpiReleaseDebugSharedPtOpenMP completely matches the agreed-to GCC 4.8.4 configuration in #2317.

bartlettroscoe commented 6 years ago

The post-push CI build linked to from:

is now set to the updated GCC 4.8.4 + OpenMPI 1.10.1 + OpenMP build, and it finished its initial build of all 53 packages this morning, passing all 2722 tests. It ran all of these tests in a wall-clock time of 24m 56s (on 8 cores).

@trilinos/framework, I think this build should be ready to substitute for the existing GCC 4.8.4 auto PR build. Should we open a new GitHub issue for that?

Otherwise, I am putting this in review.

bartlettroscoe commented 6 years ago

Given that issue #2788 exists for using this configuration in the GCC 4.8.4 auto PR build, I am closing this issue #2462 since there is nothing left to do here. This updated configuration is being used in the post-push CI build, so we will get an email if there are any failures going forward.