Closed bartlettroscoe closed 1 year ago
CC: @tcclevenger
And this error just took out a PR testing iteration for PR #10751 shown in the build:
CC: @fryeguy52
FYI: I searched the Trilinos 'develop' branch as of commit 1cd5bae44bc:
* 1cd5bae44bc "Merge pull request #10809 from cgcgcg/cxxStandard"
| Author: Christian Glusa <cgcgcg@users.noreply.github.com>
| Date: Wed Aug 3 08:34:02 2022 -0600 (63 minutes ago)
|
| M cmake/ctest/drivers/enigma/TrilinosCTestDriverCore.enigma.gcc.cmake
| M cmake/ctest/drivers/geminga/TrilinosCTestDriverCore.geminga.gcc-cuda.cmake
| M cmake/ctest/drivers/geminga/TrilinosCTestDriverCore.geminga.gcc.cmake
| M cmake/ctest/drivers/lightsaber/TrilinosCTestDriverCore.lightsaber.gcc.cmake
| M cmake/ctest/drivers/rocketman/TrilinosCTestDriverCore.rocketman.gcc.cmake
| M cmake/ctest/drivers/trappist/TrilinosCTestDriverCore.trappist.clang.cmake
| M cmake/ctest/drivers/trappist/TrilinosCTestDriverCore.trappist.gcc.cmake
and I did a search to try to find the code that generates this command:
cmake --build . --config Release -- -j29 -k 0
by running:
$ cd Trilinos/
$ find . -type f -exec grep -nH "[-][-]build [.]" {} \; | grep -v /TriBITS/ | grep -v cmake/tribits/
./kokkos/appveyor.yml:9: cmake --build . --target install &&
./seacas/.appveyor.yml:83: - cmd: cmake --build . --config %configuration% -- /maxcpucount:4
./packages/kokkos/appveyor.yml:9: cmake --build . --target install &&
./packages/sacado/test/GTestSuite/googletest/appveyor.yml:111: & cmake --build . --config $env:configuration -- $cmake_parallel
./packages/sacado/test/GTestSuite/googletest/googletest/README.md:106:execute_process(COMMAND ${CMAKE_COMMAND} --build .
The closest match above is:
./packages/sacado/test/GTestSuite/googletest/appveyor.yml:111: & cmake --build . --config $env:configuration -- $cmake_parallel
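For anyone who wants to repeat this search without the find/grep pipeline, here is a minimal Python sketch of the same idea. The function name `find_build_invocations` is made up for illustration, and the exclusion is applied to the file path only, which is a slight simplification of the two `grep -v` filters (those also drop lines whose matched content contains the excluded text).

```python
import re
from pathlib import Path

# Same literal text the grep pattern "[-][-]build [.]" matches: "--build ."
PATTERN = re.compile(r"--build \.")
# Path fragments dropped by the two `grep -v` filters in the command above.
EXCLUDED = ("/TriBITS/", "cmake/tribits/")


def find_build_invocations(root):
    """Return (relative path, line number, stripped line) for each match."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = "./" + path.relative_to(root).as_posix()
        # Simplification: filter on the path only, not on the matched line.
        if any(fragment in rel for fragment in EXCLUDED):
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits
```

Calling `find_build_invocations("Trilinos")` from the directory that contains the checkout should give roughly the same list as the command above, one tuple per matching line.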
The name appveyor is mentioned in:
$ cd packages/sacado/test/GTestSuite/googletest/
$ find . -type f -exec grep -nH appveyor {} \;
./appveyor.yml:63: appveyor DownloadFile https://github.com/bazelbuild/bazel/releases/download/0.28.1/bazel-0.28.1-windows-x86_64.exe -FileName bazel.exe
./README.md:6:[![Build status](https://ci.appveyor.com/api/projects/status/4o38plt0xbo1ubc8/branch/master?svg=true)](https://ci.appveyor.com/project/GoogleTestAppVeyor/googletest/branch/master)
But looking at that line in appveyor.yml, it shows:
$cmake_parallel = if ($env:generator -eq "MinGW Makefiles") {"-j2"} else {"/m"}
& cmake --build . --config $env:configuration -- $cmake_parallel
Well, that does not match the signature:
cmake --build . --config Release -- -j29 -k 0
It seems that the file appveyor.yml is a configuration file for the tool appveyor, which is part of a CI/CD system called AppVeyor. The main website https://www.appveyor.com/ shows that Google is one of their customers, which would explain why the file ships with googletest, but I can't see how it would get run in Trilinos PR testing.
So I am stumped how this command is getting run as part of Trilinos PR testing.
I will see if I can reproduce these errors myself (word is that we should be able to which I will try out now).
CC: @csiefer2, @e10harvey
NOTE: The builds that show these errors are similar to those that show "6" build errors reported in #10836 in that they show zero tests for "Not Run", "Fail", and "Pass". These are a little harder to search for on CDash but this query seems to select them.
Looking over this set of builds, we see different types of errors reported for the command:
"<base-dir>/cmake" "--build" "." "--config" "Debug" "--" "-j20" "-k" "0"
These look like real build errors in Trilinos, but they are not being reported correctly with each package. Instead, they are just reported for the outer cmake --build . command.
Here are some examples of different build errors reported:
1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h :
/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/thyra/adapters/epetra/test/EpetraOperatorWrapper/EpetraOperatorWrapper_UnitTests.cpp:54:10: fatal error: Trilinos_Util_CrsMatrixGallery.h: No such file or directory
#include "Trilinos_Util_CrsMatrixGallery.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
NOTE: All of the examples above are from the builds rhel7_sems-gnu-7.2.0 or rhel7_sems-gnu-8.3.0!
2. MueLu_Test_ETI.hpp ISO C++ forbids declaration of ‘type name’ with no type:
In file included from /scratch/trilinos/jenkins/ascic166/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/muelu/test/structured/Driver_Structured.cpp:437:0: /scratch/trilinos/jenkins/ascic166/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/muelu/test/structured/../unit_tests/MueLu_Test_ETI.hpp: In function ‘bool Automatic_Test_ETI(int, char**)’:
/scratch/trilinos/jenkins/ascic166/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/muelu/test/structured/../unit_tests/MueLu_Test_ETI.hpp:91:31: error: ISO C++ forbids declaration of ‘type name’ with no type [-fpermissive]
Teuchos::RCP<const Teuchos::MpiComm<int> > comm = Teuchos::rcp_dynamic_cast<const Teuchos::MpiComm<int> >(Teuchos::DefaultComm<int>::getComm());
^~~~~~~
NOTE: All of the examples above are from the builds rhel7_sems-gnu-7.2.0!
3. ninja: error: loading 'build.ninja': No such file or directory:
ninja: error: loading 'build.ninja': No such file or directory
4. No error output:
Note, we see the error:
1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h
above being cleanly reported on the 'vortex' builds with the Thyra package shown here impacting PRs #10834, #10802, #10801, and #10751.
What I think is happening is that the same build error for the 'ascic' builds with the 'gnu-7.2.0' and 'gnu-8.3.0' builds is getting reported through the command cmake --build . --config Release -- -j29 -k 0 in a way that I don't understand.
We need to see if we can reproduce this build error on one of the 'ascic' builds locally.
FYI: I am trying to reproduce the 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h error for the build:
rhel7_sems-gnu-7.2.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
on the machine 'hpws055'.
FYI: I tried to reproduce the build error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h for the build:
rhel7_sems-gnu-7.2.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
from the machine 'hpws055' and I was not successful in doing so. All of Thyra built just fine, including the executable Thyra_EpetraOperatorWrapper_UnitTests. However, the tests all crash showing:
... lookup error: /projects/sems/install/rhel7-x86_64/sems/compiler/gcc/7.2.0/openmpi/1.10.1/lib/libmca_common_verbs.so.7: undefined symbol: ompi_common_verbs_usnic_register_fake_drivers
It appears you can't reproduce Trilinos PR builds on HPWS machines at SNL :-(
I will try reproducing on a real 'ascicgpu' machine.
FYI: I tried to reproduce the build error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h for the build:
rhel7_sems-gnu-7.2.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
from the machine 'ascicgpu17' and I was not successful in doing so. All of Thyra built just fine, including the executable Thyra_EpetraOperatorWrapper_UnitTests, and all of the tests ran successfully. That was submitted to CDash here and showed all 82 Thyra tests passing.
FYI: There is independent confirmation in new issue #10842 of the error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h. I will move my analysis of this error over to that issue.
NOTE: My current hypothesis is that an older version of Trilinos from a couple of weeks ago showed this error but has since been fixed on 'develop'. I will test that hypothesis out and document findings in #10842.
FYI: There is another clue in https://github.com/trilinos/Trilinos/issues/10842#issuecomment-1208600267. It seems that you might see the error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h when running out of disk space.
FYI: I made my last very careful effort to reproduce the EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h error in https://github.com/trilinos/Trilinos/issues/10842#issuecomment-1208682513 for the 'vortex' build for PR #10808 and I was not able to do so (i.e. it passed the build).
Note that this issue is also tracking what was reported in #10906.
"In some PR testing, compile failures are erroneously showing up under the subproject Zoltan2Sphynx."
FYI: Still no XML files being archived in the Jenkins jobs to allow us to debug what is causing this behavior. See TRILINOSHD-188.
FYI: We are still seeing a bunch of these cases where errors are reported to Zoltan2Sphynx as seen here over the last 2 days with 7 PR iterations showing failures:
CC: @e10harvey, @zackgalbreath
FYI: The problem of reporting the global cmake --build . [other arguments] command does not seem to be solved. In the build PR-10962-test-rhel7_sems-gnu-7.2.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables-1025 that just ran an hour ago, it shows the build errors:
which shows a build error in the example object file:
packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o
Why is that build error not being reported along with the Compadre package?
The Build.xml file archived in:
shown here is given below.
What is strange about these two build errors is that they are for the same Compadre build error:
/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp: In function ‘int main(int, char**)’:
/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp:503:9: error: iteration 2147483647 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
for (int j=0; j<dimension-1; ++j) {
^~~
[CTest: warning matched] /scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp:503:24: note: within this loop
for (int j=0; j<dimension-1; ++j) {
~^~~~~~~~~~~~
[CTest: warning matched] cc1plus: all warnings being treated as errors
and the Build.xml file shows two entries for the same build error. It is almost as if the ctest -S process is running the build twice: once with launchers turned on and a follow-up build with launchers turned off.
The second failure for the global cmake --build command entry in the XML file shows:
<Failure type="Error">
<!-- Meta-information about the build action -->
<Action/>
<!-- Details of command -->
<Command>
<WorkingDirectory>/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/pull_request_test</WorkingDirectory>
<Argument>/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.19.1/bin/cmake</Argument>
<Argument>--build</Argument>
<Argument>.</Argument>
<Argument>--config</Argument>
<Argument>Debug</Argument>
<Argument>--</Argument>
<Argument>-j20</Argument>
<Argument>-k</Argument>
<Argument>0</Argument>
</Command>
<!-- Result of command -->
<Result>
<StdOut>[1/13366] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_NumericTraits.cpp.o
[2/13366] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[3/13366] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemorySpace.cpp.o
[4/13366] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_MemoryPool.cpp.o
[5/13366] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Spinwait.cpp.o
...
FAILED: packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o
"/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.19.1/bin/ctest" --launch --target-name Compadre_GMLS_Manifold_Test --build-dir /scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/pull_request_test/packages/compadre/examples --output packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o --source /scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp --language CXX --filter-prefix "" -- /projects/sems/install/rhel7-x86_64/sems/compiler/gcc/7.2.0/base/bin/g++ -I. -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples -Ipackages/compadre/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/src/basis -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/src/constraints -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/src/tpl -Ipackages/kokkos/core/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos/core/src -Ipackages/kokkos -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos/core/src/../../tpls/desul/include -Ipackages/kokkos/containers/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos/containers/src -Ipackages/kokkos/algorithms/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos/algorithms/src -Ipackages/kokkos-kernels/src -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src 
-I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/impl -Ipackages/kokkos-kernels/src/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/impl/tpls -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/blas -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/blas/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/sparse -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/sparse/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/graph -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/graph/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/batched -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/batched/dense -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/batched/dense/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/batched/sparse -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/batched/sparse/impl -I/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/kokkos-kernels/src/common -isystem /projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/gcc/7.2.0/base/include -pedantic -Wall -Wno-long-long -Wwrite-strings -Wall -Wno-clobbered -Wno-vla -Wno-pragmas -Wno-unknown-pragmas -Wno-unused-local-typedefs 
-Wno-literal-suffix -Wno-deprecated-declarations -Wno-misleading-indentation -Wno-int-in-bool-context -Wno-maybe-uninitialized -Wno-nonnull-compare -Wno-address -Wno-inline -Wno-unused-but-set-variable -Wno-unused-variable -Wno-unused-label -Werror -DTRILINOS_HIDE_DEPRECATED_HEADER_WARNINGS -O3 -DNDEBUG -std=c++14 -MD -MT packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o -MF packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o.d -o packages/compadre/examples/CMakeFiles/Compadre_GMLS_Manifold_Test.dir/GMLS_Manifold.cpp.o -c /scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp
/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp: In function ‘int main(int, char**)’:
/scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp:503:9: error: iteration 2147483647 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
for (int j=0; j<dimension-1; ++j) {
^~~
[CTest: warning matched] /scratch/trilinos/jenkins/ascic143/workspace/Trilinos_PR_gcc-7.2.0-serial/Trilinos/packages/compadre/examples/GMLS_Manifold.cpp:503:24: note: within this loop
for (int j=0; j<dimension-1; ++j) {
~^~~~~~~~~~~~
[CTest: warning matched] cc1plus: all warnings being treated as errors
[8514/13366] Building CXX object packages/stk/stk_util/stk_util/util/CMakeFiles/stk_util_util.dir/tokenize.cpp.o
[8515/13366] Building CXX object packages/stk/stk_util/stk_util/util/CMakeFiles/stk_util_util.dir/human_bytes.cpp.o
[8516/13366] Building CXX object packages/stk/stk_util/stk_util/environment/CMakeFiles/stk_util_env.dir/CPUTime.cpp.o
...
[13363/13366] Linking CXX executable packages/piro/test/Piro_ThyraSolver.exe
[13364/13366] Building CXX object packages/trilinoscouplings/examples/scaling/CMakeFiles/TrilinosCouplings_Example_Poisson_NoFE_Tpetra.dir/example_Poisson_NoFE_Tpetra.cpp.o
[13365/13366] Linking CXX executable packages/trilinoscouplings/examples/scaling/TrilinosCouplings_Example_Poisson_NoFE_Tpetra.exe
ninja: build stopped: cannot make progress due to previous errors.</StdOut>
<StdErr/>
<ExitCondition>1</ExitCondition>
</Result>
</Failure>
This is so strange.
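To spot this pattern across archived Build.xml files, something like the following Python sketch could separate the per-target failures from the global cmake --build entries. It relies only on the element names visible in the excerpt above (Failure, Action, Command, Argument). The sample document is abbreviated, the function name `split_failures` is made up, and the populated Action with a TargetName child in the first entry is an assumption about what a launcher-reported failure looks like.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modeled on the <Failure> entry shown above. The first
# entry, with a populated <Action>, is an ASSUMPTION about how a
# launcher-reported failure for a single target is recorded.
SAMPLE = """\
<Build>
  <Failure type="Error">
    <Action>
      <TargetName>Compadre_GMLS_Manifold_Test</TargetName>
    </Action>
    <Command>
      <Argument>g++</Argument>
      <Argument>-c</Argument>
      <Argument>GMLS_Manifold.cpp</Argument>
    </Command>
    <Result><ExitCondition>1</ExitCondition></Result>
  </Failure>
  <Failure type="Error">
    <Action/>
    <Command>
      <Argument>cmake</Argument>
      <Argument>--build</Argument>
      <Argument>.</Argument>
      <Argument>--config</Argument>
      <Argument>Debug</Argument>
      <Argument>--</Argument>
      <Argument>-j20</Argument>
      <Argument>-k</Argument>
      <Argument>0</Argument>
    </Command>
    <Result><ExitCondition>1</ExitCondition></Result>
  </Failure>
</Build>
"""


def split_failures(xml_text):
    """Separate per-target failures from global `cmake --build` failures."""
    root = ET.fromstring(xml_text)
    per_target, global_build = [], []
    for failure in root.iter("Failure"):
        args = [a.text or "" for a in failure.findall("./Command/Argument")]
        action = failure.find("Action")
        # An empty <Action/> means no target information was recorded.
        has_target = action is not None and len(action) > 0
        if "--build" in args and not has_target:
            global_build.append(" ".join(args))
        else:
            target = action.findtext("TargetName", default="") if action is not None else ""
            per_target.append(target)
    return per_target, global_build
```

For the sample, `split_failures(SAMPLE)` puts Compadre_GMLS_Manifold_Test in the first list and the joined cmake --build command line in the second. A build whose second list is non-empty is showing the duplicate global entry described above.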
FYI: The behavior described above turns out to be a CTest defect. For details and to follow the fix, see:
Unfortunately, I think that means we will need to upgrade CMake/CTest on all client machines to fix this which will require waiting for CMake 3.25.0 in Jan 2023 (or perhaps a patch release of CMake 3.24).
Update: The fix is going to come out in CMake 3.24.3!
FYI: The fix for this is in CMake 3.24.3 (released 2022-11-01). (See SNL Kitware #209). Next: Install CMake 3.24.3 everywhere and use with Trilinos PR builds ...
With the upgrade of CMake 3.24.3 for all of the Trilinos PR builds yesterday, this should be resolved (see TRILINOSHD-228). For example, we are only seeing build errors for actual targets in the PR builds over the last day shown here and we see just the build error for the target:
Using a version of CMake between versions 3.19 and 3.24.2 (inclusive), we would have seen that same error showing up along with the entire ninja build output for all targets (including all warnings, which was the cause of #10836).
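As a quick way to check whether a given client machine is still in the affected range, here is a small Python sketch. The helper names are made up, the default bounds come from the comment above (3.19 through 3.24.2, fixed in 3.24.3), and it assumes plain numeric version strings with no suffix such as "-rc1". The issue description says the defect was introduced in CMake 3.18, so the lower bound is a parameter.

```python
def parse_version(text):
    """Turn '3.24.2' into (3, 24, 2), padding missing parts with zeros."""
    parts = [int(p) for p in text.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def is_affected(version, first_bad="3.19", first_fixed="3.24.3"):
    """True if `version` falls in the half-open range [first_bad, first_fixed)."""
    return parse_version(first_bad) <= parse_version(version) < parse_version(first_fixed)
```

For example, `is_affected("3.19.1")` is true for the CMake 3.19.1 install that appears in the Build.xml excerpt, while `is_affected("3.24.3")` is false.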
Closing this as complete.
Boy, that was a hard one to diagnose. But the fact that Kitware was willing to patch CMake 3.24.3, SEMS was willing to install CMake 3.24.3, and the Trilinos Framework team was willing and able to upgrade all of the PR builds is what allowed this to be fixed relatively quickly.
Bug Report
@trilinos/framework
Next Action Status
This is due to a defect in CTest introduced in CMake 3.18. The fix for this is in CMake 3.24.3 (released 2022-11-01). (See SNL Kitware #209). Next: Install CMake 3.24.3 everywhere and use with Trilinos PR builds ...
Internal issues
Description
As shown in this query showing:
the new Trilinos Framework GenConfig build
rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
is failing with a build error reported in the Zoltan2Sphynx package showing:
returning error code 1.
As you can see, this is currently failing in the "Master Merge" builds for promotion PRs #10820 and #10797, so this error has nothing to do with a given PR branch; it is impacting 'develop' and will impact everyone's PRs. The reason I saw it is that it took out my last PR iteration https://github.com/trilinos/Trilinos/pull/10813#issuecomment-1202330429 for PR #10813.
Steps to Reproduce
Run a PR build.