sandialabs / LCM

Laboratory for Computational Mechanics

CDash reporting errors when build completes #69

Closed ikalash closed 1 year ago

ikalash commented 1 year ago

Starting on 4/20, the clang LCM build began reporting build errors to CDash even though there are no errors and a working executable is created. Here is what CDash shows (only warnings, no errors): https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?type=0&buildid=48137. I am attaching the full build log. I'm puzzled as to what is causing this. @bartlettroscoe, as my go-to CMake/CDash contact, have you seen this sort of thing before? Any tips for how to work around it? We could modify the CMake/CDash scripts to run the tests even if the build fails, but the false build failure would still be reported to the CDash site.

lcm-serial-clang-release.log.txt.log

bartlettroscoe commented 1 year ago

@ikalash, I see the error reported in the build summary at:

showing:


Looking at the more detailed build output on the page:

at the bottom it shows:


What is interesting is the line:

The maximum number of reported warnings or errors has been reached!!!

I don't know if that should impact what errors are shown or not.

But looking at the build error output at:

it shows:

In file included from include/Tpetra_CrsMatrix_decl.hpp:53:
In file included from include/Tpetra_Vector.hpp:1:
In file included from include/Tpetra_Vector_decl.hpp:50:
include/Tpetra_MultiVector_decl.hpp:2339:5: warning: 'copyAndPermute' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
    copyAndPermute
    ^
include/Tpetra_DistObject_decl.hpp:855:5: note: overridden virtual function is here
    copyAndPermute (const SrcDistObject& source,
    ^
In file included from src/problems/Albany_LinComprNSProblem.cpp:5:

Can you get the build *.xml file that CTest generated on the client machine for the last build? That might provide more clues.

Otherwise, we should ask Kitware about this. Let's talk offline to see if we can make that happen.

bartlettroscoe commented 1 year ago

@ikalash, FYI, I don't know if you noticed this or not, but at:

this build is showing a bunch of deprecation warnings at the bottom like:

CMake Warning (dev) at src/CMakeLists.txt:113 (target_link_libraries):
  The library that is being linked to, teuchosparameterlist, is marked as
  being deprecated by the owner.  The message provided by the developer is:

  WARNING: The non-namespaced target 'teuchosparameterlist' is deprecated! If
  always using newer versions of the project 'Trilinos', then use the new
  namespaced target 'TeuchosParameterList::teuchosparameterlist', or better
  yet, 'TeuchosParameterList::all_libs' to be less sensitive to changes in
  the definition of targets in the package 'TeuchosParameterList'.  Or, to
  maintain compatibility with older or newer versions the project 'Trilinos',
  instead link against the libraries specified by the variable
  'TeuchosParameterList_LIBRARIES'.

This warning is for project developers.  Use -Wno-dev to suppress it.

How to fix this is spelled out in the warning message above.
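A minimal sketch of the change the warning asks for. The target name albanyLib is a hypothetical stand-in for whatever target is linked at src/CMakeLists.txt:113; the two Trilinos-side names (TeuchosParameterList::all_libs and TeuchosParameterList_LIBRARIES) come straight from the warning text.

```cmake
# Hypothetical: "albanyLib" stands in for the target linked at src/CMakeLists.txt:113.
# Deprecated form that triggers the warning:
#   target_link_libraries(albanyLib teuchosparameterlist)

# Preferred with newer Trilinos: the namespaced "all_libs" target named in the warning.
target_link_libraries(albanyLib TeuchosParameterList::all_libs)

# Alternative that the warning says stays compatible with older and newer Trilinos:
#   target_link_libraries(albanyLib ${TeuchosParameterList_LIBRARIES})
```

Keep the same signature style (plain, or with PUBLIC/PRIVATE keywords) as any other target_link_libraries() calls on that target, since CMake rejects mixing the two on one target.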

Also note the warning at the top:

CMake Warning at /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/Kokkos/KokkosConfig.cmake:271 (MESSAGE):
  The installed Kokkos configuration does not support CXX extensions.
  Forcing -DCMAKE_CXX_EXTENSIONS=Off
Call Stack (most recent call first):
  /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/Amesos2/Amesos2Config.cmake:179 (include)
  /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/Stratimikos/StratimikosConfig.cmake:146 (include)
  /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/DataTransferKitOperators/DataTransferKitOperatorsConfig.cmake:155 (include)
  /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/DataTransferKit/DataTransferKitConfig.cmake:149 (include)
  /home/lcm/LCM/trilinos-install-serial-clang-release/lib/cmake/Trilinos/TrilinosConfig.cmake:123 (include)
  CMakeLists.txt:36 (find_package)

You can fix this by placing:

set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL
  "Disable C++ extensions (to avoid warnings from Kokkos)")

near the top of the top-level CMakeLists.txt file.
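To illustrate "near the top": the setting needs to be in place before find_package(Trilinos) (CMakeLists.txt:36 in the call stack above), because KokkosConfig.cmake checks it while the Trilinos package configs are being loaded. A sketch, with the minimum CMake version and the project name as placeholders:

```cmake
# Placeholder version and project name; only the ordering matters here.
cmake_minimum_required(VERSION 3.17)
project(LCM CXX)

# Must come before find_package(Trilinos ...), which pulls in KokkosConfig.cmake
# (via Amesos2, Stratimikos, etc.) and emits the warning if extensions are on.
set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL
  "Disable C++ extensions (to avoid warnings from Kokkos)")

find_package(Trilinos REQUIRED)
```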

ikalash commented 1 year ago

Thanks for looking at this in so much detail, @bartlettroscoe! What confuses me is that everything reported appears to be a warning, not an error. I am not seeing any actual errors anywhere; perhaps the error is that there are too many warnings? I suppose we could try suppressing the warnings by modifying the flags to see if that fixes the problem. I am attaching the *xml files for the build and configure. I don't see anything in them that differs from the output I sent earlier and the CDash links, but perhaps you will see something I did not?

I'll add the CMake line you suggest. I will also send you a calendar invite to meet next week, if that's OK. It won't be until Wednesday because I am traveling to ABQ on Monday and have all-day meetings on Tuesday.

bartlettroscoe commented 1 year ago

FYI: We verified the problem is on the CTest side when it generates the Build.xml file that gets sent over to CDash. That XML file shows:

...
    <Build>
        <StartDateTime>May 10 00:14 PDT</StartDateTime>
        <StartBuildTime>1683702860</StartBuildTime>
        <BuildCommand>/var/lib/snapd/snap/cmake/1288/bin/cmake --build . --config "Release" --target "all" -- -i -j 72</BuildCommand>
        <Warning>
            <BuildLogLine>86</BuildLogLine>
            <Text>/.../LCM/src/disc/stk/percept/stk_rebalance/GeomDecomp.cpp:54:40: warning: 'max_size' is deprecated [-Wdeprecated-declarations]</Text>
            <SourceFile>src/disc/stk/percept/stk_rebalance/GeomDecomp.cpp</SourceFile>
            <SourceLineNumber>54</SourceLineNumber>
            <PreContext>[ 19%] Built target HeatProfile
[ 19%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/PerformanceContext.cpp.o
[ 19%] Built target CylHeatProfile
[ 19%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/TimeMonitor.cpp.o
[ 20%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/Albany_CombineAndScatterManager.cpp.o
[ 20%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/Albany_CombineAndScatterManagerTpetra.cpp.o
[ 21%] Linking CXX executable yaml2xml
[ 21%] Built target yaml2xml
[ 21%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/Albany_CommUtils.cpp.o
[ 21%] Building CXX object src/CMakeFiles/albanyLib.dir/utility/Albany_Gather.cpp.o
</PreContext>
            <PostContext>        const unsigned ndim(nodal_coor.max_size(NODE_RANK));  // TODO - is there a better way to get this info?
                                       ^
</PostContext>
            <RepeatCount>0</RepeatCount>
        </Warning>

       ...

        <Warning>
            <BuildLogLine>476</BuildLogLine>
            <Text>/.../LCM/src/./Albany_EigendataInfoStructT.hpp:24:57: note: in instantiation of template class 'Tpetra::MultiVector&lt;&gt;' requested here</Text>
            <SourceFile>src/./Albany_EigendataInfoStructT.hpp</SourceFile>
            <SourceLineNumber>24</SourceLineNumber>
            <PreContext></PreContext>
            <PostContext>    eigenvectorRe = Teuchos::rcp(new Tpetra_MultiVector(*(copy.eigenvectorRe)));
                                                        ^
/.../trilinos-install-serial-clang-release/include/Tpetra_DistObject_decl.hpp:608:25: note: overridden virtual function is here
    virtual std::string description () const;
                        ^
In file included from /.../LCM/src/problems/Albany_LinComprNSProblem.cpp:5:
In file included from /.../LCM/src/./problems/Albany_LinComprNSProblem.hpp:8:
In file included from /.../LCM/src/./problems/Albany_AbstractProblem.hpp:14:
In file included from /.../LCM/src/./Albany_StateInfoStruct.hpp:17:
In file included from /.../LCM/src/./disc/Adapt_NodalDataBase.hpp:12:
The maximum number of reported warnings or errors has been reached!!!
</PostContext>
            <RepeatCount>0</RepeatCount>
        </Warning>
        <Log Encoding="base64" Compression="bin/gzip"/>
        <EndDateTime>May 10 00:18 PDT</EndDateTime>
        <EndBuildTime>1683703136</EndBuildTime>
        <ElapsedMinutes>4</ElapsedMinutes>
    </Build>

...

So what seems to be happening is that CTEST_USE_LAUNCHERS is not turned on, so ctest has to scrape the raw make output to find warnings and errors. And there is a feature where, once the maximum number of reported warnings and errors (i.e. 50) has been reached, ctest reports no more, even if actual errors appear later in the output!

Note that you see exactly 50 warnings in that Build.xml file:

$ grep "<Warning>" Build.xml | wc -l
50
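If the goal is also to keep more diagnostics in Build.xml, CTest's caps can be raised. A minimal sketch, assuming a CTestCustom.cmake file at the top of the build tree (this file is not mentioned elsewhere in the thread); CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS and CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS are the documented CTest settings for these limits, and the value 1000 is arbitrary.

```cmake
# CTestCustom.cmake: placed at the top of the build tree (CTEST_BINARY_DIRECTORY).
# In a ctest -S driver, load it after configuring with:
#   ctest_read_custom_files("${CTEST_BINARY_DIRECTORY}")

# Raise CTest's caps on the build warnings/errors recorded for CDash.
# 1000 is an arbitrary illustrative value.
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS 1000)
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS 1000)
```

This only makes it less likely that a real error is pushed past the limit by earlier warnings; it does not change how CTest orders or scrapes the output.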

My guess is that if the ctest -S <ctest-driver>.cmake driver sets:

set(CTEST_USE_LAUNCHERS 1)

and also sets it in the underlying ctest_configure() command (see here), then you will see the errors show up in the XML file sent to CDash. (CTest seems to use a different algorithm when CTEST_USE_LAUNCHERS=TRUE is set: it limits the number of reported warnings and errors separately, so you will see the first 50 errors and the first 50 warnings. If you don't set CTEST_USE_LAUNCHERS=TRUE, you get only the first 50 errors or warnings combined, and if there are more than 50 warnings before the first error, you will never see that error.)
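A minimal sketch of what such a ctest -S driver could look like, assuming a single-configuration Makefile build and that the project's top-level CMakeLists.txt calls include(CTest), since launchers only take effect through that module. The file name, site and build names, and paths are hypothetical placeholders, and the CDash drop site is assumed to come from the project's CTestConfig.cmake.

```cmake
# lcm_nightly.cmake (hypothetical name); run with: ctest -S lcm_nightly.cmake -VV
# Site/build names and paths below are placeholders, not the real LCM setup.
set(CTEST_SITE "my-build-host")
set(CTEST_BUILD_NAME "lcm-serial-clang-release")
set(CTEST_SOURCE_DIRECTORY "/path/to/LCM")
set(CTEST_BINARY_DIRECTORY "/path/to/lcm-build")
set(CTEST_CMAKE_GENERATOR "Unix Makefiles")
set(CTEST_BUILD_CONFIGURATION "Release")

# Ask CTest to run each compile/link command through "ctest --launch" so that
# warnings and errors are captured per command instead of being scraped from
# the combined make output.
set(CTEST_USE_LAUNCHERS 1)

ctest_start(Nightly)

# Also turn launchers on in the configured project; they only take effect if
# the project's CMakeLists.txt includes the CTest module (include(CTest)).
set(configure_options
  "-DCMAKE_BUILD_TYPE=Release"
  "-DCTEST_USE_LAUNCHERS:BOOL=ON")
ctest_configure(OPTIONS "${configure_options}")

# Print the counts CTest extracted, for comparison with what CDash shows.
ctest_build(NUMBER_ERRORS num_errors NUMBER_WARNINGS num_warnings)
message(STATUS "ctest_build: ${num_errors} errors, ${num_warnings} warnings")

ctest_test()

# Assumes the CDash drop site is configured in the project's CTestConfig.cmake.
ctest_submit()
```

Passing -DCTEST_USE_LAUNCHERS:BOOL=ON explicitly is a belt-and-braces choice; it makes the project-side setting visible in the driver rather than relying on ctest_configure() to forward it.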

I have never used a ctest -S script that does not set CTEST_USE_LAUNCHERS=TRUE, so I don't have any experience with that mode.

Please set CTEST_USE_LAUNCHERS as described here and let's see what that looks like on CDash.

ikalash commented 1 year ago

It looks like this issue has gone away: https://sems-cdash-son.sandia.gov/cdash/index.php?project=Albany_LCM&date=2023-05-11&filtercount=1&showfilters=1&field1=groupname&compare1=61&value1=Nightly . I did add the CTEST_USE_LAUNCHERS flag; perhaps this is partially why? There are no errors reported so I cannot tell if the output reported to CDash was improved by adding this flag. I will close for now; hopefully the fix will persist.

bartlettroscoe commented 1 year ago

The fact that the build warnings went away is a bit concerning. How did all of those warnings go away?

ikalash commented 1 year ago

I had the same thought.... no clue... sadly I've seen this sort of thing happen before and never understood it...

bartlettroscoe commented 1 year ago

What does the Build.log file show on the local machine? Does it show any warnings?

ikalash commented 1 year ago

The logs don't have any warnings (which is weird given all the warnings before, I agree). The attached log file shows the output if you'd like to look at it. trilinos-serial-clang-release.log

bartlettroscoe commented 1 year ago

@ikalash, that file trilinos-serial-clang-release.log would not show any warnings. That is the STDOUT from the ctest -S script.

The build log file you want to look at for warnings should be:

/home/lcm/LCM/trilinos-build-serial-clang-release/Testing/Temporary/LastBuild_20230511-0100.log

That file should be the raw STDERR+STDOUT for the build command (i.e. 'make' or 'ninja'). It is what you would see if you were running those build commands yourself.

ikalash commented 1 year ago

@bartlettroscoe , sorry I misunderstood. The file you ask about is attached. It does have warnings in it. LastBuild_20230512-0100.log

bartlettroscoe commented 1 year ago

@ikalash, the fact that this is no longer showing warnings on CDash is not good. It might suggest that it would not show errors either.

Can you attach the XML file for the last build that was submitted to CDash?