Closed: bartlettroscoe closed this issue 6 years ago
With the timing-out KokkosKernels test being addressed in #2439, I think this failing test is the last one blocking the promotion of the build Trilinos-atdm-white-ride-cuda-debug to the "ATDM" CDash Track/Group. This is especially important because this build is being targeted as an auto PR testing build for Trilinos, as described in #2464.
Therefore, I am going to go ahead and disable this test for these builds. Then someone with the interest can try to see why these tests are segfaulting. But given the problems we are seeing on this platform, as described in #1208, that may not be worth it. And besides, this Power8 platform is just a stepping stone to the Power9 platform targeted for the ATS-2 machine Sierra, so there is no reason to kill ourselves with this.
@bartlettroscoe Do we have a list somewhere of "tests that we disabled because they are blocking CUDA builds"? I'm just a bit worried that we might lose track of what's failing.
A good addition to TriBITS would be a CMake function that disables tests but allows you to query for all disabled tests at configure time.
TRIBITS_ADD_TEST( ... DISABLED white,opt)
Then at configure time: -D <project_name|package_name>_SHOW_DISABLED_TESTS
This way we could very quickly get a sense of what works and what doesn't without having to dig through tickets.
> Do we have a list somewhere of "tests that we disabled because they are blocking CUDA builds"? I'm just a bit worried that we might lose track of what's failing.
@mhoemmen, yes. Short-term you can just grep the tweaks files:
$ find cmake/std/atdm/ -name "*.cmake" -exec grep -nH "DISABLE" {} \; | grep -i cuda
cmake/std/atdm/ride/tweaks/CUDA-DEBUG-CUDA.cmake:4:ATDM_SET_ENABLE(TeuchosNumerics_LAPACK_test_MPI_1_DISABLE ON)
cmake/std/atdm/ride/tweaks/CUDA-DEBUG-CUDA.cmake:7:ATDM_SET_ENABLE(Belos_Tpetra_PseudoBlockCG_hb_test_MPI_4_DISABLE ON)
cmake/std/atdm/ride/tweaks/CUDA-RELEASE-CUDA.cmake:4:ATDM_SET_ENABLE(PanzerAdaptersSTK_main_driver_energy-ss-loca-eigenvalue_DISABLE ON)
cmake/std/atdm/ride/tweaks/CUDA_COMMON_TWEAKS.cmake:2:ATDM_SET_ENABLE(PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3_DISABLE ON)
cmake/std/atdm/shiller/tweaks/CUDA_COMMON_TWEAKS.cmake:2:ATDM_SET_ENABLE(PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3_DISABLE ON)
cmake/std/atdm/shiller/tweaks/CUDA_COMMON_TWEAKS.cmake:5:ATDM_SET_ENABLE(Anasazi_Epetra_BlockDavidson_auxtest_MPI_4_DISABLE ON)
cmake/std/atdm/shiller/tweaks/CUDA_COMMON_TWEAKS.cmake:8:ATDM_SET_ENABLE(Anasazi_Epetra_LOBPCG_auxtest_MPI_4_DISABLE ON)
(see explanation of this setup in https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#directory-structure-and-contents).
And if you look at the comments above these SET() statements, they list the GitHub issue IDs for each of these, so it is easy to trace back to why they were disabled.
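As a rough sketch of how those issue-ID comments could be pulled out next to each disabled test: the helper name `list_disabled_with_issues` is hypothetical, and it assumes the issue is named in `#` comment lines directly above each `..._DISABLE ON` line, with no blank line in between.

```shell
# Hypothetical helper (not part of Trilinos or TriBITS): print every
# "<test>_DISABLE ON" line found under a directory, followed by the '#'
# comment lines that sit directly above it.
list_disabled_with_issues() {
  dir="$1"
  find "$dir" -name "*.cmake" -print | sort | while IFS= read -r f; do
    awk -v file="$f" '
      # Accumulate consecutive comment lines.
      /^[ \t]*#/ { comment = comment " " $0; next }
      # A disable line: report it with the comments collected so far.
      /_DISABLE[ \t]+ON/ {
        print file ":" FNR ": " $0 "  <-" comment
        comment = ""
        next
      }
      # Any other line (including a blank one) breaks the association.
      { comment = "" }
    ' "$f"
  done
}

# Usage, from the base Trilinos source directory:
#   list_disabled_with_issues cmake/std/atdm/
```

If the comments are separated from the statements by blank lines, the awk rules would need to be loosened accordingly.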
After we upgrade CMake and CDash, then these disabled tests will show up on CDash as "Not Run" tests with the Details field "Disabled" (but those tests will not trigger CDash error emails) and you will be able to query for "Disabled" tests to see them all. But that requires CMake 3.10+ and the upgraded CDash that we are evaluating in https://gitlab.kitware.com/snl/project-1/issues/33.
> A good addition to TriBITS would be a CMake function that disables tests but allows you to query for all disabled tests at configure time.
@rppawlo, that basically already exists. You just set Trilinos_TRACE_ADD_TEST=ON and then grep for "NOT added". For example, in the cuda-debug configure output of Belos from this morning on 'white' on CDash, you can see:
-- Belos_Tpetra_PseudoBlockCG_hb_test_MPI_4: NOT added test because Belos_Tpetra_PseudoBlockCG_hb_test_MPI_4_DISABLE='ON'!
But if you only wanted to see disabled tests, we could add support for <project_name|package_name>_SHOW_DISABLED_TESTS=ON if that would be desired.
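The grep step can be sketched as a small filter, assuming the configure output was saved to a file and uses the "NOT added test because" line format shown above. The helper name `list_not_added_disabled` and the file name `configure.out` are hypothetical.

```shell
# Hypothetical helper: list the names of tests that the trace output reports
# as "NOT added" because a <test>_DISABLE variable was set to ON.
list_not_added_disabled() {
  grep "NOT added test because" "$1" \
    | grep "_DISABLE='ON'" \
    | sed -e 's/^-- *//' -e 's/: NOT added test because.*$//'
}

# Usage, after configuring with -D Trilinos_TRACE_ADD_TEST=ON and saving the
# configure output to a file:
#   list_not_added_disabled configure.out
```

This only reports tests skipped through a `_DISABLE` variable; tests that were not added for other reasons are filtered out by the second grep.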
On the topic of disabled tests and GitHub issues, an idea that occurred to me: instead of closing GitHub issues that were resolved by just disabling tests, we could add labels called something like "Disabled Tests" and "Stalled", leave the issues open, and then filter them out using -label:"Disabled Tests" in most views or specifically search for them using label:"Disabled Tests". That way, disabled tests could be searched for statically and in configure output as I showed above, on CDash (after a CMake and CDash upgrade), and also in GitHub.
What do people think about that idea?
@bartlettroscoe I LIKE THAT IDEA
I think @csiefer2 agrees :D
If people choose to remove the test, that's a different thing -- it's like closing the issue with "wontfix".
CC: @trilinos/framework
I added the labels "Disabled Tests" and "Stalled" and applied them to, for example, #2474. See the updated documentation on this at:
Since this test was disabled in these builds in commit a68547f, this query shows no sign of it failing in any of the promoted "ATDM" CDash Group Trilinos builds recently (at least in the last month, since 5/7/2018).
Therefore, I think we can close this issue.
CC: @trilinos/belos
Next Action Status
Since the test was disabled in commit a68547f, there have been no recent signs of this test failing.
Description
As shown at:
the test Belos_Tpetra_PseudoBlockCG_hb_test_MPI_4 fails in the builds:
Trilinos-atdm-white-ride-cuda-debug
Trilinos-atdm-white-ride-gnu-debug-openmp
run on white and ride and passes in every other build of Trilinos, including, ironically, the opt builds on white and ride, which otherwise show a lot of failing Belos tests as described in #2454. This failing test for the cuda-debug build shows a segfault:
and for the gnu-debug-openmp build shows:
Related Issues: