CC: @srajama1 (Trilinos Linear Solvers Product Lead), @e10harvey
## Next Action Status
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-01-01&end=2020-12-14&filtercount=10&showfilters=1&filtercombine=and&field1=groupname&compare1=62&value1=Experimental&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=testname&compare3=65&value3=Adelus_&field4=status&compare4=62&value4=passed&field5=testoutput&compare5=94&value5=Error%20initializing%20RM%20connection.%20Exiting&field6=testoutput&compare6=96&value6=srun%3A%20error%3A%20s_p_parse_file%3A%20unable%20to%20read%20.%2Fetc%2Fslurm%2Fslurm.conf.%3A%20Permission%20denied&field7=testoutput&compare7=96&value7=cudaGetDeviceCount.*cudaErrorUnknown.*unknown%20error.*Kokkos_Cuda_Instance.cpp&field8=testoutput&compare8=96&value8=cudaMallocManaged.*cudaErrorUnknown.*unknown%20error.*Sacado_DynamicArrayTraits.hpp&field9=testoutput&compare9=96&value9=srun%3A%20error.*launch%20failed%3A%20Error%20configuring%20interconnect&field10=testoutput&compare10=97&value10=(Segmentation%20fault%7CSignal%3A%20Aborted)) the tests:
* `Adelus_vector_random_MPI_1`
* `Adelus_vector_random_MPI_2`
* `Adelus_vector_random_MPI_3`
* `Adelus_vector_random_MPI_4`
in the builds:
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt`
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi`
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_dbg`
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_dbg_cuda-aware-mpi`
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt`
* `Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi`
* `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg`
* `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt`
* `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug`
* `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug`
* `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug`
* `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release`
* `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug`
started failing on testing day 2020-08-18.
These all appear to be failing with aborts or segfaults like:
```
terminate called after throwing an instance of 'std::runtime_error'
what(): Kokkos::Cuda::initialize(2) FAILED : Device identifier out of range [0..2]
Traceback functionality not available
[ascicgpu16:01785] *** Process received signal ***
[ascicgpu16:01785] Signal: Aborted (6)
[ascicgpu16:01785] Signal code: (-6)
```
and
```
[vortex2:128798] *** Process received signal ***
[vortex2:128798] Signal: Segmentation fault (11)
[vortex2:128798] Signal code: Address not mapped (1)
[vortex2:128798] Failing at address: 0x7fff13761680
```
These tests were failing when the Adelus package was first added to ATDM Trilinos testing.
## Current Status on CDash
Run the [above query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-01-01&end=2020-12-14&filtercount=10&showfilters=1&filtercombine=and&field1=groupname&compare1=62&value1=Experimental&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=testname&compare3=65&value3=Adelus_&field4=status&compare4=62&value4=passed&field5=testoutput&compare5=94&value5=Error%20initializing%20RM%20connection.%20Exiting&field6=testoutput&compare6=96&value6=srun%3A%20error%3A%20s_p_parse_file%3A%20unable%20to%20read%20.%2Fetc%2Fslurm%2Fslurm.conf.%3A%20Permission%20denied&field7=testoutput&compare7=96&value7=cudaGetDeviceCount.*cudaErrorUnknown.*unknown%20error.*Kokkos_Cuda_Instance.cpp&field8=testoutput&compare8=96&value8=cudaMallocManaged.*cudaErrorUnknown.*unknown%20error.*Sacado_DynamicArrayTraits.hpp&field9=testoutput&compare9=96&value9=srun%3A%20error.*launch%20failed%3A%20Error%20configuring%20interconnect&field10=testoutput&compare10=97&value10=(Segmentation%20fault%7CSignal%3A%20Aborted)), adjusting the "Begin" and "End" dates to match today or any other date range.
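For repeated checks, the date adjustment can be scripted. The following is a minimal sketch, assuming a POSIX shell and a `sed` that supports `-E`; the function name `cdash_set_dates` is hypothetical and not part of any CDash or Trilinos tooling. It rewrites only the `begin=` and `end=` query parameters and leaves every filter untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of CDash or Trilinos): rewrite the
# begin= and end= parameters of a CDash queryTests.php URL.
# Usage: cdash_set_dates URL BEGIN END   (dates as YYYY-MM-DD)
cdash_set_dates() {
  printf '%s\n' "$1" |
    sed -E -e "s/([?&]begin=)[0-9-]+/\1$2/" \
           -e "s/([?&]end=)[0-9-]+/\1$3/"
}

# Example with a shortened URL; the full query above works the same way.
cdash_set_dates \
  'https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-01-01&end=2020-12-14&filtercount=10' \
  2021-01-01 2021-01-31
```

Quote the URL with single quotes when passing it in, since it contains `&`, `*`, and parentheses that the shell would otherwise interpret.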
## Steps to Reproduce
One should be able to reproduce this failure as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
and the system-specific instructions at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#specific-instructions-for-each-system
Just log into any of the associated machines and copy and paste the full CDash build name `` listed above and run commands like:
```
$ cd /
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j4
```
where `` is any package that you want to enable to reproduce build and/or test results.
Again, for exact system-specific details on what commands to run to build and run tests, see:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#specific-instructions-for-each-system
And if you can't figure out what commands to run to produce the issue given the above-referenced documentation, please post a comment here and we will give you the exact minimal commands to reproduce the failures.
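To focus on the four tests listed in this issue rather than the whole enabled test suite, the `ctest` step can be narrowed with a name filter. This is a minimal sketch, assuming it is run from the build directory after the configure and build steps above; the helper name `adelus_test_regex` and the `RUN` variable are hypothetical. `ctest -R` (include tests matching a regex) and `--output-on-failure` are standard CTest options.

```shell
#!/bin/sh
# Hypothetical helper: regex matching the four tests listed in this issue.
adelus_test_regex() {
  printf '%s\n' '^Adelus_vector_random_MPI_[1-4]$'
}

# By default only print the command; set RUN=1 to invoke ctest
# (assumes the current directory is the configured build directory).
if [ "${RUN:-0}" = 1 ]; then
  ctest -R "$(adelus_test_regex)" --output-on-failure
else
  echo "ctest -R '$(adelus_test_regex)' --output-on-failure"
fi
```

Printing the command first makes it easy to confirm the filter before spending time on a GPU node.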
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
@trilinos/framework, as per https://github.com/trilinos/Trilinos/pull/8858#issuecomment-802282646, someone should just disable these failing tests on all platforms where they are failing. Apparently, these tests are not designed to run with the general test harness on these systems (which makes you wonder why they are even being enabled in these cases).
But as shown in this query they do actually pass for the 'ride' builds somehow. (That would explain why these pass in Trilinos PR testing which runs CUDA builds on 'ride'.)
It would take just a couple of minutes to add disables for these tests on the various platforms. (In fact, they should be disabled for all CUDA builds by default, given that they even fail on x86 processors with V100s, as shown for the 'sems-rhel7' and 'cee-rhel7' CUDA builds.)
@trilinos/adelus @srajama1 @vqd8a It appears that some of these are consistently passing or failing. Can you take a look and maybe close some? Thanks!
@vqd8a It looks like many of these are passing, but the rest are "missing". I assume that means they have not run. Should we close this and let Grover start a new issue with the missing tests?
Closing this issue, as it appears many tests are passing and the remaining ones are now "missing" (not sure what that means). Will pick up the remaining tests in a new issue.