trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Adelus_vector_random_MPI_2-4 failing in Trilinos builds starting 2021-06-25 #9742

Closed ZUUL42 closed 1 year ago

ZUUL42 commented 3 years ago

CC: @trilinos/Adelus, @vqd8a (Trilinos Linear Solvers Triage Contact)

## Next Action Status

## Description

As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2021-06-24&end=2021-11-24&filtercount=6&showfilters=1&filtercombine=and&field1=groupname&compare1=62&value1=Experimental&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=buildname&compare3=63&value3=rhel7_cuda-10.1&field4=groupname&compare4=63&value4=Primary&field5=testname&compare5=63&value5=Adelus_vector_random_MPI_&field6=status&compare6=62&value6=passed) (click "Show Matching Output" in the upper right), the tests:

* `Adelus_vector_random_MPI_2`
* `Adelus_vector_random_MPI_3`
* `Adelus_vector_random_MPI_4`

in the builds:

* `Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_opt`
* `Trilinos-atdm-sems-rhel7-cuda-10.1-Volta70-complex-shared-release-debug`

started failing on testing day 2021-06-25.

* `Adelus_vector_random_MPI_2` gets:

  ```
  mpiexec noticed that process rank 1 with PID 49251 on node ascicgpu14 exited on signal 11 (Segmentation fault).
  ```

* `Adelus_vector_random_MPI_3` and `Adelus_vector_random_MPI_4` get:

  ```
  terminate called after throwing an instance of 'std::runtime_error'
    what():  Kokkos::Cuda::initialize(2) FAILED : Device identifier out of range [0..2]

  Traceback functionality not available
  ```

## Current Status on CDash

Run the [above query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2021-06-24&end=2021-11-24&filtercount=6&showfilters=1&filtercombine=and&field1=groupname&compare1=62&value1=Experimental&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=buildname&compare3=63&value3=rhel7_cuda-10.1&field4=groupname&compare4=63&value4=Primary&field5=testname&compare5=63&value5=Adelus_vector_random_MPI_&field6=status&compare6=62&value6=passed), adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
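The date adjustment described for the CDash query can also be scripted rather than done by hand in the browser. The following is a minimal sketch, assuming the query page accepts any ISO-formatted dates in the `begin` and `end` parameters, as the URL in this issue suggests. The helper name `with_date_range` is hypothetical and not part of CDash or Trilinos.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query URL copied from the issue description.
QUERY_URL = (
    "https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos"
    "&begin=2021-06-24&end=2021-11-24"
    "&filtercount=6&showfilters=1&filtercombine=and"
    "&field1=groupname&compare1=62&value1=Experimental"
    "&field2=buildname&compare2=65&value2=Trilinos-atdm-"
    "&field3=buildname&compare3=63&value3=rhel7_cuda-10.1"
    "&field4=groupname&compare4=63&value4=Primary"
    "&field5=testname&compare5=63&value5=Adelus_vector_random_MPI_"
    "&field6=status&compare6=62&value6=passed"
)


def with_date_range(url: str, begin: str, end: str) -> str:
    """Return `url` with its `begin` and `end` query parameters replaced.

    Hypothetical helper: all other parameters keep their order and values.
    """
    parts = urlsplit(url)
    params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "begin":
            value = begin
        elif key == "end":
            value = end
        params.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Assumption: using today's date for both ends selects the current testing day.
    today = date.today().isoformat()
    print(with_date_range(QUERY_URL, today, today))
```

Parsing the query string instead of editing it with string replacement keeps the six filter fields untouched, so only the date window changes.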
## Steps to Reproduce

One should be able to reproduce this failure as described in:

* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

and the system-specific instructions at:

* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#specific-instructions-for-each-system

Just log into any of the associated machines, copy and paste the full CDash build name `<build-name>` listed above, and run commands like:

```
$ cd /

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh <build-name>

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_<Package>=ON \
  $TRILINOS_DIR

$ make NP=16

$ ctest -j4
```

where `<Package>` is any package that you want to enable to reproduce build and/or test results.

Again, for exact system-specific details on what commands to run to build and run tests, see:

* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#specific-instructions-for-each-system

If you can't figure out what commands to run to reproduce the problem given this documentation, then please post a comment here and we will give you the exact minimal commands.

grover-trilinos commented 3 years ago

Test results for issue #9742 as of 2021-10-03

Tests with issue trackers Failed: twif=5

| Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-10.1-Volta70-complex-shared-release-debug | Adelus_vector_random_MPI_2 | Failed | Completed (Failed) | 30 | 30 | 0 | #9742 |
| cee-rhel7 | Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_opt | Adelus_vector_random_MPI_3 | Failed | Completed (Failed) | 29 | 29 | 0 | #9742 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-10.1-Volta70-complex-shared-release-debug | Adelus_vector_random_MPI_3 | Failed | Completed (Failed) | 30 | 30 | 0 | #9742 |
| cee-rhel7 | Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_opt | Adelus_vector_random_MPI_4 | Failed | Completed (Failed) | 29 | 29 | 0 | #9742 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-10.1-Volta70-complex-shared-release-debug | Adelus_vector_random_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #9742 |

This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.

github-actions[bot] commented 2 years ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] commented 1 year ago

This issue was closed due to inactivity for 395 days.