trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

KokkosCore_UnitTest_Cuda_MPI_1 failing in ATDM Trilinos 'waterman', 'ats2'/'vortex', 'ride', 'sems-rhel7' CUDA 'opt' builds starting 2020-02-04 #6799

Closed bartlettroscoe closed 3 years ago

bartlettroscoe commented 4 years ago

CC: @trilinos/kokkos, @kddevin (Trilinos Data Services Product Lead)

## Next Action Status

Unit tests `cuda.debug_pin_um_to_host` and `cuda.debug_serial_execution` are fragile and need to be rewritten (see kokkos/kokkos#2506). These two unit tests are disabled in all ATDM Trilinos CUDA PR builds in PR #7407, which has been merged to 'atdm-nightly' in commit 4804b08. Next: Waiting for confirmation on CDash that this test is passing and the unit tests are not running in ATDM Trilinos CUDA builds starting testing day 2020-05-21 ...

## Description

As shown in [this query](https://testing-dev.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-02-01&end=2020-02-09&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman&field2=buildname&compare2=66&value2=opt&field3=testname&compare3=61&value3=KokkosCore_UnitTest_Cuda_MPI_1) and [this query](https://testing-dev.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-01-11&end=2020-02-09&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman&field2=testname&compare2=61&value2=KokkosCore_UnitTest_Cuda_MPI_1&field3=status&compare3=61&value3=failed&field4=details&compare4=61&value4=Completed%20(Failed)) the test:

* `KokkosCore_UnitTest_Cuda_MPI_1`

in the builds:

* `Trilinos-atdm-waterman-cuda-9.2-opt`
* `Trilinos-atdm-waterman_cuda-9.2_fpic_static_opt`
* `Trilinos-atdm-waterman_cuda-9.2_shared_opt`

started failing and timing out on 'waterman' on testing day 2020-02-04, which was the first day after the Kokkos 2.99 update.

As shown in [this query](https://testing-dev.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=2020-01-11&end=2020-02-09&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman&field2=testname&compare2=61&value2=KokkosCore_UnitTest_Cuda_MPI_1&field3=status&compare3=61&value3=failed&field4=testoutput&compare4=97&value4=FAILED.*cuda.debug_pin_um_to_host), when the test does not time out, it fails the unit test `cuda.debug_pin_um_to_host` showing:

```
[ RUN      ] cuda.debug_pin_um_to_host
Time CudaSpace: 0.059752 CudaUVMSpace_1: 0.626746 CudaUVMSpace_2: 0.797016 CudaPinnedHostSpace: 5.441882 CudaUVMSpace_Pinned: 1.407892
/home/jenkins/waterman/workspace/Trilinos-atdm-waterman-cuda-9.2-opt/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugPinUVMSpace.cpp:127: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_pin_um_to_host (9332 ms)
```

(That output gives zero clue why the test failed but at least it gives a line number.)

## Current Status on CDash

* [KokkosCore_UnitTest_Cuda_MPI_1 test on 'waterman' current testing day](https://testing-dev.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman&field2=testname&compare2=61&value2=KokkosCore_UnitTest_Cuda_MPI_1)
* [KokkosCore_UnitTest_Cuda_MPI_1 tests on 'waterman' last 5 testing days](https://testing-dev.sandia.gov/cdash/queryTests.php?project=Trilinos&begin=5%20days%20ago&end=now&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman&field2=testname&compare2=61&value2=KokkosCore_UnitTest_Cuda_MPI_1)

## Steps to Reproduce

One should be able to reproduce this failure on the machine 'waterman' as described in:

* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

The specific commands given for the system 'waterman' are provided at:

* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman

The exact commands to reproduce this failing test, for the build `Trilinos-atdm-waterman-cuda-9.2-opt`, for example, should be:

```
$ cd /

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
  Trilinos-atdm-waterman-cuda-9.2-opt

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_ALL_PACKAGES=ON \
  $TRILINOS_DIR

$ make NP=20

$ bsub -x -Is -n 20 ctest -j 4
```

ndellingwood commented 4 years ago

@bartlettroscoe that test cuda.debug_pin_um_to_host makes a comparison of time results to determine a "pass" criterion, but it is fragile and I think it needs to be revisited. Can the tests cuda.debug_pin_um_to_host and cuda.debug_serial_execution be disabled in the ATDM builds until a better criterion is put in place for the test? Cross-referencing kokkos/kokkos#2506

bartlettroscoe commented 4 years ago

The test:

failed in the build:

yesterday as shown here showing:

```
[ RUN      ] cuda.atomics
Loop<N10TestAtomic11SuperScalarILi4EEE>( test = 3 FAILED : { 4950, 9900, 14850, 19800} != { 4854, 9708, 14562, 19416}
/home/jenkins/waterman/workspace/Trilinos-atdm-waterman-cuda-9.2-release-debug/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/TestAtomic.hpp:560: Failure
Value of: (TestAtomic::Loop<TestAtomic::SuperScalar<4>, TEST_EXECSPACE>(100, 3))
  Actual: false
Expected: true
[  FAILED  ] cuda.atomics (754 ms)
```

@ndellingwood, is this another timing problem? Should we expect to be seeing more random failures like this?
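
A quick arithmetic check on the numbers in the failure message above may help when reading it. This is only an observation about the printed values, not a statement about how `TestAtomic.hpp` computes them. The assumption (labeled in the code) is that the reference vector is 1x through 4x the sum 0 + 1 + ... + 99, which matches the loop count of 100 passed to the test.

```python
# Values copied from the cuda.atomics failure message above.
expected = [4950, 9900, 14850, 19800]
actual = [4854, 9708, 14562, 19416]

# Assumption (not confirmed by the source): the reference is k * sum(0..99).
base = sum(range(100))  # 4950
assert expected == [k * base for k in range(1, 5)]

# Each component of the actual result is the same multiple of a smaller base.
short_base = actual[0]
components_consistent = actual == [k * short_base for k in range(1, 5)]
deficit = base - short_base

print(components_consistent, deficit)
```

All four components are short by the same proportion, so the four entries stayed consistent with each other. The thread does not establish the cause.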

ndellingwood commented 4 years ago

@bartlettroscoe I don't think that test (cuda.atomics) relies on timing data (I'll take a look to confirm), let me test your build as well as a kokkos-only version and check if there can be anything random about this failure, afterwards I'll file a bug report as necessary.

ndellingwood commented 4 years ago

I did not reproduce failure of that test within a Kokkos VOTD develop branch nor Trilinos VOTD develop branch, and I see nothing in the test depending on unreliable pass/fail criteria that would cause randomness in results.

@crtrott can running multiple tests on a GPU using ctest -j N result in resource contention that might cause the atomics test to fail as posted above https://github.com/trilinos/Trilinos/issues/6799#issuecomment-587464443 ?

ndellingwood commented 4 years ago

I tried something simple to see if I could reproduce, launched a job on Waterman where I ran the test 10000 times in a Kokkos build and in a Trilinos build (using the ATDM environment configuration provided earlier) but saw no occurrences of the failure.

bartlettroscoe commented 4 years ago

@ndellingwood, it may only occur when running it with all of the other tests. I have updated the instructions as such.

bartlettroscoe commented 4 years ago

@ndellingwood, could this happen when multiple kernels are running on the same GPU at the same time?

ndellingwood commented 4 years ago

> could this happen when multiple kernels are running on the same GPU at the same time

@bartlettroscoe I'm not certain; it's not clear to me whether this could impact the atomic operations or disrupt something in the test that's pounding the atomics. @crtrott any thoughts?

bartlettroscoe commented 4 years ago

FYI: As shown in this query, there are more builds that show the error which include:

This is not a fluke.

Again, the error is in the unit test cuda.debug_pin_um_to_host and looks like:

```
[ RUN      ] cuda.debug_pin_um_to_host
Time CudaSpace: 0.052340 CudaUVMSpace_1: 0.052490 CudaUVMSpace_2: 0.049860 CudaPinnedHostSpace: 0.096996 CudaUVMSpace_Pinned: 0.052756
/vscratch1/jenkins/vortex-slave/workspace/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt-exp/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugPinUVMSpace.cpp:127: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_pin_um_to_host (378 ms)
```

bartlettroscoe commented 4 years ago

FYI: As shown in this query, we are also seeing failures in the unit test cuda.debug_serial_execution showing:

```
[ RUN      ] cuda.debug_serial_execution
Time For1: 0.001218 For2: 0.001222 ForSerial: 0.009890
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugSerialExecution.cpp:140: Failure
Value of: passed_par_for
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_serial_execution (48 ms)
```

with history:

| Site | Build Name | Test Name | Status | Time | Proc Time | Details | Build Time | Processors |
|---|---|---|---|---|---|---|---|---|
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Failed | 1m 28s 520ms | 1m 28s 520ms | Completed (Failed) | 2020-02-23T03:20:45 MST | 1 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Failed | 4m 35s 770ms | 4m 35s 770ms | Completed (Failed) | 2020-02-21T03:06:35 MST | 1 |

bartlettroscoe commented 4 years ago

FYI: As shown in this query and this query, this test:

is also now failing every testing day starting 2020-03-22 in the build:

showing the errors:

```
[ RUN      ] cuda.debug_serial_execution
Time Scan1: 0.023175 Scan2: 0.004769 ScanSerial: 0.023451
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugSerialExecution.cpp:192: Failure
Value of: passed_par_scan
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_serial_execution (235 ms)
```

and

```
[ RUN      ] cuda.debug_pin_um_to_host
Time CudaSpace: 0.063499 CudaUVMSpace_1: 0.279313 CudaUVMSpace_2: 0.077930 CudaPinnedHostSpace: 1.058533 CudaUVMSpace_Pinned: 0.077453
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugPinUVMSpace.cpp:127: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_pin_um_to_host (1637 ms)
```

bartlettroscoe commented 4 years ago

> @bartlettroscoe that test cuda.debug_pin_um_to_host makes a comparison of time results to determine a "pass" criterion, but it is fragile and I think it needs to be revisited. Can the tests cuda.debug_pin_um_to_host and cuda.debug_serial_execution be disabled in the ATDM builds until a better criterion is put in place for the test?

@ndellingwood, sorry I missed this comment of yours from before.

Yes, we can disable just those unit tests for just the ATDM Trilinos builds (or just the CUDA builds) as described in:

I think we likely just want to disable these for all ATDM Trilinos CUDA builds? If that is the case, the instructions for doing that are in:

and use the CMake cache var <full_test_name>_EXTRA_ARGS. See examples of this in:

```
$ cd Trilinos/

$ find cmake/std/atdm/ -name "*.cmake" -exec grep -nH "_EXTRA_ARGS" {} \;
cmake/std/atdm/ride/tweaks/CUDA-10.0_GNU-7.4.0_DEBUG_CUDA_POWER8_KEPLER37.cmake:5:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/CUDA-9.2_GNU-7.2.0_DEBUG_CUDA_POWER8_KEPLER37.cmake:5:ATDM_SET_CACHE(KokkosContainers_UnitTest_Serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/CUDA-9.2_GNU-7.2.0_DEBUG_CUDA_POWER8_KEPLER37.cmake:8:ATDM_SET_CACHE(KokkosKernels_graph_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/CUDA-9.2_GNU-7.2.0_DEBUG_CUDA_POWER8_KEPLER37.cmake:11:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.2.0_DEBUG_OPENMP_POWER8.cmake:8:ATDM_SET_CACHE(KokkosContainers_UnitTest_Serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.2.0_DEBUG_OPENMP_POWER8.cmake:11:ATDM_SET_CACHE(KokkosContainers_UnitTest_OpenMP_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.2.0_DEBUG_OPENMP_POWER8.cmake:14:ATDM_SET_CACHE(KokkosKernels_graph_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.2.0_DEBUG_OPENMP_POWER8.cmake:17:ATDM_SET_CACHE(KokkosKernels_sparse_openmp_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.2.0_DEBUG_OPENMP_POWER8.cmake:20:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/ride/tweaks/GNU-7.4.0_DEBUG_OPENMP_POWER8.cmake:5:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/shiller/tweaks/GNU_DEBUG_SERIAL_HSW.cmake:5:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/shiller/tweaks/CUDA-9.0_DEBUG_CUDA_KEPLER37.cmake:2:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/shiller/tweaks/INTEL_DEBUG_OPENMP_HSW.cmake:2:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/shiller/tweaks/INTEL_DEBUG_SERIAL_HSW.cmake:4:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
cmake/std/atdm/waterman/tweaks/CUDA-9.2_DEBUG_CUDA_POWER9_VOLTA70.cmake:8:ATDM_SET_CACHE(KokkosKernels_sparse_serial_MPI_1_EXTRA_ARGS
```

But I think you want to put this in the file:

in the if block for CUDA builds and just disable these unit tests in all CUDA builds
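
As a rough illustration of what such an `_EXTRA_ARGS` value could contain, the sketch below builds a GoogleTest negative filter that skips the two timing-based unit tests. `--gtest_filter` and its `-A:B` exclusion syntax are standard GoogleTest; the helper name `gtest_exclude_filter` is made up for this example, and the value that was eventually committed is not shown in this thread.

```python
def gtest_exclude_filter(test_names):
    """Build a GoogleTest flag that runs everything except the named tests.

    Hypothetical helper for illustration; not part of Trilinos or TriBITS.
    """
    if not test_names:
        return "--gtest_filter=*"
    # A leading '-' starts the negative patterns; ':' separates patterns.
    return "--gtest_filter=-" + ":".join(test_names)


fragile_tests = ["cuda.debug_pin_um_to_host", "cuda.debug_serial_execution"]
extra_args = gtest_exclude_filter(fragile_tests)
print(extra_args)
# --gtest_filter=-cuda.debug_pin_um_to_host:cuda.debug_serial_execution
```

A string of this form is what would be passed to the test executable through the `<full_test_name>_EXTRA_ARGS` cache variable mentioned above.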

bartlettroscoe commented 4 years ago

FYI: As shown here, we also saw this test:

failing today 2020-03-25 in the build:

showing:

```
[ RUN      ] cuda.debug_pin_um_to_host
Time CudaSpace: 0.035394 CudaUVMSpace_1: 0.080651 CudaUVMSpace_2: 0.100436 CudaPinnedHostSpace: 0.555502 CudaUVMSpace_Pinned: 0.094931
/scratch/jenkins/ascicgpu14/workspace/Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugPinUVMSpace.cpp:127: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_pin_um_to_host (999 ms)
```

That suggests that these unit tests should be disabled in all of the ATDM Trilinos CUDA builds for the time being.

@ndellingwood, do you just want me to do this and create the PR and have you review it?

ndellingwood commented 4 years ago

> do you just want me to do this and create the PR and have you review it

@bartlettroscoe that would be great thanks, I pinged the corresponding kokkos issue as well with your comment.

bartlettroscoe commented 4 years ago

@ndellingwood, okay, I assigned this issue to myself for now and will work to get these unit tests disabled.

bartlettroscoe commented 4 years ago

I had forgotten that I had already created this issue. This failing test has also been taking down PR testing iterations as shown in https://github.com/trilinos/Trilinos/issues/3276#issuecomment-631749385.

I will disable these individual unit tests in all ATDM Trilinos CUDA builds and in the Trilinos PR CUDA build.

bartlettroscoe commented 4 years ago

These two unit tests are disabled in all ATDM Trilinos CUDA PR builds in PR #7407, which has been merged to 'atdm-nightly' in commit 4804b08.

Putting in review.

grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-08-16

Tests with issue trackers Passed: twip=3

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 20 | 0 | 20 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 20 | 0 | 20 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 20 | 0 | 20 | #6799 |

This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.

grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-08-23

Tests with issue trackers Passed: twip=5

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 4 | 1 | 17 | #6799 |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 5 | 1 | 18 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 26 | 0 | 26 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 26 | 0 | 26 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Passed | Completed | 26 | 0 | 26 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-08-30

Tests with issue trackers Missing: twim=4

| Site | Build Name | Test Name | Status | Details | Consecutive Missing Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 7 | 1 | 14 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 5 | 0 | 24 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 5 | 0 | 24 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 5 | 0 | 24 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-09-06

Tests with issue trackers Missing: twim=5

| Site | Build Name | Test Name | Status | Details | Consecutive Missing Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 12 | 1 | 7 | #6799 |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 14 | 1 | 7 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 12 | 0 | 17 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 12 | 0 | 17 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 12 | 0 | 17 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-09-13

Tests with issue trackers Missing: twim=3

| Site | Build Name | Test Name | Status | Details | Consecutive Missing Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 19 | 0 | 10 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 19 | 0 | 10 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 19 | 0 | 10 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-09-20

Tests with issue trackers Missing: twim=5

| Site | Build Name | Test Name | Status | Details | Consecutive Missing Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 26 | 0 | 3 | #6799 |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 28 | 0 | 1 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 26 | 0 | 4 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 26 | 0 | 4 | #6799 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda_MPI_1 | Missing | Missing | 26 | 0 | 4 | #6799 |


crtrott commented 4 years ago

Just as a comment: these tests are doing critical testing of whether the fencing behavior of Kokkos is correct, something Tpetra has been complaining a lot about. The only way to check whether fencing behavior is correct without some external tools running is timing.

bartlettroscoe commented 4 years ago

@crtrott, if you compare the Grover comments from 4 weeks ago to 3 weeks ago, you can see these tests went from passing to missing. I did not do anything, so it may be worth someone looking into why that happened. (If Kokkos developers can't figure that out, I can help with that. Should be able to determine this from just looking at results on CDash.) Also note that we are getting a lot of missing test results from ATS-2 machine 'vortex' lately (see ATDV-396).

crtrott commented 4 years ago

Does missing just mean fail? 4 weeks ago Kokkos 3.2 came in and we fixed more fencing issues (i.e. over fencing) largely reported by Tpetra. These tests are now testing that we don't screw up the Tpetra use cases again.

bartlettroscoe commented 4 years ago

> Does missing just mean fail?

@crtrott, no, it means that build X did not report any test results for that testing day. For example, what the Grover report today says is that those 5 tests were not included in the test results posted to CDash for those builds. Make sense?

Did those tests change their names in any way with the Kokkos 3.2 upgrade? If so, then that would result in them being reported as missing.
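
To make the "missing" status concrete, here is a small sketch of the bookkeeping. It is not Grover's or CDash's actual code; the function `classify` and the renamed test name are hypothetical. The point is that a tracked test which no longer appears under its old name in a build's results comes out as missing, even if the same checks ran under a new name.

```python
def classify(tracked, reported):
    """Map each tracked (build, test) pair to its reported status.

    `reported` maps (build, test) -> status string for one testing day.
    Pairs with no reported result are labeled "missing".
    Hypothetical helper; not how Grover is actually implemented.
    """
    return {key: reported.get(key, "missing") for key in tracked}


build = "Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release"
tracked = [(build, "KokkosCore_UnitTest_Cuda_MPI_1")]

# Hypothetical day of results in which the executable was renamed upstream.
reported = {(build, "KokkosCore_UnitTest_Renamed_MPI_1"): "passed"}

status = classify(tracked, reported)
print(status[tracked[0]])
```

Under this model a rename shows up as "missing" rather than "failed", which is why the tracking entries have to be updated whenever test names change.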

crtrott commented 4 years ago

Actually they might have. We split executables into multiple ones recently, I bet it's now CUDA_1_MPI_1 or so (or maybe CUDA_2), not sure where the specific tests we were looking for went. But basically we split that up.

bartlettroscoe commented 4 years ago

> Actually they might have. We split executables into multiple ones recently, I bet it's now CUDA_1_MPI_1 or so (or maybe CUDA_2), not sure where the specific tests we were looking for went. But basically we split that up.

@trilinos/kokkos, @crtrott,

As shown in this query it looks like the unit test KokkosCore_UnitTest_Cuda_MPI_1 was split into 3 unit tests:

starting on testing day 2020-08-25.

As shown in this query and this query (click "Show Matching Output"), it looks like the unit tests cuda.debug_pin_um_to_host and cuda.debug_serial_execution in the test KokkosCore_UnitTest_Cuda3_MPI_1 are still randomly failing in the builds:

with recent history:

| Site | Build Name | Test Name | Status | Time | Proc Time | Details | Build Time | Processors |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 5s 330ms | 5s 330ms | Completed (Failed) | 2020-09-17T23:57:12 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 4s 640ms | 4s 640ms | Completed (Failed) | 2020-09-13T23:53:51 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 4s 900ms | 4s 900ms | Completed (Failed) | 2020-09-10T23:53:47 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 4s 960ms | 4s 960ms | Completed (Failed) | 2020-09-05T23:53:57 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 3s 980ms | 3s 980ms | Completed (Failed) | 2020-09-20T22:45:10 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 3s 710ms | 3s 710ms | Completed (Failed) | 2020-09-17T22:45:15 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 4s 570ms | 4s 570ms | Completed (Failed) | 2020-09-12T22:45:11 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 3s 910ms | 3s 910ms | Completed (Failed) | 2020-09-03T22:45:13 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 5s 330ms | 5s 330ms | Completed (Failed) | 2020-09-18T01:34:42 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 5s 510ms | 5s 510ms | Completed (Failed) | 2020-09-09T06:39:15 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 5s 270ms | 5s 270ms | Completed (Failed) | 2020-08-30T01:20:20 MDT | 1 |
| cee-rhel6 | Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 6s 930ms | 6s 930ms | Completed (Failed) | 2020-09-07T01:18:16 MDT | 1 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 1s 410ms | 1s 410ms | Completed (Failed) | 2020-09-12T04:33:13 MDT | 1 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 3s 770ms | 3s 770ms | Completed (Failed) | 2020-09-16T05:42:20 MDT | 1 |
| ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug | KokkosCore_UnitTest_Cuda3_MPI_1 | Failed | 5s 230ms | 5s 230ms | Completed (Failed) | 2020-09-14T04:16:45 MDT | 1 |

showing errors like:

```
[ RUN      ] cuda.debug_pin_um_to_host
Time CudaSpace: 0.048639 CudaUVMSpace_1: 0.243198 CudaUVMSpace_2: 0.056822 CudaPinnedHostSpace: 0.874996 CudaUVMSpace_Pinned: 0.248510
/scratch/atdm-devops-admin/atdm-trilinos-nightly-builds/Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugPinUVMSpace.cpp:128: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_pin_um_to_host (1615 ms)
```

and

```
[ RUN      ] cuda.debug_serial_execution
Time For1: 0.000495 For2: 0.000213 ForSerial: 0.000473
/scratch/atdm-devops-admin/atdm-trilinos-nightly-builds/Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/cuda/TestCuda_DebugSerialExecution.cpp:141: Failure
Value of: passed_par_for
  Actual: false
Expected: true
[  FAILED  ] cuda.debug_serial_execution (6 ms)
```

I will update the CSV file entries in the ATDM Trilinos Status repo used by Grover to associate these tests with this issue.

bartlettroscoe commented 4 years ago

FYI: The update of the tracking of these randomly failing unit tests is shown in the commit:

bartlettroscoe commented 4 years ago

@trilinos/kokkos, @crtrott, @ndellingwood,

Is pass/fail for these unit tests still based on timings? It looks like it might be (but hard to tell from the unit test output).

crtrott commented 4 years ago

Yes it should be. It's testing for asynchronicity (i.e. that there aren't fences where there shouldn't be) for certain API stuff.

bartlettroscoe commented 4 years ago

> Yes it should be. It's testing for asynchronicity (i.e. that there aren't fences where there shouldn't be) for certain API stuff.

@crtrott, is there any way to make these tests more robust? Not good to have randomly failing tests.

crtrott commented 4 years ago

We can take another look, but this is as robust as I could make them. The problem is that if something else is hogging the GPU and the test takes a bit to launch, all the timing is going to be off by an arbitrarily large amount, i.e. there is no timing-based criterion that could ever pass reliably. These tests already just compare two timings collected during the test (e.g. with fence vs. without fence), with expected differences usually being large (>4x) and the criterion being somewhere around 2x. It is similar for the tests deducing the correct memory spaces: the differences there should also be >5x, and our criteria for passing are much smaller than that. I think these tests just have to run on their own, and if you can't do that, then they need to be disabled and filtered out. But in general they test for semantics, not for a specific performance. They are definitely not designed to catch small performance issues; they are only designed to test whether we do something fundamentally unexpected.
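
The ratio criterion described above can be sketched in a few lines. This is a toy model, not the Kokkos test code: the function `passes`, the 2x threshold as a default argument, and all timing values are illustrative assumptions. It shows how a fixed delay added to both measurements by a busy GPU can push a healthy 5x ratio below a 2x threshold.

```python
def passes(t_slow, t_fast, threshold=2.0):
    """Pass when the operation expected to be slow is at least `threshold`
    times slower than the one expected to be fast.

    Hypothetical stand-in for a timing-ratio check; values are in seconds.
    """
    return t_slow / t_fast >= threshold


# Idle GPU: the slow variant is about 5x slower, comfortably above 2x.
quiet = passes(t_slow=0.050, t_fast=0.010)

# Loaded GPU: an assumed 0.040 s delay lands on both measurements,
# shrinking the ratio to about 1.8x.
delay = 0.040
loaded = passes(t_slow=0.050 + delay, t_fast=0.010 + delay)

print(quiet, loaded)
```

In this model no threshold is safe against an arbitrarily large additive delay, which is the argument for running such tests alone or filtering them out.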

bartlettroscoe commented 4 years ago

> We can take another look, but this is as robust as I could make them. ... They are definitely not designed to catch small performance issues, they are only designed to test whether we do something fundamentally unexpected.

@crtrott, but any times on a loaded GPU can vary widely depending on what else is running at the same time. Can we aggregate the timing-based unit tests into their own separate unit test executable and then mark those tests with RUN_SERIAL so that they always run by themselves with ctest? Tests that don't depend on timing can still be run at the same time so the total wall-clock time goes down.

Actually, as shown in this query, this test KokkosCore_UnitTest_Cuda3_MPI_1 finishes in less than 6 seconds. Therefore, can we just pass in RUN_SERIAL to the TRIBITS_ADD_TEST() command?

crtrott commented 4 years ago

Sure we can do that. @ndellingwood Nathan can you move all the relevant tests which use this timing stuff to Cuda3?

bartlettroscoe commented 4 years ago

> move all the relevant tests which use this timing stuff to Cuda3?

@crtrott and @ndellingwood, just a suggestion, but you might want to call that test something like KokkosCore_UnitTest_CudaTimingBased to make it clear these are based on timing things?

ndellingwood commented 4 years ago

> Nathan can you move all the relevant tests which use this timing stuff to Cuda3?

@crtrott yes, I can group the tests together, and if you like I'll place them in a new test with a descriptive name like @bartlettroscoe mentioned

ndellingwood commented 4 years ago

I opened kokkos/kokkos#3405 and self-assigned with the request to group these time-based tests into a common executable

grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-09-27

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 4 | 5 | 25 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 3 | 27 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 3 | 2 | 28 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 12 | 1 | 23 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 12 | 1 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 11 | 1 | 29 | #6799 |

This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.

grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-10-04

Tests with issue trackers Passed: twip=6

Detailed test results:

Tests with issue trackers Passed: twip=6

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 16 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 2 | 5 | 25 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 5 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 1 | 4 | 26 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 19 | 1 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 18 | 1 | 29 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-10-11

Tests with issue trackers Passed: twip=6
Tests with issue trackers Failed: twif=1

Detailed test results:

Tests with issue trackers Passed: twip=6

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 4 | 3 | 27 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 5 | 25 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 8 | 3 | 27 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 5 | 2 | 23 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 26 | 1 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 25 | 1 | 29 | #6799 |

Tests with issue trackers Failed: twif=1

| Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Failed | Completed (Failed) | 1 | 5 | 25 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-10-18

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 11 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 2 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 7 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 15 | 3 | 27 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 12 | 1 | 26 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 30 | 0 | 30 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 30 | 0 | 30 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-10-25

Tests with issue trackers Passed: twip=6
Tests with issue trackers Failed: twif=1

Detailed test results:

Tests with issue trackers Passed: twip=6

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 2 | 28 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 14 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 22 | 2 | 28 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 6 | 2 | 26 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 28 | 0 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 28 | 0 | 28 | #6799 |

Tests with issue trackers Failed: twif=1

| Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Failed | Completed (Failed) | 1 | 2 | 28 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-11-01

Tests with issue trackers Passed: twip=6
Tests with issue trackers Missing: twim=1

Detailed test results:

Tests with issue trackers Passed: twip=6

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 7 | 2 | 28 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 4 | 3 | 27 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 3 | 2 | 28 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 13 | 2 | 26 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 28 | 0 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 28 | 0 | 28 | #6799 |

Tests with issue trackers Missing: twim=1

| Site | Build Name | Test Name | Status | Details | Consecutive Missing Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Missing | Missing | -1 | 1 | 30 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-11-08

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 14 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 23 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 2 | 3 | 27 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 10 | 1 | 29 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 5 | 2 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 25 | 0 | 25 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 27 | 0 | 27 | #6799 |


grover-trilinos commented 4 years ago

Test results for issue #6799 as of 2020-11-15

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 21 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 5 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 2 | 28 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 1 | 3 | 27 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 12 | 2 | 28 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 2 | 3 | 22 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 3 | 2 | 25 | #6799 |


grover-trilinos commented 3 years ago

Test results for issue #6799 as of 2020-11-22

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 28 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 1 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 6 | 3 | 27 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 8 | 3 | 27 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 19 | 1 | 29 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 8 | 3 | 23 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 9 | 2 | 26 | #6799 |


grover-trilinos commented 3 years ago

Test results for issue #6799 as of 2020-11-29

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 30 | 0 | 30 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 8 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 13 | 2 | 28 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 15 | 2 | 28 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 26 | 1 | 29 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 15 | 3 | 23 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 16 | 2 | 26 | #6799 |


grover-trilinos commented 3 years ago

Test results for issue #6799 as of 2020-12-06

Tests with issue trackers Passed: twip=7

Detailed test results:

Tests with issue trackers Passed: twip=7

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 30 | 0 | 30 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_shared_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 15 | 4 | 26 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_dbg` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 20 | 1 | 29 | #6799 |
| cee-rhel6 | `Trilinos-atdm-cee-rhel6_mini_cuda-10.1.243_gcc-7.2.0_openmpi-4.0.3_static_opt` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 22 | 2 | 28 | #6799 |
| sems-rhel7 | `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 30 | 0 | 30 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 22 | 3 | 25 | #6799 |
| ride | `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release` | `KokkosCore_UnitTest_Cuda3_MPI_1` | Passed | Completed | 23 | 2 | 27 | #6799 |
