Closed: bartlettroscoe closed this issue 3 years ago
@tjfulle Do you have time to look at this issue? I'm guessing something bad happens because of complex scalars.
@kddevin I'll take a look at this
@bartlettroscoe the following script, with TRILINOS_DIR set to a clean clone, results in an infinite CMake configuration cycle:
#!/bin/sh
rm -rf CMake*
build="Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi"
source $TRILINOS_DIR/cmake/std/atdm/load-env.sh $build
if [ $? -ne 0 ]; then
  echo "Failed to source Trilinos configuration"
  exit 1
fi
cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Tpetra=ON \
  $TRILINOS_DIR
make NP=16
Any ideas why?
@tjfulle, this is on 'vortex', yes? What is $TRILINOS_DIR? Where is your build dir in relation to the source dir $TRILINOS_DIR?
Yes on vortex. My checkout looks like:
/home/tjfulle/.../trilinos/
build/
atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi -> /vscratch1/tjfulle/builds/trilinos/atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi/
build.sh*
Pretty standard for how I set things up. I know I have seen this before; I just don't recall the resolution.
@tjfulle, I have been able to reproduce the recursive reconfigure with the following commands:
$ ssh vortex
$ cd ~/Trilinos.base/Trilinos/
$ git pull
$ git log --name-status -1
commit affda446dbed1abb637f7023a310f514d04ff57a (HEAD -> develop, github/develop)
Merge: d7d84417ef4 f6cc2e95efa
Author: trilinos-autotester <trilinos@sandia.gov>
Date: Mon Dec 21 17:57:51 2020 -0700
Merge Pull Request #8511 from ZUUL42/Trilinos/teko_tests_ON
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Turn Teko tests back on.
PR Author: ZUUL42
$ cd ~/Trilinos.base/Trilinos/build/
$ ls -l
total 4
lrwxrwxrwx 1 rabartl rabartl 123 Dec 21 19:12 ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi -> /vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
$ cd ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi/
$ source ~/Trilinos.base/Trilinos/cmake/std/atdm/load-env.sh \
ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
Hostname 'vortex60' matches known ATDM host 'vortex60' and system 'ats2'
Setting compiler and build options for build-name 'ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi'
Using ats2 compiler stack CUDA-10.1.243_GNU-7.3.1_SPMPI-ROLLING to build RELEASE code with Kokkos node type CUDA
Due to MODULEPATH changes, the following have been reloaded:
1) spectrum-mpi/rolling-release
$ module list
Currently Loaded Modules:
1) StdEnv (S) 3) sparc-tools/aerotools/2 5) gcc/7.3.1 7) cuda/10.1.243 9) sparc-dev/cuda-10.1.243_gcc-7.3.1_spmpi-rolling
2) python/2.7.16 4) sparc-tools/taos/2020.09.04 6) spectrum-mpi/rolling-release 8) lapack/3.8.0-gcc-4.9.3 10) git/2.20.0
Where:
S: Module is Sticky, requires --force to unload or purge
$ rm -r CMake*
$ time cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
$HOME/Trilinos.base/Trilinos \
&> configure.out
real 1m31.536s
user 0m48.770s
sys 0m55.829s
$ tail -n 5 configure.out
Finished configuring Trilinos!
-- Configuring done
-- Generating done
-- Build files have been written to: /ascldap/users/rabartl/Trilinos.base/Trilinos/build/ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
$ ninja -j8
The final ninja -j8 command just re-runs configure over and over. If you run ninja -d explain you see:
ninja explain: output ../../CMakeLists.txt of phony edge with no inputs doesn't exist
ninja explain: ../../CMakeLists.txt is dirty
ninja explain: ../../CTestConfig.cmake is dirty
ninja explain: ../../LICENSE is dirty
ninja explain: ../../PackagesList.cmake is dirty
ninja explain: ../../ProjectName.cmake is dirty
ninja explain: ../../README is dirty
...
This is crazy behavior for Ninja that I can't explain.
But the fix is simple. Just configure from a build dir outside of the source tree with:
$ cd /vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi/
$ rm -r CMake*
$ source ~/Trilinos.base/Trilinos/cmake/std/atdm/load-env.sh \
ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
Hostname 'vortex60' matches known ATDM host 'vortex60' and system 'ats2'
Setting compiler and build options for build-name 'ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi'
Using ats2 compiler stack CUDA-10.1.243_GNU-7.3.1_SPMPI-ROLLING to build RELEASE code with Kokkos node type CUDA
Due to MODULEPATH changes, the following have been reloaded:
1) spectrum-mpi/rolling-release
$ time cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
$HOME/Trilinos.base/Trilinos \
&> configure.out
real 1m22.598s
user 0m48.002s
sys 0m49.535s
$ ninja -j8
[0/1] Re-running CMake...
...
-- Configuring done
-- Generating done
-- Build files have been written to: /vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
[1/1574] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[2/1574] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostBarrier.cpp.o
...
I will update the instructions to warn not to try to configure and build under the source tree. I will also provide more detailed instructions on reproducing the runtime error with the env setting export TPETRA_ASSUME_CUDA_AWARE_MPI=1, as explained at:
Tests with issue trackers Failed: twif=3
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
FYI: This is also impacting a Belos test, Belos_Tpetra_MVOPTester_complex_test_MPI_4, as shown here.
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 30 | 30 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 24 | 24 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 24 | 24 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 24 | 24 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 24 | 24 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
@tjfulle, did you see my note about configuring and building outside of the source tree above? That should fix the problem.
@bartlettroscoe yes, I did. I'm actually working on this issue as we speak. Has something in TriBITS changed recently? I regularly build CMake projects (not just Trilinos) in $PROJECT_SOURCE_DIR/builds/some-build-subdirectory and haven't run into the infinite configure recursion before.
I'm actually working on this issue as we speak. Has something in TriBITS changed recently?
@tjfulle, not sure. We need to see if this occurs on other platforms as well or if this system is an oddity.
Quick update. The issue seems to be in Tpetra::MultiVector::reduce and may be similar in nature to #6423 and #6431.
@kddevin these test failures occur in MultiVector::reduce, which attempts to use device memory to perform a Teuchos::allReduce. Simply forcing a different code path (to use host memory for the reduction; see c22ea0d, where I merely commented out ::Tpetra::Details::Behavior::assumeMpiIsCudaAware()) resolves the test failures on vortex. But is it right? This solution is essentially what was done to fix #6423 and #6431.
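To make the host-memory path concrete, here is a minimal sketch of the staging idea, assuming a plain float View and the generic Kokkos/Teuchos APIs; the helper name reduce_via_host is illustrative only and is not the actual Tpetra internal code path:
// Hedged sketch: stage device data through host mirrors so MPI only ever sees
// host pointers, instead of handing device pointers to a CUDA-aware reduction.
#include <Kokkos_Core.hpp>
#include <Teuchos_Comm.hpp>
#include <Teuchos_CommHelpers.hpp>
#include <Teuchos_RCP.hpp>

void reduce_via_host(const Kokkos::View<float*, Kokkos::CudaSpace>& src,
                     const Kokkos::View<float*, Kokkos::CudaSpace>& dst,
                     const Teuchos::RCP<const Teuchos::Comm<int>>& comm)
{
  // Copy the device data to a host mirror.
  auto src_h = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), src);
  // Allocate a host mirror for the result.
  auto dst_h = Kokkos::create_mirror_view(dst);
  // Do the all-reduce entirely on host buffers.
  Teuchos::reduceAll(*comm, Teuchos::REDUCE_SUM,
                     static_cast<int>(src_h.extent(0)),
                     src_h.data(), dst_h.data());
  // Copy the reduced result back to the device.
  Kokkos::deep_copy(dst, dst_h);
}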
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 27 | 27 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 27 | 27 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 27 | 27 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 27 | 27 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 26 | 26 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
@bartlettroscoe - the bug may be specific to Spectrum MPI and complex data types. Below is a working example demonstrating the problem. If the reduceAll line is uncommented, this test will fail just like the tests in this issue. A potential fix is shown: treat the View<Kokkos::complex<float>*> as a View<float*> with double the length. See the comments about complex numbers about halfway down in the Summit user's guide.
#include "Kokkos_Complex.hpp"
#include "Tpetra_TestingUtilities.hpp"
#include "Tpetra_Details_extractMpiCommFromTeuchos.hpp"
#include "Tpetra_Core.hpp"
#include "mpi.h"
namespace { // (anonymous)
using Tpetra::getDefaultComm;
using Teuchos::REDUCE_SUM;
using Teuchos::reduceAll;
TEUCHOS_UNIT_TEST(View, ComplexReduce)
{
using view_type = Kokkos::View<Kokkos::complex<float>*, Kokkos::CudaSpace>;
view_type v0("view_1", 5);
view_type v1("view_2", 5);
Kokkos::parallel_for(
"fill",
Kokkos::RangePolicy<Kokkos::Cuda>(0, v0.extent(0)),
KOKKOS_LAMBDA (const int64_t i) {
v0(i).real() = 2.0 + static_cast<float>(i);
v0(i).imag() = 1.0 + static_cast<float>(i);
}
);
auto comm = getDefaultComm();
// reduceAll(*comm, REDUCE_SUM, static_cast<int>(v0.span()), v0.data(), v1.data());
auto mpi_comm = Tpetra::Details::extractMpiCommFromTeuchos(*comm);
MPI_Allreduce(
v0.data(),
v1.data(),
2 * v1.span(),
MPI_FLOAT,
MPI_SUM,
mpi_comm
);
Kokkos::parallel_for(
"print",
Kokkos::RangePolicy<Kokkos::Cuda>(0, v0.extent(0)),
KOKKOS_LAMBDA (const int64_t i) {
printf(
"%d: (%g, %g), (%g, %g)\n", i, v0(i).real(), v0(i).imag(), v1(i).real(), v1(i).imag()
);
}
);
}
} // namespace (anonymous)
int
main (int argc, char* argv[])
{
Tpetra::ScopeGuard tpetraScope (&argc, &argv);
const int errCode =
Teuchos::UnitTestRepository::runUnitTestsFromMain (argc, argv);
return errCode;
}
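For what it's worth, the same "treat complex<float> as twice as many floats" idea could presumably also be expressed through the Teuchos wrapper rather than raw MPI; a hedged sketch reusing v0, v1, and comm from the test above (the reinterpret_cast route is an assumption on my part, not something the Teuchos documentation promises for complex types):
// Hedged sketch: hand reduceAll float buffers with a doubled count, mirroring the
// MPI_Allreduce workaround in the test above.
reduceAll(*comm, REDUCE_SUM,
          static_cast<int>(2 * v0.span()),
          reinterpret_cast<const float*>(v0.data()),
          reinterpret_cast<float*>(v1.data()));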
what do you think is the best fix?
what do you think is the best fix?
@tjfulle, interesting MPI bug. I am not sure of the best way to address this. On one hand, it would be great if the Teuchos Comm wrappers could automatically make this translation, but I am not sure how hard that would be to do without studying the code some. On the other hand, that Summit documentation says:
This is a known issue with libcoll and the SMPI team is working to resolve it.
so it seems tempting to try to put in the minimal ifdefs to fix this problem for ATS-2 for now.
That documentation also says the other option is:
An alternative workaround is to disable IBM optimized collectives. This will impact performance however but requires no code changes and should be correct for all MPI_Allreduce operations.
Given that this defect is only impacting complex builds and the major customers running on ATS-2 are not enabling complex types, I wonder if we should just disable the broken IBM optimized collectives for complex builds and be done with it? That should be a simple change to the files atdm/ats2/environment.sh and trilinos_jsrun to set those options:
--smpiargs="-HCOLL -FCA -mca coll_hcoll_enable 1 -mca coll_hcoll_np 0 -mca coll ^basic -mca coll ^ibm -async"
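For reference, a hedged sketch of roughly where that would land in cmake/std/atdm/ats2/environment.sh; ATDM_CONFIG_MPI_PRE_FLAGS is the variable that shows up in the environment dump later in this thread, but the exact placement and flag split here are assumptions:
# Hedged sketch only (not the actual diff): inject the Spectrum MPI collective options
# into the flags the ATDM scripts pass to jsrun; entries in this variable are ';'-separated.
export ATDM_CONFIG_MPI_PRE_FLAGS="-M;-mca coll_hcoll_enable 1 -mca coll_hcoll_np 0 -mca coll ^basic -mca coll ^ibm -async"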
I could post a PR with the suggested change and you could test it?
What do you think?
@bartlettroscoe - disabling those optimizations seems like the optimal path forward. I can test whatever PR you open.
@bartlettroscoe - disabling those optimizations seems like the optimal path forward. I can test whatever PR you open.
@tjfulle, the arguments are set in PR #8858. Can you pull that branch and give this a try to see if it fixes this failure? If it looks good, then approve that PR and set AT: AUTOMERGE so that PR can be merged.
Relates to my epic SEPW-213
CC: @kddevin, @jjellio
@tjfulle, as I explained in https://github.com/trilinos/Trilinos/pull/8858#issuecomment-798604587, it looks like the jsrun options:
'-M' '-gpu -mca coll_hcoll_enable 1 -mca coll_hcoll_np 0 -mca coll ^basic -mca coll ^ibm -async'
as implemented in PR #8858 appear to make all of the test failures in this Issue go away. But they do trigger one new error for the test TpetraCore_idot_MPI_4.
Unless there is an important customer that needs to run with high performance with complex types on ATS-2 in the short term, we should consider going with the runtime options in PR #8858 and either disable the test TpetraCore_idot_MPI_4 for that build or have someone try to debug it some to see what is happening. I would hate to see a bunch of hacks to Teuchos software for a known defect in this one ATS-2 IBM/NVIDIA platform that IBM says they are going to fix. I think you need an important current customer that needs this to justify that. And there are so many other existing failures that need to be fixed in Trilinos on these various platforms, as documented in this Issue query.
I have likely already done too much here. I was just trying to show how to set the jsrun options documented in the Summit user's guide, and I have done that. The rest is up to the Trilinos developers to decide, but my recommendation is to just merge PR #8858, disable the test TpetraCore_idot_MPI_4 for this one build, and move on to fixing other failing tests to try to clean up the dashboard.
What's the mojo to reproduce this build? I'd like to try -M -ucx or omitting -async (among others that come to mind).
I'm not really sure how well tested -async is with -gpu enabled (last year that was defined as an unsupported operation); as in, collectives on GPUs did not support progress threads. I don't have any public link to support that... it's from discussions with admins.
It's also worth adding that it may be more productive to use Intel's MPI benchmarks (if they compile on ATS2). With those, you can toggle the MPI datatype. Many systems use the OSU MPI benchmarks for acceptance (which hard-code many data types to either FLOAT or DOUBLE). In the past I've hacked their code to test other types, but Intel now supports runtime selection of the datatype (not sure if it supports complex though).
What's the mojo to reproduce this build?
@jjellio, just check out the branch from PR #8858 and then follow the instructions for the ATDM Trilinos configuration for ATS-2 listed here to configure, build, and run the tests. Then just edit this line in the cmake/std/atdm/ats2/environment.sh file. Note that the ctest output gives you the exact jsrun command being used, so you can either run it manually or manually modify the command line in the generated CTestTestfile.cmake file for the test you want to run and rerun the test without needing to reload the env and reconfigure. (I did the latter.)
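A hedged example of the "run it manually" route, using one of the failing tests from this issue as the name:
# Run a single test verbosely; ctest echoes the exact trilinos_jsrun "Test command:" line,
# which you can then copy, tweak (e.g. the -mca options), and rerun by hand.
ctest -R TpetraCore_MultiVector_UnitTests_MPI_4 -VV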
Let me know if you have any questions about this.
I'm not seeing a problem fyi...
TRILINOS=$(cd ..; pwd)
# Load env and configure on the login node
source $TRILINOS/cmake/std/atdm/load-env.sh ats2-cuda-complex-release
cmake -GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Tpetra=ON \
$TRILINOS
Ignore
jjellio@vortex50 /ascl...o/src/Trilinos/build $ DEBUG_SCRIPT=1 TPETRA_ASSUME_CUDA_AWARE_MPI=0 ../cmake/std/atdm/ats2/trilinos_jsrun '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe' 2>&1 | grep -P '(^BEFORE|^AFTER|^End Result:)'
BEFORE: jsrun '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=0; jsrun '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe'
End Result: TEST PASSED
jjellio@vortex50 /ascl...o/src/Trilinos/build $ DEBUG_SCRIPT=1 TPETRA_ASSUME_CUDA_AWARE_MPI=1 ../cmake/std/atdm/ats2/trilinos_jsrun '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe' 2>&1 | grep -P '(^BEFORE|^AFTER|^End Result:)'
BEFORE: jsrun '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=1; jsrun '-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/pami_451/libpami.so' '-M -gpu' '-p' '4' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/Comm/TpetraCore_idot.exe'
End Result: TEST PASSED
Edit - (cursed formatting!)
I also tried Ross's suggested options with and without -async and both worked (so you can likely drop -async... it is not well tested IMO).
Maybe I'm missing the point - are people wanting to use HCOLL? I thought LLNL had it disabled by default, e.g.,
jjellio@vortex50 /ascl...o/src/Trilinos/build $ jsrun -p1 -g1 -r1 -c1 -brs -M -gpu env | grep -i hcoll
LD_LIBRARY_PATH=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/container/../lib/pami_port:/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/container/../lib:/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/container/../lib/pami_port:pami_port:/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/container/../lib:/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/container/../lib/pami_port:/opt/ibm/spectrum_mpi/jsm_pmix/lib/:/opt/ibm/spectrum_mpi/jsm_pmix/../lib/:/opt/ibm/spectrum_mpi/jsm_pmix/../lib/:/opt/ibm/csm/lib/:/usr/tce/packages/gcc/gcc-7.3.1/lib:/usr/tcetmp/packages/lapack/lapack-3.8.0-gcc-4.9.3/lib:/usr/tce/packages/cuda/cuda-10.1.243/lib64:/usr/gapps/sparc/tools/taos/2020-09-04/taos_ats2-pwr9_gcc-4.9.3_opt/lib:/usr/gapps/sparc/tools/seacas/ats2-pwr9/2020-04-07/00000000/ats2-pwr9_gcc-7.3.1_serial_spmpi-rolling_shared_opt/lib:/usr/tce/packages/python/python-2.7.16/lib:/usr/tce/packages/gcc/gcc-7.3.1/lib:/usr/local/cuda/lib64:/opt/ibm/spectrumcomputing/lsf/10.1.0.9/linux3.10-glibc2.17-ppc64le-csm/lib:/opt/mellanox/hcoll/lib:/opt/mellanox/sharp/lib
ATDM_CONFIG_MPI_PRE_FLAGS=-M;-mca coll_hcoll_enable 1 -mca coll_hcoll_np 0 -mca coll ^basic -mca coll ^ibm -async
HCOLL_EXTERNAL_UCM_EVENTS=1
HCOLL_ENABLE_SHARP=0
HCOLL_SHARP_NP=512
HCOLL_MAIN_IB=mlx5_0:1
HCOLL_ML_DISABLE_ALLREDUCE=0
HCOLL_ML_DISABLE_BCAST=0
HCOLL_ALLREDUCE_ZCOPY_TUNE=static
OMPI_MCA_coll_hcoll_priority=90
OMPI_MCA_coll_hcoll_enable=0
OMPI_MCA_coll_ibm_fallbackhcoll=0
OMPI_LD_LIBRARY_PATH_POSTPEND=/opt/mellanox/hcoll/lib:/opt/mellanox/sharp/lib
OMPI_MCA_mca_base_env_list_distro=MPI_ROOT,OPAL_PREFIX,OPAL_LIBDIR,PMIX_INSTALL_PREFIX,PAMI_IBV_SKIP_CQOVERFLOW_CHECK,IBV_FORK_SAFE,HCOLL_EXTERNAL_UCM_EVENTS,UCX_MEM_MMAP_HOOK_MODE,PMIX_MCA_ptl_tcp_handshake_wait_time,PMIX_MCA_ptl_tcp_handshake_max_retries,PMIX_MCA_gds,HCOLL_ENABLE_SHARP,SHARP_COLL_ENABLE_MCAST_TARGET,SHARP_COLL_LOG_LEVEL,HCOLL_SHARP_NPHCOLL_MAIN_IB,HCOLL_ML_DISABLE_ALLREDUCE,HCOLL_ML_DISABLE_BCAST,HCOLL_ALLREDUCE_ZCOPY_TUNE,UCX_MEM_EVENTS,OMPI_LD_LIBRARY_PATH_POSTPEND,OMPI_LD_PRELOAD_POSTPEND_DISTRO,OMPI_LD_LIBRARY_PATH_PREPEND_DISTRO,SMPI_INTERNAL_MPIRUN_LAUNCH_JSM
Okay, I can see the failures. In the topmost list of unit tests:
TpetraCore_CrsMatrix_UnitTests3_MPI_4
TpetraCore_MV_reduce_strided_MPI_4
TpetraCore_MultiVector_UnitTests_MPI_4
I suspect disabling IBM's collectives may fix this, e.g., (no HCOLL nonsense... that may add more headaches elsewhere)
TPETRA_ASSUME_CUDA_AWARE_MPI=1 jsrun \
'-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/pami_451/libpami.so' \
'-M -gpu -mca coll ^ibm' '-p' '4' '--rs_per_socket' '4' \
'/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/MultiVector/TpetraCore_MultiVector_UnitTests.exe'
Seems to work.
FYI: this should default to using Open MPI's 'cuda' transport for collectives (not IBM's variants).
# with the regular JSRUN line, you get IBM as the top priority
[vortex50:52960] coll:base:comm_select: selecting libnbc, priority 10, Enabled
[vortex50:52960] coll:base:comm_select: selecting basic, priority 10, Enabled
[vortex50:52960] coll:base:comm_select: selecting cuda, priority 78, Enabled
[vortex50:52960] coll:base:comm_select: selecting ibm, priority 95, Enabled
# with '-M -gpu -mca coll ^ibm', you will now get 'cuda' as the top one
[vortex50:52880] coll:base:comm_select: selecting libnbc, priority 10, Enabled
[vortex50:52880] coll:base:comm_select: selecting basic, priority 10, Enabled
[vortex50:52880] coll:base:comm_select: selecting cuda, priority 78, Enabled
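(Aside, in case anyone wants to reproduce the component listings above: they can presumably be requested by raising the coll framework verbosity, assuming Spectrum MPI honors the usual Open MPI parameter, e.g.:)
# Print which coll components get selected for each communicator (verbosity 10 is a typical level)
TPETRA_ASSUME_CUDA_AWARE_MPI=1 jsrun \
'-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/pami_451/libpami.so' \
'-M -gpu -mca coll ^ibm -mca coll_base_verbose 10' '-p' '4' '--rs_per_socket' '4' \
'/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/MultiVector/TpetraCore_MultiVector_UnitTests.exe'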
I suppose you could also disable basic (that's what -mca coll ^basic is doing in your line), but I don't think it's necessary.
I do need to test this with 2 physical nodes... What works for 'shared memory' may not work otherwise.
I can dig a little more; there is also usually a way to restrict this to specific function calls, e.g., disable it for MPI_Allreduce (if that's broken) but keep the optimized one for everything else.
TL;DR: I'd suggest -mca coll ^ibm as the jsrun flags for a COMPLEX build. This avoids some of the more adventurous parts of Spectrum (-async and HCOLL). HCOLL is disabled by default on Sierra (or it is on our testbeds). -async enables progress threads; in my testing with EMPIRE this slowed down the code to the tune of about 2x (I also doubt it is used much). The flags I propose keep 'default' parameters as much as possible. It's also possible we could tune that argument to filter out only broken collectives (e.g., only all_reduce if that is the case).
Looks good:
TPETRA_ASSUME_CUDA_AWARE_MPI=1 jsrun \
'-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/pami_451/libpami.so' \
'-M -gpu -mca coll ^ibm -mca coll_base_verbose 0 ' \
-r1 -p4 -d cyclic \
'/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/MultiVector/TpetraCore_MultiVector_UnitTests.exe'
End Result: TEST PASSED
# go without cuda-aware
TPETRA_ASSUME_CUDA_AWARE_MPI=0 jsrun \
'-M -mca coll ^ibm -mca coll_base_verbose 0 ' \
-r1 -p4 -d cyclic \
'/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/MultiVector/TpetraCore_MultiVector_UnitTests.exe'
End Result: TEST PASSED
# do cuda aware but leave IBM enabled:
TPETRA_ASSUME_CUDA_AWARE_MPI=1 jsrun \
'-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/pami_451/libpami.so' \
'-M -gpu -mca coll_base_verbose 0 ' \
-r1 -p4 -d cyclic \
'/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/core/test/MultiVector/TpetraCore_MultiVector_UnitTests.exe'
End Result: TEST FAILED
Edit: I did a find/replace on the flags in the CTestTestfile.cmake files and reran the test suite:
TPETRA_ASSUME_CUDA_AWARE_MPI=1 ctest -j4
100% tests passed, 0 tests failed out of 271
Subproject Time Summary:
Tpetra = 2315.94 sec*proc (271 tests)
Total Test time (real) = 644.70 sec
TPETRA_ASSUME_CUDA_AWARE_MPI=0 ctest -j4
100% tests passed, 0 tests failed out of 271
Subproject Time Summary:
Tpetra = 1994.44 sec*proc (271 tests)
Total Test time (real) = 562.59 sec
# now try on 4 nodes
# replace the --rs_per_socket 4 part with -r 1 -d cyclic
find -name "CTestTestfile.cmake" -print0 | xargs -0 sed -i -e 's|"--rs_per_socket" "4"|"-r" "1" "-d" "cyclic"|g'
TPETRA_ASSUME_CUDA_AWARE_MPI=1 ctest -j4
100% tests passed, 0 tests failed out of 271
Subproject Time Summary:
Tpetra = 2821.91 sec*proc (271 tests)
Total Test time (real) = 770.99 sec
TPETRA_ASSUME_CUDA_AWARE_MPI=0 ctest -j4
100% tests passed, 0 tests failed out of 271
Subproject Time Summary:
Tpetra = 3057.93 sec*proc (271 tests)
Total Test time (real) = 829.25 sec
E.g., the actual flags are:
1: Test command: /ascldap/users/jjellio/src/Trilinos/cmake/std/atdm/ats2/trilinos_jsrun "-M" "-mca coll ^ibm" "-p" "1" "--rs_per_socket" "4" "/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/tsqr/test/TpetraTSQR_CuSolver.exe"
1: Environment variables:
1: CTEST_KOKKOS_DEVICE_TYPE=gpus
1: Test timeout computed to be: 600
1: BEFORE: jsrun '-M' '-mca coll ^ibm' '-p' '1' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/tsqr/test/TpetraTSQR_CuSolver.exe'
1: AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=1; jsrun '-M' '-disable_gpu_hooks -mca coll ^ibm' '-p' '1' '--rs_per_socket' '4' '/ascldap/users/jjellio/src/Trilinos/build/packages/tpetra/tsqr/test/TpetraTSQR_CuSolver.exe'
1: Test cuBLAS and cuSOLVER handle creation
1: s.get () != nullptr = 1 == true : passed
1: Original x: 666: Converted x: 666
1: b.get () != nullptr = 1 == true : passed
1: C_h(0,0) = 7.00000000000000000e+00 == 7.00000000000000000e+00 : passed
1: C_h(0,0) = 3.10000000000000000e+01 == 3.10000000000000000e+01 : passed
1: s.get () != nullptr = 1 == true : passed
1: Original x: 666: Converted x: 666
1: b.get () != nullptr = 1 == true : passed
1: C_h(0,0) = 7.00000000e+00 == 7.00000000e+00 : passed
1: C_h(0,0) = 3.10000000e+01 == 3.10000000e+01 : passed
1: s.get () != nullptr = 1 == true : passed
1: Original x: (666,418): Converted x: (666,418)
1: b.get () != nullptr = 1 == true : passed
1: s.get () != nullptr = 1 == true : passed
1: Original x: (666,418): Converted x: (666,418)
1: b.get () != nullptr = 1 == true : passed
1:
1: End Result: TEST PASSED
1: jsrun return value: 0
1/271 Test #1: TpetraTSQR_CuSolver_MPI_1 ................................................................... Passed 2.12 sec
Edit: for reference, in an earlier post I mentioned testing across physical nodes:
jsrun -r1 -p4 -d cyclic hostname
vortex79
vortex51
vortex78
vortex77
Tests with issue trackers Failed: twif=4
Site | Build Name | Test Name | Status | Details | Consecutive Non-pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Failed | Completed (Failed) | 28 | 28 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Failed | Completed (Failed) | 28 | 28 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Failed | Completed (Failed) | 28 | 28 | 0 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Failed | Completed (Failed) | 28 | 28 | 0 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Looks good
@jjellio, what about the failing test TpetraCore_idot_MPI_4? Did you run that?
I ran everything (271 tests)
I ran everything (271 tests)
@jjellio, then can you push a commit to the branch for PR #8858 that uses the updated jsrun options?
@tjfulle and @jjellio,
As I described in more detail just now in https://github.com/trilinos/Trilinos/pull/8858#issuecomment-801170169, I ran 4 full sets of these ats2 complex builds with the updated jsrun options '-M' '-mca coll ^ibm' over the last 3 days.
In summary:
* '-M' '-mca coll ^ibm' appears to eliminate all of the Tpetra and Belos failures due to the Spectrum MPI complex collectives bug reported in #8474.
* There are filters in the cdash_analyze_and_report.py tool that filter out these mass jsrun failures.
The fact that these updated jsrun options eliminate these Tpetra and Belos errors without requiring any code hacks in Trilinos seems like a good tradeoff, even though these options do cause a higher incidence of mass jsrun failures (which we can tolerate and which don't degrade our ability to maintain the test suite as long as they occur in less than 50% of the builds or so).
Therefore, I would say we should merge PR #8858 and see how it goes in nightly testing. Agreed?
Tests with issue trackers Passed: twip=4
Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Passed | Completed | 3 | 25 | 3 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Passed | Completed | 3 | 25 | 3 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Passed | Completed | 3 | 25 | 3 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Passed | Completed | 3 | 25 | 3 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Tests with issue trackers Passed: twip=4
Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | Belos_Tpetra_MVOPTester_complex_test_MPI_4 | Passed | Completed | 10 | 19 | 10 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_CrsMatrix_UnitTests3_MPI_4 | Passed | Completed | 10 | 19 | 10 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MV_reduce_strided_MPI_4 | Passed | Completed | 10 | 19 | 10 | #8474 |
vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi | TpetraCore_MultiVector_UnitTests_MPI_4 | Passed | Completed | 10 | 19 | 10 | #8474 |
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
CC: @trilinos/tpetra, @kddevin (Trilinos Data Services Product Lead), @e10harvey