Closed bartlettroscoe closed 7 years ago
CC: @trilinos/thyra, @trilinos/stratimikos
I am getting to work on this now.
I was able to reproduce warnings for Thyra for the NVCC/CUDA build on shiller using the env provded in #1133. I extracted this into the scripts under Trilinos:
cmake/ctest/drivers/ATTB/attb_cuda_config_base.sh
cmake/ctest/drivers/ATTB/attb_cuda_config_for_drekar.sh
cmake/ctest/drivers/ATTB/load_attb_cuda_env.sh
and used these to reproduce warnings in Thyra and fix them. See details below.
I created the new macro TEUCHOS_NONREACHABLE_RETURN(RETURN_VAL)
and used it to remove all of the "statement is unreachable" warnings.
Next I will remove all of the warnings for Stratimikos for NVCC/CUDA.
DETAILED NOTES:
(2017/03/16)
@bathmath provided the pretty complete configure script above. It even has the module loads so it is self-contained. The only piece of info missing is what machine these builds are done on. My guess are the ATTB machines hansen and shiller.
First, I need to separate this configure script into its three logical parts:
1) The setup of the env and modules
2) The specification of the env passed to cmake
3) The list of Trilinos packages to enable/disable
I broke up this script into the thre files:
load_attb_cuda_env.sh
attb_cuda_config_base.sh
attb_cuda_config_for_drekar.sh
I pushed this to the branch:
as of commit 1b2f8ce.
I then tried to reproduce the configure and build with:
$ time ./do-configure -DTrilinos_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Thyra=ON -DTrilinos_ENABLE_Stratimikos=ON \
&> configure.out
real 1m56.765s
user 1m5.000s
sys 0m11.434s
$ time make -j10 &> make.out
real 12m31.788s
user 74m36.931s
sys 19m48.565s
That build failed with link failures like:
[100%] Linking CXX executable ThyraCore_DefaultBlockedLinearOpUnitTests.exe
/home/projects/x86-64-haswell/blas/20150602/gcc/4.8.4/lib/libblas.a(xerbla.o): In function `xerbla_':
xerbla.f:(.text+0x52): undefined reference to `_gfortran_st_write'
xerbla.f:(.text+0x5d): undefined reference to `_gfortran_string_len_trim'
xerbla.f:(.text+0x6f): undefined reference to `_gfortran_transfer_character_write'
xerbla.f:(.text+0x7f): undefined reference to `_gfortran_transfer_integer_write'
xerbla.f:(.text+0x87): undefined reference to `_gfortran_st_write_done'
xerbla.f:(.text+0x90): undefined reference to `_gfortran_stop_string'
collect2: error: ld returned 1 exit status
make[3]: *** [packages/thyra/core/test/operator_vector/ThyraCore_DefaultBlockedLinearOpUnitTests.exe] Error 1
Seems like the build configuration is missing -lgfortran
. I will try adding that and see what happens. That seemed to fix it (in commit c9c4ef7). I pushed that in the commit c9c4ef7.
Now configuring and building from scratch:
$ time ./do-configure -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Thyra=ON -DTrilinos_ENABLE_Stratimikos=ON &> configure.out && time make -j16 &> make.out ; ~/mailmsg.py "Finished CUDA build for Thyra and Startimikos on shiller"
real 1m46.340s
user 1m2.969s
sys 0m11.209s
real 6m18.293s
user 52m6.997s
sys 22m53.191s
Now that actually built and linked everything.
Now running the tests:
$ ctest -j10
[ ... ]
97% tests passed, 3 tests failed out of 119
Label Time Summary:
Stratimikos = 43.38 sec (39 tests)
Thyra = 69.01 sec (80 tests)
Total Test time (real) = 24.22 sec
The following tests FAILED:
78 - ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4 (Failed)
79 - ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_serial_MPI_1 (Failed)
80 - ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1 (Failed)
Errors while running CTest
real 0m24.276s
user 0m41.490s
sys 0m50.463s
Looks like you can't actually run tests that need CUDA on the build node. You get, for example:
1. TpetraThyraWrappers_double_createVectorSpace_UnitTest ...
p=0: *** Caught standard std::exception of type 'std::runtime_error' :
cudaGetDeviceCount( & m_cudaDevCount ) error( cudaErrorInsufficientDriver): CUDA driver version is insufficient for CUDA runtime version
/home/rabartl/Trilinos.base/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:207
Traceback functionality not available
[FAILED] (0.000584 sec) TpetraThyraWrappers_double_createVectorSpace_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/thyra/adapters/tpetra/test/TpetraThyraWrappers_UnitTests.cpp:220
So how to run test on shiller? Where do I find this documentation? I did a search on the snl-wiki for "shiller" and I found out where to find documentation on the machine itself. After grabing interactive nodes, I was able to run:
$ ctest -j2 -R ^ThyraTpetraAdapters.*
Test project /home/rabartl/Trilinos.base/BUILDS/CUDA/MPI_RELEASE_CUDA
Start 80: ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1
Start 79: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_serial_MPI_1
1/3 Test #80: ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1 ... Passed 15.06 sec
2/3 Test #79: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_serial_MPI_1 ..... Passed 15.76 sec
Start 78: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4
3/3 Test #78: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4 ............ Passed 17.52 sec
100% tests passed, 0 tests failed out of 3
Label Time Summary:
Thyra = 48.34 sec (3 tests)
Total Test time (real) = 33.44 sec
Wow, shiller is super slow!
Anyhow, I can finally see the warnings that were produced when building Thyra and Stratimikos tests.
Looking for unique warnings from Teuchos first:
$ grep warning make.out | grep packages/teuchos | sort | uniq | less
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:455:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:659:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:679:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialSymDenseMatrix.hpp:488:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:413:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:431:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:456:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:468:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:492:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:571:51: warning: ISO C++ does not support variable-length array types [-Wvla]
I need to create a separate issue for that.
Now to look at Thyra warnings:
$ grep warning make.out | grep packages/thyra | sort | uniq | wc -l
36
Of these, almost all of them are "statement is unreachable" as shown by:
$ grep warning make.out | grep packages/thyra | grep -v "statement is unreachable" | sort | uniq
/home/rabartl/Trilinos.base/Trilinos/packages/thyra/adapters/epetra/src/Thyra_EpetraOperatorViewExtractorStd.cpp(61): warning: variable "eFwdOp" was set but never used
That leaves 35 "statement is unreachable" warnings.
Many of these "statement is unreachable" warnings come from code like:
Thyra::ModelEvaluatorBase::EDerivativeMultiVectorOrientation
Thyra::convert(
const EpetraExt::ModelEvaluator::EDerivativeMultiVectorOrientation &mvOrientation
)
{
switch(mvOrientation) {
case EpetraExt::ModelEvaluator::DERIV_MV_BY_COL :
return ModelEvaluatorBase::DERIV_MV_BY_COL;
case EpetraExt::ModelEvaluator::DERIV_TRANS_MV_BY_ROW :
return ModelEvaluatorBase::DERIV_TRANS_MV_BY_ROW;
default:
TEUCHOS_TEST_FOR_EXCEPT(true);
}
return ModelEvaluatorBase::DERIV_MV_BY_COL; // Should never be called!
}
Grepping for that comment shows there 15 of these:
$ find . -name "*pp" -exec grep -nHi "Should never be called" {} \; | wc -l
15
So that is where I will start.
What I want to do here is to create a macro TEUCHOS_NONREACHABLE_RETURN(RETURN_VAL)
and then use this in all such returns.
I removed all of the warnings for Thyra and pushed to the 'develop' branch.
I finished cleaning up the warnings from Thyra and Stratimikos for NVCC and pushed to the Trilinos 'develop' branch (see below).
@bathmatt, I am closing this as complete as I am pretty sure that you will not see any more warnings on NVCC on shiller coming from building Thyra or Stratimikos. There are also not any warnings for GCC 4.8.3 as well.
DETAILED NOTES:
(2017/03/17)
I cleaned up all of the warnings for Stratimikos for NVCC and pushed to the branch 'thyra-stratimikos-cuda-warnings-1140'.
Now I want to verify that all warnings for Thyra and Stratimikos are removed for the Trilinos build for Drekar. For that, on shiller, I do:
$ cd /home/rabartl/Trilinos.base/BUILDS/CUDA/MPI_RELEASE_CUDA
$ ln -s ../../../Trilinos/cmake/ctest/drivers/ATTB/attb_cuda_config_base.sh .
$ ln -s ../../../Trilinos/cmake/ctest/drivers/ATTB/attb_cuda_config_for_drekar.sh .
$ rm -r CMake*
$ time ./attb_cuda_config_for_drekar.sh &> configure.out
real 3m32.358s
user 1m40.306s
sys 0m15.161s
$ time make -j10 &> make.out
real 685m28.870s
user 2382m5.324s
sys 62m22.490s
Wow, almost 7.5 hours to build these packages on 10 cores! That is fantastic!
Looking for any remaining Thyra or Stratimikos warnings:
$ grep "warning: " make.out | grep "\(packages/stratimikos\|thyra\|Thyra\)" | sort | uniq | less
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-loca/src-thyra/LOCA_Thyra_GroupWrapper.H(66): warning: overloaded virtual function "LOCA::Thyra::Group::operator=" is only par
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(486): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(695): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(859): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/xpetra/sup/Utils/Xpetra_ThyraUtils.hpp(942): warning: dynamic initialization in unreachable code
/home/rabartl/Trilinos.base/Trilinos/packages/xpetra/sup/Utils/Xpetra_ThyraUtils.hpp(943): warning: statement is unreachable
I could fix the ones in NOX but I think Roger is already taking care of those (see #1139). As for the warnings in Xpetra, those should be fixed by an Xpetra developer.
Looking at the remaining warnings on this branch and the size of the make output:
[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ grep "warning: " make.out | wc -l
148033
[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ grep "warning: " make.out | sort | uniq | wc -l
21409
[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ du -sh make.out
562M make.out
Wow, still over 20K unique warnings!
(2017/03/20)
I then went back on crf450 and built and tested Teuchos, Thyra, and Stratimikos with GCC 4.8.3 with:
$ time ./do-configure -DTrilinos_ENABLE_Teuchos=ON \
-DTrilinos_ENABLE_Thyra=ON -DTrilinos_ENABLE_Stratimikos=ON \
&> configure.out
real 0m13.419s
user 0m8.696s
sys 0m2.616s
$ time make -j32 &> make.out
real 9m50.045s
user 94m45.685s
sys 8m1.349s
$ ctest -j32
...
100% tests passed, 0 tests failed out of 243
Label Time Summary:
Stratimikos = 29.59 sec (39 tests)
Teuchos = 17.14 sec (124 tests)
Thyra = 10.07 sec (80 tests)
Total Test time (real) = 5.49 sec
And this gave no warnings:
grep "warning" make.out | wc -l
0
(2017/03/21)
I pushed these commits to the Trilinos 'develop' branch with the top commit being:
commit 416fc129b8fed5f8c5083dad272feffc7e059f95
Merge: 9104e93 7d90906
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Tue Mar 21 05:42:29 2017 -0600
Merge branch 'more-warnings-1133-1140' of github.com:bartlettroscoe/Trilinos into develop
Build/Test Cases Summary
Enabled Packages: EpetraExt, Stratimikos, TeuchosComm, TeuchosCore, ThyraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=2234,notpassed=0 (100.48 min)
Other local commits for this build/test group: 7d90906, 8e3a5f45, 48a0a35, 8b6b253, aa8fafa, 148e6b4, 01122bd
This will take care of Thyra and Stratimikos warnings reported in #1133.