Closed ibaned closed 7 years ago
@trilinos/amesos2
@ibaned: I am curious: does it work with other solvers (e.g. KLU2)? Even if it does, could the problem be in the interface to SuperLU_Dist? Can you dump the matrix for us?
Correct, the KLU2 test passes, which is why I think it might be the Amesos2 -> SuperLU_Dist interface rather than the new Stratimikos -> Amesos2 interface proposed in the PR. I'll try to dump the matrix, but I have to rebuild the code first...
Actually, no need, the matrix just comes from this file: https://github.com/trilinos/Trilinos/blob/master/packages/ml/examples/BasicExamples/A.mm
@ibaned : Can we get some info on SuperLU_Dist version, configure options etc ?
@srajama1 I made a few CMake fixes to the branch while getting this set up. I'm using SuperLU_Dist 5.3.1, but I'm positive I went back and used the older version that Trilinos is originally compatible with, so you should see the same failure either way. Here is my configure script:
#!/bin/bash -ex
MPI_BASE_DIR=$HOME/install/gcc/mpich
BOOST_DIR=$HOME/install/gcc/boost
NETCDF_DIR=$HOME/install/gcc/netcdf
HDF5_DIR=$HOME/install/gcc/hdf5
PARMETIS_DIR=$HOME/install/gcc/parmetis
SUPERLUDIST_DIR=$HOME/install/gcc/SuperLU_DIST
cmake $HOME/src/Trilinos-superludist \
-DCMAKE_INSTALL_PREFIX:PATH=$HOME/install/gcc/Trilinos-superludist \
-DCMAKE_BUILD_TYPE:STRING=NONE \
-DBUILD_SHARED_LIBS:BOOL=ON \
-DTPL_FIND_SHARED_LIBS:BOOL=ON \
-DTPL_ENABLE_MPI:BOOL=ON \
-DMPI_BASE_DIR:PATH=${MPI_BASE_DIR} \
-DCMAKE_CXX_COMPILER:FILEPATH=${MPI_BASE_DIR}/bin/mpicxx \
-DCMAKE_C_COMPILER:FILEPATH=${MPI_BASE_DIR}/bin/mpicc \
-DTrilinos_ENABLE_Fortran:BOOL=OFF \
-DCMAKE_CXX_FLAGS:STRING='-O3 -g' \
-DCMAKE_C_FLAGS:STRING='-O3 -g' \
\
-DTrilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
\
-DTrilinos_ENABLE_Teuchos:BOOL=ON \
-DTeuchos_ENABLE_LONG_LONG_INT:BOOL=ON \
\
-DTrilinos_ENABLE_Tpetra:BOOL=ON \
-DTpetra_INST_INT_LONG_LONG:BOOL=ON \
-DTpetra_INST_INT_INT:BOOL=ON \
-DTpetra_INST_DOUBLE:BOOL=ON \
-DTpetra_INST_FLOAT:BOOL=OFF \
-DTpetra_INST_COMPLEX_FLOAT:BOOL=OFF \
-DTpetra_INST_COMPLEX_DOUBLE:BOOL=OFF \
-DTpetra_INST_INT_LONG:BOOL=OFF \
-DTpetra_INST_INT_UNSIGNED:BOOL=OFF \
\
-DTPL_ENABLE_Boost:BOOL=ON \
-DTPL_ENABLE_BoostLib:BOOL=ON \
-DBoost_INCLUDE_DIRS:PATH=$BOOST_DIR/include \
-DBoost_LIBRARY_DIRS:PATH=$BOOST_DIR/lib \
-DBoostLib_INCLUDE_DIRS:PATH=$BOOST_DIR/include \
-DBoostLib_LIBRARY_DIRS:PATH=$BOOST_DIR/lib \
\
-DTPL_ENABLE_Zlib:BOOL=ON \
\
-DTPL_ENABLE_HDF5:BOOL=ON \
-DHDF5_INCLUDE_DIRS:PATH=$HDF5_DIR/include \
-DTPL_HDF5_LIBRARIES:STRING="$HDF5_DIR/lib/libhdf5.so;$HDF5_DIR/lib/libhdf5_hl.so" \
\
-DTPL_ENABLE_ParMETIS:BOOL=ON \
-DParMETIS_INCLUDE_DIRS:PATH="$PARMETIS_DIR/include" \
-DParMETIS_LIBRARY_DIRS:PATH="$PARMETIS_DIR/lib" \
\
-DTPL_ENABLE_SuperLUDist:BOOL=ON \
-DSuperLUDist_INCLUDE_DIRS:PATH="$SUPERLUDIST_DIR/include" \
-DSuperLUDist_LIBRARY_DIRS:PATH="$SUPERLUDIST_DIR/lib" \
\
-DTrilinos_ENABLE_Kokkos:BOOL=ON \
-DTrilinos_ENABLE_KokkosCore:BOOL=ON \
-DTrilinos_ENABLE_KokkosContainers:BOOL=ON \
-DTrilinos_ENABLE_KokkosExample:BOOL=OFF \
-DKokkos_ENABLE_Serial:BOOL=ON \
-DKokkos_ENABLE_OpenMP:BOOL=OFF \
-DKokkos_ENABLE_Pthread:BOOL=OFF \
-DKokkos_ENABLE_Cuda:BOOL=OFF \
-DTPL_ENABLE_CUDA:BOOL=OFF \
\
-DTrilinos_ENABLE_Amesos2:BOOL=ON \
-DAmesos2_ENABLE_KLU2:BOOL=ON \
\
-DTrilinos_ENABLE_EpetraExt:BOOL=ON \
-DTrilinos_ENABLE_ThyraTpetraAdapters:BOOL=ON \
\
-DTrilinos_ENABLE_Stratimikos:BOOL=ON \
-DStratimikos_ENABLE_TESTS:BOOL=ON \
\
2>&1 | tee config_log
It's a trimmed-down version of another script, so some TPLs may not be needed. You should get three tests, with results as follows:
Test project /home/daibane/build/gcc/Trilinos-superludist
Start 1: Stratimikos_test_single_amesos2_tpetra_solver_driver_KLU2_MPI_1
1/3 Test #1: Stratimikos_test_single_amesos2_tpetra_solver_driver_KLU2_MPI_1 ........... Passed 0.17 sec
Start 2: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1
2/3 Test #2: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 ...***Failed 0.29 sec
Start 3: Stratimikos_ValidParameters_MPI_1
3/3 Test #3: Stratimikos_ValidParameters_MPI_1 ......................................... Passed 0.03 sec
After some fighting with SuperLU_Dist, I've finally got to the point where we can investigate further. I'm able to reproduce the SuperLU_Dist failure; however, the KLU2 test also fails for me:
D) Testing the LinearOpBase interface of nsA ...
*** Entering LinearOpTester<double,double>::check(op,...) ...
describe op:
Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
amesos2Solver=Amesos2::KLU2<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
Checking the domain and range spaces ...
op.domain().get() != NULL ? passed
op.range().get() != NULL ? passed
this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed
Checking that the forward operator is truly linear:
0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
\_____/ \___/
v3 v5
\_____________/ \___________________/
v4 v5
sum(v4) == sum(v5)
Random vector tests = 1
v1 = randomize(-1,+1); ...
v2 = randomize(-1,+1); ...
v3 = v1 + v2 ...
v4 = 0.5*op*v3 ...
v5 = op*v1 ...
v5 = 0.5*op*v2 + 0.5*v5 ...
Check: rel_err(sum(v4), sum(v5))
= rel_err(1.23077, 1.23077) = 3.42782e-14
<= linear_properties_error_tol() = 1e-14 : FAILED
It's not clear to me how significant this failure is, though: 1e-14 is a fairly strict tolerance, and the test result misses it only by a factor of about 3.4 (3.42782e-14 vs. 1e-14). (This is on OS X with Clang 7.3.1 + mpich 3.1.4.)
@krcb thanks for looking into this !
I agree that the KLU2 failure looks too borderline to be significant.
I think there are options you can pass to this test system to change that tolerance, and raising it to 1e-12 so you can get through KLU2 and focus on SuperLU_Dist makes sense to me.
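To make the "borderline" judgment concrete, here is a minimal Python sketch of the linearity check printed in the log, `0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2`, compared through `sum(v4)` and `sum(v5)`. It is only an illustration: the tridiagonal stand-in operator, its entries (2 and -1), and the `rel_err` formula are assumptions, not the actual A.mm values or the exact Thyra definition.

```python
import random

def rel_err(a, b):
    """Relative difference of two scalars (an assumed, conventional definition)."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale > 0.0 else 0.0

def apply_tridiag(v, diag=2.0, off=-1.0):
    """Apply a stand-in tridiagonal operator; the entries are hypothetical."""
    n = len(v)
    out = []
    for i in range(n):
        s = diag * v[i]
        if i > 0:
            s += off * v[i - 1]
        if i + 1 < n:
            s += off * v[i + 1]
        out.append(s)
    return out

def linearity_error(n, seed=0):
    """Mirror the v1..v5 steps of the log and return rel_err(sum(v4), sum(v5))."""
    rng = random.Random(seed)
    v1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    v2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    v3 = [a + b for a, b in zip(v1, v2)]
    v4 = [0.5 * x for x in apply_tridiag(v3)]
    op_v1 = apply_tridiag(v1)
    op_v2 = apply_tridiag(v2)
    v5 = [0.5 * a + 0.5 * b for a, b in zip(op_v2, op_v1)]
    return rel_err(sum(v4), sum(v5))

reported = 3.42782e-14            # value taken from the KLU2 log above
print(reported <= 1e-14)          # False: fails the current tolerance
print(reported <= 1e-12)          # True: would pass the proposed tolerance
err = linearity_error(10000)      # rounding-level error for the stand-in operator
print(err)
```

Both sides sum 10000 floating-point terms in a different order, so a difference of a few times 1e-14 is plausible from rounding alone; that is consistent with treating the KLU2 failure as a tolerance issue rather than a solver bug.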
I've run the provided matrix through the Amesos2 SuperLU_Dist test driver. The output is as follows:
Test matrix A.mm ...
| with SuperLU_DIST :
Testing Tpetra objects
Doing tpetra test run `run0' with s=double lo=int go=int ...
Running test with types S=double, LO=int, GO=int, N=Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>
Reading matrix from /Volumes/Scratch/build/trilinos/trilinosrepo/parAmesos2/packages/amesos2/test/solvers/../matrices/A.mm ... Matrix Market reader: readSparse:
-- Reading banner line
-- Reading dimensions line
-- Making Adder for collecting matrix data
-- Reading matrix data
-- Successfully read the Matrix Market data
-- Tolerant mode: rebroadcasting matrix dimensions
----- Dimensions before: 10000 x 10000
----- Dimensions after: 10000 x 10000
-- Converting matrix data into CSR format on Proc 0
----- Proc 0: Matrix has numRows=10000 rows and numEntries=29998 entries.
----- Proc 0: numEntriesPerRow[0..9999] (only showing first and last few entries) = [2 3 ... 3 2]
----- Proc 0: rowPtr (only showing first and last few entries) = [0 2 ... 29993 29996 29998]
-- Making range, domain, and row maps
-- Distributing the matrix data
-- Proc 0: Copying my data from global arrays
-- Proc 0: I own 2500 rows and 7499 entries
-- Proc 0: Processing proc 1
-- Proc 0: Proc 1 owns 2500 rows
-- Proc 0: Proc 1 owns 7500 entries
-- Proc 0: Finished with proc 1
-- Proc 0: Processing proc 2
-- Proc 0: Proc 2 owns 2500 rows
-- Proc 0: Proc 2 owns 7500 entries
-- Proc 0: Finished with proc 2
-- Proc 0: Processing proc 3
-- Proc 0: Proc 3 owns 2500 rows
-- Proc 0: Proc 3 owns 7499 entries
-- Proc 0: Finished with proc 3
-- Proc 0: About to fill in myRowPtr
-- Proc 0: Done with distribute
-- Inserting matrix entries on each processor and calling fillComplete()
-- Done creating the CrsMatrix from the Matrix Market data
done
Tpetra::CrsMatrix (Kokkos refactor):
Template parameters:
Scalar: double
LocalOrdinal: int
GlobalOrdinal: int
Node: Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>
isFillComplete: true
Global dimensions: [10000, 10000]
Global number of entries: 29998
Global number of diagonal entries: 10000
Global max number of entries in a row: 3
Creating right-hand side and solution vectors
Creating near-copy of matrix for refactor test
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
passed
- Tpetra test succeeded
+ Testing with SuperLU_DIST passed
A.mm passed
========================================================================================================================
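As a sanity check on the reader output above, the row and entry counts can be reproduced with a short Python sketch. It assumes the matrix has a tridiagonal sparsity pattern (suggested by `numEntriesPerRow = [2 3 ... 3 2]`, but not stated anywhere in this thread) and that rows are split into contiguous, equal blocks across 4 processes; the helper names are hypothetical.

```python
def tridiagonal_row_counts(n):
    """Entries per row for an n x n tridiagonal pattern (assumed structure)."""
    return [2 if i in (0, n - 1) else 3 for i in range(n)]

def row_ptr(counts):
    """Build a CSR row-pointer array from per-row entry counts."""
    ptr = [0]
    for c in counts:
        ptr.append(ptr[-1] + c)
    return ptr

def contiguous_blocks(n, nprocs):
    """Split n rows into nprocs contiguous, near-equal blocks (assumed layout)."""
    base, extra = divmod(n, nprocs)
    blocks, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

n, nprocs = 10000, 4
counts = tridiagonal_row_counts(n)
ptr = row_ptr(counts)
per_proc = [(hi - lo, ptr[hi] - ptr[lo]) for lo, hi in contiguous_blocks(n, nprocs)]

print(ptr[:2], ptr[-3:])  # [0, 2] [29993, 29996, 29998]
print(per_proc)           # [(2500, 7499), (2500, 7500), (2500, 7500), (2500, 7499)]
```

Under those assumptions the sketch matches the log: 29998 entries in total, and 7499, 7500, 7500, and 7499 entries on processes 0 through 3.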
As far as I can see, this looks ok (but I may be missing something). As an experiment, I took a few of the test matrices in Amesos2 that are used for testing the SuperLU_Dist interface and ran them through the new Stratimikos Amesos2 SuperLU_Dist interface. For example, on orsirr_2.mtx, I get:
"/Volumes/Scratch/install/trilinos/mpich-3.1.4-static/bin/mpiexec" "-np" "1" "/Volumes/Scratch/build/trilinos/trilinosrepo/parAmesos2/packages/stratimikos/adapters/amesos2/test/Stratimikos_test_single_amesos2_tpetra_solver_driver.exe" "--show-all-tests" "--solver-type=SuperLU_DIST" "--verbose" "--matrix-file=/Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx"
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name tabby.txcorp.com and rank 0!
***
*** Testing Thyra::BelosLinearOpWithSolveFactory (and Thyra::BelosLinearOpWithSolve)
***
Echoing input options:
matrixFile = /Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx
numRhs = 1
numRandomVectors = 1
maxFwdError = 1e-14
maxResid = 1e-06
showAllTests = 1
dumpAll = 0
A) Reading in a tpetra matrix A from the file '/Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx' ...
B) Creating an Amesos2LinearOpWithSolveFactory object opFactory ...
lowsFactory.getValidParameters():
Solver Type : string = KLU2
Refactorization Policy : string = RepivotOnRefactorization
Throw on Preconditioner Input : bool = 1
VerboseObject ->
Verbosity Level : string = default
Output File : string = none
amesos2LOWSFPL before setting parameters:
Solver Type : string = SuperLU_DIST [unused]
amesos2LOWSFPL after setting parameters:
Solver Type : string = SuperLU_DIST
Refactorization Policy : string = RepivotOnRefactorization [default]
Throw on Preconditioner Input : bool = 1 [default]
VerboseObject ->
Output File : string = none [default]
Verbosity Level : string = default [default]
C) Creating a Amesos2LinearOpWithSolve object nsA from A ...
D) Testing the LinearOpBase interface of nsA ...
*** Entering LinearOpTester<double,double>::check(op,...) ...
describe op:
Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=886,domainDim=886}
fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=886,domainDim=886}
amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
Checking the domain and range spaces ...
op.domain().get() != NULL ? passed
op.range().get() != NULL ? passed
this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed
Checking that the forward operator is truly linear:
0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
\_____/ \___/
v3 v5
\_____________/ \___________________/
v4 v5
sum(v4) == sum(v5)
Random vector tests = 1
v1 = randomize(-1,+1); ...
v2 = randomize(-1,+1); ...
v3 = v1 + v2 ...
v4 = 0.5*op*v3 ...
v5 = op*v1 ...
v5 = 0.5*op*v2 + 0.5*v5 ...
Check: rel_err(sum(v4), sum(v5))
= rel_err(-32673.4, -32673.4) = 1.55881e-15
<= linear_properties_error_tol() = 1e-14 : passed
Warning! rel_err(sum(v4), sum(v5))
= rel_err(-32673.4, -32673.4) = 1.55881e-15
>= linear_properties_warning_tol() = 1e-16!
(this->check_linear_properties()&&this->check_adjoint())==false: Skipping the check of the linear properties of the adjoint operator!
this->check_adjoint()==false: Skipping check for the agreement of the adjoint and forward operators!
this->check_for_symmetry()==false: Skipping check of symmetry ...
Congratulations, this LinearOpBase object seems to check out!
*** Leaving LinearOpTester<double,double>::check(...)
E) Testing the LinearOpWithSolveBase interface of nsA ...
*** Entering LinearOpWithSolveTester<double>::check(op,...) ...
describe forward op:
Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=886,domainDim=886}
fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=886,domainDim=886}
amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
this->check_forward_default()==true: Checking the default forward solve ... op.solveSupports(NOTRANS) = true == true : passed
Checking that the forward default solve matches the forward operator:
inv(Op)*Op*v1 == v1
\___/
v2
\___________/
v3
v4 = v3-v1
v5 = Op*v3-v2
norm(v4)/norm(v1) <= forward_default_solution_error_error_tol()
norm(v5)/norm(v2) <= forward_default_residual_error_tol()
Random vector tests = 1
v1 = randomize(-1,+1); ...
v2 = Op*v1 ...
=> Apply time = 1.0547e-05 sec
v3 = inv(Op)*v2 ...
Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
=> Solve time = 0.00256535 sec
solve status:
solveStatus = SOLVE_STATUS_CONVERGED
achievedTol = unknownTolerance()
message:extraParameters: NONE
v4 = v3 - v1 ...
v5 = Op*v3 - v2 ...
=> Apply time = 4.5857e-05 sec
Check: |norm(v4)/norm(v1)| = 7.61771e-14 <= forward_default_solution_error_error_tol() = 1e-06 : passed
Check: |norm(v5)/norm(v2)| = 2.3389e-16 <= forward_default_residual_error_tol() = 2e-06 : passed
this->check_forward_residual()==true: Checking the forward solve with a tolerance on the residual ... op.solveSupports(NOTRANS) = true == true : passed
Checking that the forward solve matches the forward operator to a residual tolerance:
v3 = inv(Op)*Op*v1
\___/
v2
v4 = Op*v3-v2
norm(v4)/norm(v2) <= forward_residual_solve_tol() + forward_residual_slack_error_tol()
Random vector tests = 1
v1 = randomize(-1,+1); ...
v2 = Op*v1 ...
=> Apply time = 2.3606e-05 sec
v3 = inv(Op)*v2 ...
Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
=> Solve time = 0.00089881 sec
solve status:
solveStatus = SOLVE_STATUS_CONVERGED
achievedTol = unknownTolerance()
message:extraParameters: NONE
check: solveStatus = SOLVE_STATUS_CONVERGED == SOLVE_STATUS_CONVERGED : passed
v4 = Op*v3 - v2 ...
=> Apply time = 1.0879e-05 sec
Check: |norm(v4)/norm(v2)| = 1.64104e-16 <= forward_residual_solve_tol()+forward_residual_slack_error_tol() = 2e-06 : passed
this->check_adjoint_default()==false: Skipping the check of the adjoint solve with a default tolerance!
this->check_adjoint_residual()==false: Skipping the check of the adjoint solve with a tolerance on the residual!
Congratulations, this LinearOpWithSolveBase object seems to check out!
*** Leaving LinearOpWithSolveTester<double>::check(...)
amesos2LOWSFPL after solving:
Solver Type : string = SuperLU_DIST
Refactorization Policy : string = RepivotOnRefactorization [default]
Throw on Preconditioner Input : bool = 1 [default]
VerboseObject ->
Output File : string = none [default]
Verbosity Level : string = default [default]
Congratulations! All of the tests checked out!
Similar results are seen for other Amesos2 test matrices. This seems to indicate that, for this matrix, SuperLU_Dist is having a problem and Amesos2 is not catching it during its tests. In order to make progress on #1090, the way forward could be to use the current set of test matrices from Amesos2 to test the Stratimikos interface, while we investigate precisely what is happening with the matrix here. Once we have figured that out, we could restore this matrix to the Stratimikos tests and add it to the Amesos2 tests.
In order to make progress on #1090, the way forward could be to use the current set of test matrices from Amesos2 to test the Stratimikos interface, while we investigate precisely what is happening with the matrix here
That approach would be fine with me
I just pushed this change as commit 066f244 to the branch for #1090. Indeed, the SuperLU_Dist test passes now. I'm not sure whether debugging the old matrix should be considered part of this issue, or if we should just close this one.
I just ran this matrix through the Stratimikos->Amesos->SuperLU_Dist test driver. It appears to pass there. Could you comment on whether you are performing additional tests in the Stratimikos->Amesos2 driver compared with the Stratimikos->Amesos driver?
Just to be clear: you mean that ML's A.mm matrix passes with the Stratimikos->Amesos->SuperLU_Dist codepath? I tried to just copy that test in #1090; if it is doing something more, that wasn't my intention.
@ibaned that appears to be the case, yes. We'll have to compare the Amesos/Amesos2 SuperLU_Dist implementations to understand what is going on in more detail and whether there's something about how we call SuperLU_Dist from Amesos2 that causes an issue with this matrix. For now though, I think the fact that the Amesos2 SuperLU_Dist test matrices pass when run through the Stratimikos->Amesos2->SuperLU_Dist interface is sufficient to demonstrate the Stratimikos driver.
Can we close this now ?
Took me a while to recall where we were, but yes I think so.
This is an unusual issue in that the relevant code is not yet in Trilinos at the time of posting, but it will help us to track problems with PR #1090. If one checks out that code and compiles with Thyra, Tpetra, Amesos2, Stratimikos, KLU2, and SuperLUDist enabled, the following command:
Produces the following output:
It looks like the most outstanding issue is that inv(A)*A*v != v, by a large error (~0.2), in part (E) of the testing. @srajama1
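For reference, the part (E) check (`inv(Op)*Op*v1 == v1`) can be sketched in plain Python. This is a minimal illustration under stated assumptions: a stand-in tridiagonal operator with hypothetical entries (2 and -1), a textbook Thomas solve in place of SuperLU_Dist, and the tolerances 1e-06 and 2e-06 copied from the test log. It is not the Thyra `LinearOpWithSolveTester` implementation.

```python
import math
import random

def apply_tridiag(v, diag=2.0, off=-1.0):
    """Apply a stand-in tridiagonal operator; the entries are hypothetical."""
    n = len(v)
    out = []
    for i in range(n):
        s = diag * v[i]
        if i > 0:
            s += off * v[i - 1]
        if i + 1 < n:
            s += off * v[i + 1]
        out.append(s)
    return out

def solve_tridiag(b, diag=2.0, off=-1.0):
    """Thomas algorithm for the same constant-coefficient tridiagonal system."""
    n = len(b)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = b[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (b[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def solve_check(n, seed=0):
    """Return (norm(v4)/norm(v1), norm(v5)/norm(v2)) as named in the log."""
    rng = random.Random(seed)
    v1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    v2 = apply_tridiag(v1)                # v2 = Op*v1
    v3 = solve_tridiag(v2)                # v3 = inv(Op)*v2
    v4 = [a - b for a, b in zip(v3, v1)]  # solution error
    v5 = [a - b for a, b in zip(apply_tridiag(v3), v2)]  # residual
    return norm2(v4) / norm2(v1), norm2(v5) / norm2(v2)

sol_err, resid = solve_check(200)
print(sol_err <= 1e-6, resid <= 2e-6)  # a working direct solve passes both
print(0.2 <= 1e-6)                     # False: the reported ~0.2 error fails
```

A correct direct solve leaves both quantities near rounding level, so an error of about 0.2 indicates a wrong solution rather than a tolerance that is merely too tight.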