trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Stratimikos: Amesos2 SuperLU_DIST test failing #1092

Closed ibaned closed 7 years ago

ibaned commented 7 years ago

This is an unusual issue in that the relevant code is not yet in Trilinos at the time of posting, but it will help us track problems with PR #1090. If one checks out that code and compiles with Thyra, Tpetra, Amesos2, Stratimikos, KLU2, and SuperLUDist enabled, the following command:

ctest -VV -R Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1

produces the following output:

UpdateCTestConfiguration  from :/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Parse Config file:/home/daibane/build/host/Trilinos/DartConfiguration.tcl
 Add coverage exclude regular expressions.
SetCTestConfiguration:CMakeCommand:/usr/local/bin/cmake
UpdateCTestConfiguration  from :/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Parse Config file:/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Test project /home/daibane/build/host/Trilinos
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 13
    Start 13: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1

13: Test command: /home/daibane/install/host/mpich/bin/mpiexec "-np" "1" "/home/daibane/build/host/Trilinos/packages/stratimikos/adapters/amesos2/test/Stratimikos_test_single_amesos2_tpetra_solver_driver.exe" "--show-all-tests" "--solver-type=SuperLU_DIST" "--verbose" "--matrix-file=A.mm"
13: Test timeout computed to be: 1500
13: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name westley.srn.sandia.gov and rank 0!
13: 
13: ***
13: *** Testing Thyra::BelosLinearOpWithSolveFactory (and Thyra::BelosLinearOpWithSolve)
13: ***
13: 
13: Echoing input options:
13:   matrixFile             = A.mm
13:   numRhs                 = 1
13:   numRandomVectors       = 1
13:   maxFwdError            = 1e-14
13:   maxResid               = 1e-06
13:   showAllTests           = 1
13:   dumpAll                = 0
13: 
13: A) Reading in a tpetra matrix A from the file 'A.mm' ...
13: 
13: B) Creating an Amesos2LinearOpWithSolveFactory object opFactory ...
13: 
13: lowsFactory.getValidParameters():
13:  Solver Type : string = KLU2
13:  Refactorization Policy : string = RepivotOnRefactorization
13:  Throw on Preconditioner Input : bool = 1
13:  VerboseObject -> 
13:   Verbosity Level : string = default
13:   Output File : string = none
13: 
13: amesos2LOWSFPL before setting parameters:
13:  Solver Type : string = SuperLU_DIST   [unused]
13: 
13: amesos2LOWSFPL after setting parameters:
13:  Solver Type : string = SuperLU_DIST
13:  Refactorization Policy : string = RepivotOnRefactorization   [default]
13:  Throw on Preconditioner Input : bool = 1   [default]
13:  VerboseObject -> 
13:   Output File : string = none   [default]
13:   Verbosity Level : string = default   [default]
13: 
13: C) Creating a Amesos2LinearOpWithSolve object nsA from A ...
13: .. Use parMETIS ordering on A'+A with 1 sub-domains.
13:     Max szBlk          128
13:     Parameters: fill mem 5 fill pelt 5
13:     Nonzeros in L       29971
13:     Nonzeros in U       19971
13:     nonzeros in L+U-I   49942
13:     No of supers   9990
13:     Size of G(L)   29952
13:     Size of G(U)   19962
13:     Size of G(L+U) 49914
13:     ParSYMBfact (MB)      : L\U MAX 0.68    AVG 0.68
13: .. # L blocks 29933 # U blocks 19943
13: MPI tag upper bound = 268435455
13: .. Starting with 1 OpenMP threads 
13:  === using DAG ===
13:  * init: 3.021002e-03 seconds
13: .. thresh = s_eps 5.960464e-08 * anorm 3.999800e+04 = 2.384067e-03
13: .. Buffer size: Lsub 11 Lval 9  Usub 11 Uval 2  LDA 3
13: [0] .. BIG U size 3072 
13: [0] .. BIG V size 131072
13:   Max row size is 3 
13:   Using buffer_size of 5000000 
13:   Threads per process 1 
13: Time in scattering 0.000000 
13: Time in dgemm 0.000000 
13: Total time spent in schur update is         :  0.01 seconds,
13: Total Time in Factorization                 :  0.02 seconds, 
13: Time (other GEMM and Scatter)               :  0.02 seconds, 
13: Total time spent in schur update when offload           :  0.00 seconds,
13: 
13: D) Testing the LinearOpBase interface of nsA ...
13:  
13:  *** Entering LinearOpTester<double,double>::check(op,...) ...
13:  
13:  describe op:
13:  Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
13:   fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
13:   amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
13:  
13:  Checking the domain and range spaces ... 
13:  op.domain().get() != NULL ? passed
13:  
13:  op.range().get() != NULL ? passed
13:  
13:  this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed
13:  
13:  Checking that the forward operator is truly linear:
13:  
13:    0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
13:            \_____/         \___/
13:               v3            v5
13:    \_____________/     \___________________/
13:           v4                    v5
13:  
13:             sum(v4) == sum(v5)
13:  
13:  Random vector tests = 1
13:   
13:   v1 = randomize(-1,+1); ...
13:   
13:   v2 = randomize(-1,+1); ...
13:   
13:   v3 = v1 + v2 ...
13:   
13:   v4 = 0.5*op*v3 ...
13:   
13:   v5 = op*v1 ...
13:   
13:   v5 = 0.5*op*v2 + 0.5*v5 ...
13:   
13:   Check: rel_err(sum(v4), sum(v5))
13:          = rel_err(-0.37757, -0.37757) = 3.23449e-15
13:            <= linear_properties_error_tol() = 1e-14 : passed
13:   Warning! rel_err(sum(v4), sum(v5))
13:          = rel_err(-0.37757, -0.37757) = 3.23449e-15
13:            >= linear_properties_warning_tol() = 1e-16!
13:  
13:  (this->check_linear_properties()&&this->check_adjoint())==false: Skipping the check of the linear properties of the adjoint operator!
13:  
13:  this->check_adjoint()==false: Skipping check for the agreement of the adjoint and forward operators!
13:  
13:  this->check_for_symmetry()==false: Skipping check of symmetry ...
13:  
13:  Congratulations, this LinearOpBase object seems to check out!
13:  
13:  *** Leaving LinearOpTester<double,double>::check(...)
13: 
13: E) Testing the LinearOpWithSolveBase interface of nsA ...
13:  
13:  *** Entering LinearOpWithSolveTester<double>::check(op,...) ...
13:  
13:  describe forward op:
13:  Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
13:   fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
13:   amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
13:  
13:  this->check_forward_default()==true: Checking the default forward solve ... op.solveSupports(NOTRANS) = true == true : passed
13:  
13:  Checking that the forward default solve matches the forward operator:
13:  
13:    inv(Op)*Op*v1 == v1
13:            \___/
13:             v2
13:    \___________/
13:           v3
13:  
13:    v4 = v3-v1
13:    v5 = Op*v3-v2
13:  
13:    norm(v4)/norm(v1) <= forward_default_solution_error_error_tol()
13:    norm(v5)/norm(v2) <= forward_default_residual_error_tol()
13:   
13:   Random vector tests = 1
13:    
13:    v1 = randomize(-1,+1); ...
13:    
13:    v2 = Op*v1 ...
13:     
13:     => Apply time = 8.10623e-05 sec
13:    
13:    v3 = inv(Op)*v2 ...
13:    
13:    Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
13:    
13:     
13:     => Solve time = 0.007236 sec
13:    
13:    solve status:
13:      solveStatus = SOLVE_STATUS_CONVERGED
13:      achievedTol = unknownTolerance()
13:      message:extraParameters: NONE
13:    
13:    v4 = v3 - v1 ...
13:    
13:    v5 = Op*v3 - v2 ...
13:     
13:     => Apply time = 7.10487e-05 sec
13:    
13:    Check: |norm(v4)/norm(v1)| = 0.29299 <= forward_default_solution_error_error_tol() = 1e-06 : FAILED
13:    
13:    Check: |norm(v5)/norm(v2)| = 5.91491e-06 <= forward_default_residual_error_tol() = 2e-06 : FAILED
13:  
13:  this->check_forward_residual()==true: Checking the forward solve with a tolerance on the residual ... op.solveSupports(NOTRANS) = true == true : passed
13:  
13:  Checking that the forward solve matches the forward operator to a residual tolerance:
13:  
13:    v3 = inv(Op)*Op*v1
13:                 \___/
13:                   v2
13:  
13:    v4 = Op*v3-v2
13:  
13:    norm(v4)/norm(v2) <= forward_residual_solve_tol() + forward_residual_slack_error_tol()
13:   
13:   Random vector tests = 1
13:     
13:     v1 = randomize(-1,+1); ...
13:     
13:     v2 = Op*v1 ...
13:      
13:      => Apply time = 6.79493e-05 sec
13:     
13:     v3 = inv(Op)*v2 ...
13:     
13:     Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
13:     
13:      
13:      => Solve time = 0.0063262 sec
13:     
13:     solve status:
13:       solveStatus = SOLVE_STATUS_CONVERGED
13:       achievedTol = unknownTolerance()
13:       message:extraParameters: NONE
13:     
13:     check: solveStatus = SOLVE_STATUS_CONVERGED == SOLVE_STATUS_CONVERGED : passed
13:     
13:     v4 = Op*v3 - v2 ...
13:      
13:      => Apply time = 7.00951e-05 sec
13:     
13:     Check: |norm(v4)/norm(v2)| = 6.72255e-06 <= forward_residual_solve_tol()+forward_residual_slack_error_tol() = 2e-06 : FAILED
13:  
13:  this->check_adjoint_default()==false: Skipping the check of the adjoint solve with a default tolerance!
13:  
13:  this->check_adjoint_residual()==false: Skipping the check of the adjoint solve with a tolerance on the residual!
13:  
13:  Oh no, at least one of the tests performed with this LinearOpWithSolveBase object failed (see above failures)!
13:  
13:  *** Leaving LinearOpWithSolveTester<double>::check(...)
13: 
13: amesos2LOWSFPL after solving:
13:  Solver Type : string = SuperLU_DIST
13:  Refactorization Policy : string = RepivotOnRefactorization   [default]
13:  Throw on Preconditioner Input : bool = 1   [default]
13:  VerboseObject -> 
13:   Output File : string = none   [default]
13:   Verbosity Level : string = default   [default]
13: 
13: Oh no! At least one of the tests failed!
1/1 Test #13: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 ...***Failed    0.44 sec

0% tests passed, 1 tests failed out of 1

Label Time Summary:
Stratimikos    =   0.44 sec (1 test)

Total Test time (real) =   0.48 sec

The following tests FAILED:
     13 - Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 (Failed)
Errors while running CTest

The most significant problem appears in part (E) of the test: inv(A)*A*v differs from v by a large relative error (about 0.29).
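
To make the two checks in part (E) concrete, here is a minimal sketch of the same round trip in plain Python. It is not the Thyra tester: the tridiagonal test matrix, the Thomas-algorithm solver, and all function names are assumptions chosen so the example is self-contained. Only the structure of the checks (v2 = Op*v1, v3 = inv(Op)*v2, v4 = v3 - v1, v5 = Op*v3 - v2) and the two tolerances are taken from the log.

```python
import math
import random

def apply_tridiag(lower, diag, upper, x):
    """Return A*x for a tridiagonal A stored as three diagonals."""
    n = len(diag)
    y = []
    for i in range(n):
        s = diag[i] * x[i]
        if i > 0:
            s += lower[i - 1] * x[i - 1]
        if i < n - 1:
            s += upper[i] * x[i + 1]
        y.append(s)
    return y

def solve_tridiag(lower, diag, upper, b):
    """Solve A*x = b with the Thomas algorithm (no pivoting, needs n >= 2)."""
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = b[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (b[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def round_trip_errors(lower, diag, upper, v1):
    """Return (norm(v4)/norm(v1), norm(v5)/norm(v2)) as in part (E)."""
    v2 = apply_tridiag(lower, diag, upper, v1)   # v2 = Op*v1
    v3 = solve_tridiag(lower, diag, upper, v2)   # v3 = inv(Op)*v2
    v4 = [a - b for a, b in zip(v3, v1)]         # solution error
    op_v3 = apply_tridiag(lower, diag, upper, v3)
    v5 = [a - b for a, b in zip(op_v3, v2)]      # residual
    return norm2(v4) / norm2(v1), norm2(v5) / norm2(v2)

# Hypothetical stand-in matrix: a small 1D Laplacian, not the A.mm from the test.
n = 50
lower = [-1.0] * (n - 1)
diag = [2.0] * n
upper = [-1.0] * (n - 1)
rng = random.Random(0)
v1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]

sol_err, resid = round_trip_errors(lower, diag, upper, v1)
print("solution error within 1e-06:", sol_err <= 1e-6)
print("residual within 2e-06:", resid <= 2e-6)
```

The first ratio measures how far the recovered vector is from the original, and the second measures how well the recovered vector satisfies the linear system. In the failing log both exceed their tolerances (0.29299 against 1e-06, and 5.91491e-06 against 2e-06).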

@srajama1

mhoemmen commented 7 years ago

@trilinos/amesos2

srajama1 commented 7 years ago

@ibaned: I am curious. Does it work with other solvers (KLU2)? Even if it does, could the problem be in the interface to SuperLU_Dist? Can you dump the matrix for us?

ibaned commented 7 years ago

Correct, the KLU2 test passes, which is why I think it might be the Amesos2 -> SuperLU_Dist interface rather than the new Stratimikos -> Amesos2 interface proposed in the PR. I'll try to dump the matrix, but I have to rebuild the code first...

ibaned commented 7 years ago

Actually, no need, the matrix just comes from this file: https://github.com/trilinos/Trilinos/blob/master/packages/ml/examples/BasicExamples/A.mm

srajama1 commented 7 years ago

@ibaned: Can we get some info on the SuperLU_Dist version, configure options, etc.?

ibaned commented 7 years ago

@srajama1 I made a few CMake fixes to the branch while getting this set up. I'm using SuperLU_Dist 5.3.1, but I'm positive I also went back and tried the older version that Trilinos was originally compatible with, so you should see the same failure either way. Here is my configure script:

#!/bin/bash -ex

MPI_BASE_DIR=$HOME/install/gcc/mpich
BOOST_DIR=$HOME/install/gcc/boost
NETCDF_DIR=$HOME/install/gcc/netcdf
HDF5_DIR=$HOME/install/gcc/hdf5
PARMETIS_DIR=$HOME/install/gcc/parmetis
SUPERLUDIST_DIR=$HOME/install/gcc/SuperLU_DIST

cmake $HOME/src/Trilinos-superludist \
-DCMAKE_INSTALL_PREFIX:PATH=$HOME/install/gcc/Trilinos-superludist \
-DCMAKE_BUILD_TYPE:STRING=NONE \
-DBUILD_SHARED_LIBS:BOOL=ON \
-DTPL_FIND_SHARED_LIBS:BOOL=ON \
-DTPL_ENABLE_MPI:BOOL=ON \
-DMPI_BASE_DIR:PATH=${MPI_BASE_DIR} \
-DCMAKE_CXX_COMPILER:FILEPATH=${MPI_BASE_DIR}/bin/mpicxx \
-DCMAKE_C_COMPILER:FILEPATH=${MPI_BASE_DIR}/bin/mpicc \
-DTrilinos_ENABLE_Fortran:BOOL=OFF \
-DCMAKE_CXX_FLAGS:STRING='-O3 -g' \
-DCMAKE_C_FLAGS:STRING='-O3 -g' \
 \
-DTrilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
 \
-DTrilinos_ENABLE_Teuchos:BOOL=ON \
-DTeuchos_ENABLE_LONG_LONG_INT:BOOL=ON \
 \
-DTrilinos_ENABLE_Tpetra:BOOL=ON \
-DTpetra_INST_INT_LONG_LONG:BOOL=ON \
-DTpetra_INST_INT_INT:BOOL=ON \
-DTpetra_INST_DOUBLE:BOOL=ON \
-DTpetra_INST_FLOAT:BOOL=OFF \
-DTpetra_INST_COMPLEX_FLOAT:BOOL=OFF \
-DTpetra_INST_COMPLEX_DOUBLE:BOOL=OFF \
-DTpetra_INST_INT_LONG:BOOL=OFF \
-DTpetra_INST_INT_UNSIGNED:BOOL=OFF \
 \
-DTPL_ENABLE_Boost:BOOL=ON \
-DTPL_ENABLE_BoostLib:BOOL=ON \
-DBoost_INCLUDE_DIRS:PATH=$BOOST_DIR/include \
-DBoost_LIBRARY_DIRS:PATH=$BOOST_DIR/lib \
-DBoostLib_INCLUDE_DIRS:PATH=$BOOST_DIR/include \
-DBoostLib_LIBRARY_DIRS:PATH=$BOOST_DIR/lib \
 \
-DTPL_ENABLE_Zlib:BOOL=ON \
 \
-DTPL_ENABLE_HDF5:BOOL=ON \
-DHDF5_INCLUDE_DIRS:PATH=$HDF5_DIR/include \
-DTPL_HDF5_LIBRARIES:STRING='/home/daibane/install/gcc/hdf5/lib/libhdf5.so;/home/daibane/install/gcc/hdf5/lib/libhdf5_hl.so' \
 \
-DTPL_ENABLE_ParMETIS:BOOL=ON \
-DParMETIS_INCLUDE_DIRS:PATH="$PARMETIS_DIR/include" \
-DParMETIS_LIBRARY_DIRS:PATH="$PARMETIS_DIR/lib" \
 \
-DTPL_ENABLE_SuperLUDist:BOOL=ON \
-DSuperLUDist_INCLUDE_DIRS:PATH="$SUPERLUDIST_DIR/include" \
-DSuperLUDist_LIBRARY_DIRS:PATH="$SUPERLUDIST_DIR/lib" \
 \
-DTrilinos_ENABLE_Kokkos:BOOL=ON \
-DTrilinos_ENABLE_KokkosCore:BOOL=ON \
-DTrilinos_ENABLE_KokkosContainers:BOOL=ON \
-DTrilinos_ENABLE_KokkosExample:BOOL=OFF \
-DKokkos_ENABLE_Serial:BOOL=ON \
-DKokkos_ENABLE_OpenMP:BOOL=OFF \
-DKokkos_ENABLE_Pthread:BOOL=OFF \
-DKokkos_ENABLE_Cuda:BOOL=OFF \
-DTPL_ENABLE_CUDA:BOOL=OFF \
 \
-DTrilinos_ENABLE_Amesos2:BOOL=ON \
-DAmesos2_ENABLE_KLU2:BOOL=ON \
\
-DTrilinos_ENABLE_EpetraExt:BOOL=ON \
-DTrilinos_ENABLE_ThyraTpetraAdapters:BOOL=ON \
 \
-DTrilinos_ENABLE_Stratimikos:BOOL=ON \
-DStratimikos_ENABLE_TESTS:BOOL=ON \
\
2>&1 | tee config_log

It's a trimmed-down version of another script, so some TPLs may not be needed. You should get three tests, with results as follows:

Test project /home/daibane/build/gcc/Trilinos-superludist
    Start 1: Stratimikos_test_single_amesos2_tpetra_solver_driver_KLU2_MPI_1
1/3 Test #1: Stratimikos_test_single_amesos2_tpetra_solver_driver_KLU2_MPI_1 ...........   Passed    0.17 sec
    Start 2: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1
2/3 Test #2: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 ...***Failed    0.29 sec
    Start 3: Stratimikos_ValidParameters_MPI_1
3/3 Test #3: Stratimikos_ValidParameters_MPI_1 .........................................   Passed    0.03 sec

krcb commented 7 years ago

After some fighting with SuperLU_Dist, I've finally got to the point where we can investigate further. I'm able to reproduce the SuperLU_Dist failure; however, the KLU2 test also fails for me:

D) Testing the LinearOpBase interface of nsA ...

 *** Entering LinearOpTester<double,double>::check(op,...) ...

 describe op:
 Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
  fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
  amesos2Solver=Amesos2::KLU2<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >

 Checking the domain and range spaces ... 
 op.domain().get() != NULL ? passed

 op.range().get() != NULL ? passed

 this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed

 Checking that the forward operator is truly linear:

   0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
           \_____/         \___/
              v3            v5
   \_____________/     \___________________/
          v4                    v5

            sum(v4) == sum(v5)

 Random vector tests = 1

  v1 = randomize(-1,+1); ...

  v2 = randomize(-1,+1); ...

  v3 = v1 + v2 ...

  v4 = 0.5*op*v3 ...

  v5 = op*v1 ...

  v5 = 0.5*op*v2 + 0.5*v5 ...

  Check: rel_err(sum(v4), sum(v5))
         = rel_err(1.23077, 1.23077) = 3.42782e-14
           <= linear_properties_error_tol() = 1e-14 : FAILED

It's not clear to me how significant this failure is, though, as 1e-14 is a fairly strict tolerance and the test result misses it by a factor of about 3.4. (This is on OS X with Clang 7.3.1 + mpich 3.1.4.)

ibaned commented 7 years ago

@krcb thanks for looking into this! I agree that the KLU2 failure looks too borderline to be significant. I think there are options you can pass to this test system to change that tolerance, and raising it to 1e-12 so you can get through KLU2 and focus on SuperLU_Dist makes sense to me.
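
As a quick worked check of how borderline that KLU2 result is, the arithmetic can be sketched as follows. The three numbers come from the log and the suggestion above; the variable names are only illustrative, and nothing here calls into the test driver itself.

```python
import sys

observed = 3.42782e-14   # rel_err reported in the KLU2 log above
strict_tol = 1e-14       # linear_properties_error_tol() in that log
relaxed_tol = 1e-12      # looser value suggested in this comment

eps = sys.float_info.epsilon           # double-precision machine epsilon
ratio_to_tol = observed / strict_tol   # how far the result misses the strict tolerance
in_eps_units = observed / eps          # the same error measured in units of eps

passes_strict = observed <= strict_tol
passes_relaxed = observed <= relaxed_tol

print(f"misses 1e-14 by a factor of {ratio_to_tol:.2f}")
print(f"error is about {in_eps_units:.0f} x machine epsilon")
print(f"passes at 1e-14: {passes_strict}, passes at 1e-12: {passes_relaxed}")
```

The observed error is a modest multiple of machine epsilon for a sum over 10000 entries, which is consistent with treating the failure as a tolerance issue instead of a solver bug.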

krcb commented 7 years ago

I've run the provided matrix through the Amesos2 SuperLU_Dist test driver. The output is as follows:

Test matrix A.mm ... 
  | with SuperLU_DIST : 
    Testing Tpetra objects
    Doing tpetra test run `run0' with s=double lo=int go=int ... 
Running test with types S=double, LO=int, GO=int, N=Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>

      Reading matrix from /Volumes/Scratch/build/trilinos/trilinosrepo/parAmesos2/packages/amesos2/test/solvers/../matrices/A.mm ... Matrix Market reader: readSparse:
-- Reading banner line
-- Reading dimensions line
-- Making Adder for collecting matrix data
-- Reading matrix data
-- Successfully read the Matrix Market data
-- Tolerant mode: rebroadcasting matrix dimensions
----- Dimensions before: 10000 x 10000
----- Dimensions after: 10000 x 10000
-- Converting matrix data into CSR format on Proc 0
----- Proc 0: Matrix has numRows=10000 rows and numEntries=29998 entries.
----- Proc 0: numEntriesPerRow[0..9999] (only showing first and last few entries) = [2 3 ... 3 2]
----- Proc 0: rowPtr (only showing first and last few entries) = [0 2 ... 29993 29996 29998]
-- Making range, domain, and row maps
-- Distributing the matrix data
-- Proc 0: Copying my data from global arrays
-- Proc 0: I own 2500 rows and 7499 entries
-- Proc 0: Processing proc 1
-- Proc 0: Proc 1 owns 2500 rows
-- Proc 0: Proc 1 owns 7500 entries
-- Proc 0: Finished with proc 1
-- Proc 0: Processing proc 2
-- Proc 0: Proc 2 owns 2500 rows
-- Proc 0: Proc 2 owns 7500 entries
-- Proc 0: Finished with proc 2
-- Proc 0: Processing proc 3
-- Proc 0: Proc 3 owns 2500 rows
-- Proc 0: Proc 3 owns 7499 entries
-- Proc 0: Finished with proc 3
-- Proc 0: About to fill in myRowPtr
-- Proc 0: Done with distribute
-- Inserting matrix entries on each processor and calling fillComplete()
-- Done creating the CrsMatrix from the Matrix Market data
done
 Tpetra::CrsMatrix (Kokkos refactor):
  Template parameters:
   Scalar: double
   LocalOrdinal: int
   GlobalOrdinal: int
   Node: Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>
  isFillComplete: true
  Global dimensions: [10000, 10000]
  Global number of entries: 29998
  Global number of diagonal entries: 10000
  Global max number of entries in a row: 3

      Creating right-hand side and solution vectors

      Creating near-copy of matrix for refactor test
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
Comparing true_solution == given_solution ... passed
passed
      - Tpetra test succeeded
  + Testing with SuperLU_DIST passed
A.mm passed
========================================================================================================================
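
As a sanity check on the reader output above, the per-process row and entry counts can be reproduced with a short sketch. It assumes the matrix is tridiagonal (suggested by the `[2 3 ... 3 2]` entries-per-row line, but not stated anywhere in the log) and that rows are split into contiguous, equal blocks; both helper functions are hypothetical and are not part of Tpetra.

```python
def entries_per_row(i, n):
    """Assumed tridiagonal pattern: 2 entries in the first and last row, 3 elsewhere."""
    return 2 if i in (0, n - 1) else 3

def contiguous_split(n_rows, n_procs):
    """Split rows into contiguous, nearly equal blocks (lower ranks get any extras)."""
    base, extra = divmod(n_rows, n_procs)
    blocks = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

n = 10000
blocks = contiguous_split(n, 4)
rows_owned = [len(b) for b in blocks]
entries_owned = [sum(entries_per_row(i, n) for i in b) for b in blocks]

print(rows_owned)          # [2500, 2500, 2500, 2500]
print(entries_owned)       # [7499, 7500, 7500, 7499]
print(sum(entries_owned))  # 29998
```

These values match the log: 2500 rows per process, 7499/7500/7500/7499 entries, and 29998 entries in total. Note that this log shows the matrix distributed over four processes, whereas the failing ctest command ran with `-np 1`; whether that difference matters is not established in this thread.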

As far as I can see, this looks ok (but I may be missing something). As an experiment, I took a few of the test matrices in Amesos2 that are used for testing the SuperLU_Dist interface and ran them through the new Stratimikos Amesos2 SuperLU_Dist interface. For example, on orsirr_2.mtx, I get:

"/Volumes/Scratch/install/trilinos/mpich-3.1.4-static/bin/mpiexec" "-np" "1" "/Volumes/Scratch/build/trilinos/trilinosrepo/parAmesos2/packages/stratimikos/adapters/amesos2/test/Stratimikos_test_single_amesos2_tpetra_solver_driver.exe" "--show-all-tests" "--solver-type=SuperLU_DIST" "--verbose" "--matrix-file=/Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx"
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name tabby.txcorp.com and rank 0!

***
*** Testing Thyra::BelosLinearOpWithSolveFactory (and Thyra::BelosLinearOpWithSolve)
***

Echoing input options:
  matrixFile             = /Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx
  numRhs                 = 1
  numRandomVectors       = 1
  maxFwdError            = 1e-14
  maxResid               = 1e-06
  showAllTests           = 1
  dumpAll                = 0

A) Reading in a tpetra matrix A from the file '/Volumes/Scratch/checkout/trilinosall/trilinosrepo/packages/amesos2/test/matrices/orsirr_2.mtx' ...

B) Creating an Amesos2LinearOpWithSolveFactory object opFactory ...

lowsFactory.getValidParameters():
 Solver Type : string = KLU2
 Refactorization Policy : string = RepivotOnRefactorization
 Throw on Preconditioner Input : bool = 1
 VerboseObject -> 
  Verbosity Level : string = default
  Output File : string = none

amesos2LOWSFPL before setting parameters:
 Solver Type : string = SuperLU_DIST   [unused]

amesos2LOWSFPL after setting parameters:
 Solver Type : string = SuperLU_DIST
 Refactorization Policy : string = RepivotOnRefactorization   [default]
 Throw on Preconditioner Input : bool = 1   [default]
 VerboseObject -> 
  Output File : string = none   [default]
  Verbosity Level : string = default   [default]

C) Creating a Amesos2LinearOpWithSolve object nsA from A ...

D) Testing the LinearOpBase interface of nsA ...

 *** Entering LinearOpTester<double,double>::check(op,...) ...

 describe op:
 Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=886,domainDim=886}
  fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=886,domainDim=886}
  amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >

 Checking the domain and range spaces ... 
 op.domain().get() != NULL ? passed

 op.range().get() != NULL ? passed

 this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed

 Checking that the forward operator is truly linear:

   0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
           \_____/         \___/
              v3            v5
   \_____________/     \___________________/
          v4                    v5

            sum(v4) == sum(v5)

 Random vector tests = 1

  v1 = randomize(-1,+1); ...

  v2 = randomize(-1,+1); ...

  v3 = v1 + v2 ...

  v4 = 0.5*op*v3 ...

  v5 = op*v1 ...

  v5 = 0.5*op*v2 + 0.5*v5 ...

  Check: rel_err(sum(v4), sum(v5))
         = rel_err(-32673.4, -32673.4) = 1.55881e-15
           <= linear_properties_error_tol() = 1e-14 : passed
  Warning! rel_err(sum(v4), sum(v5))
         = rel_err(-32673.4, -32673.4) = 1.55881e-15
           >= linear_properties_warning_tol() = 1e-16!

 (this->check_linear_properties()&&this->check_adjoint())==false: Skipping the check of the linear properties of the adjoint operator!

 this->check_adjoint()==false: Skipping check for the agreement of the adjoint and forward operators!

 this->check_for_symmetry()==false: Skipping check of symmetry ...

 Congratulations, this LinearOpBase object seems to check out!

 *** Leaving LinearOpTester<double,double>::check(...)

E) Testing the LinearOpWithSolveBase interface of nsA ...

 *** Entering LinearOpWithSolveTester<double>::check(op,...) ...

 describe forward op:
 Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=886,domainDim=886}
  fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=886,domainDim=886}
  amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >

 this->check_forward_default()==true: Checking the default forward solve ... op.solveSupports(NOTRANS) = true == true : passed

 Checking that the forward default solve matches the forward operator:

   inv(Op)*Op*v1 == v1
           \___/
            v2
   \___________/
          v3

   v4 = v3-v1
   v5 = Op*v3-v2

   norm(v4)/norm(v1) <= forward_default_solution_error_error_tol()
   norm(v5)/norm(v2) <= forward_default_residual_error_tol()

  Random vector tests = 1

   v1 = randomize(-1,+1); ...

   v2 = Op*v1 ...

    => Apply time = 1.0547e-05 sec

   v3 = inv(Op)*v2 ...

   Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...

    => Solve time = 0.00256535 sec

   solve status:
     solveStatus = SOLVE_STATUS_CONVERGED
     achievedTol = unknownTolerance()
     message:extraParameters: NONE

   v4 = v3 - v1 ...

   v5 = Op*v3 - v2 ...

    => Apply time = 4.5857e-05 sec

   Check: |norm(v4)/norm(v1)| = 7.61771e-14 <= forward_default_solution_error_error_tol() = 1e-06 : passed

   Check: |norm(v5)/norm(v2)| = 2.3389e-16 <= forward_default_residual_error_tol() = 2e-06 : passed

 this->check_forward_residual()==true: Checking the forward solve with a tolerance on the residual ... op.solveSupports(NOTRANS) = true == true : passed

 Checking that the forward solve matches the forward operator to a residual tolerance:

   v3 = inv(Op)*Op*v1
                \___/
                  v2

   v4 = Op*v3-v2

   norm(v4)/norm(v2) <= forward_residual_solve_tol() + forward_residual_slack_error_tol()

  Random vector tests = 1

    v1 = randomize(-1,+1); ...

    v2 = Op*v1 ...

     => Apply time = 2.3606e-05 sec

    v3 = inv(Op)*v2 ...

    Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...

     => Solve time = 0.00089881 sec

    solve status:
      solveStatus = SOLVE_STATUS_CONVERGED
      achievedTol = unknownTolerance()
      message:extraParameters: NONE

    check: solveStatus = SOLVE_STATUS_CONVERGED == SOLVE_STATUS_CONVERGED : passed

    v4 = Op*v3 - v2 ...

     => Apply time = 1.0879e-05 sec

    Check: |norm(v4)/norm(v2)| = 1.64104e-16 <= forward_residual_solve_tol()+forward_residual_slack_error_tol() = 2e-06 : passed

 this->check_adjoint_default()==false: Skipping the check of the adjoint solve with a default tolerance!

 this->check_adjoint_residual()==false: Skipping the check of the adjoint solve with a tolerance on the residual!

 Congratulations, this LinearOpWithSolveBase object seems to check out!

 *** Leaving LinearOpWithSolveTester<double>::check(...)

amesos2LOWSFPL after solving:
 Solver Type : string = SuperLU_DIST
 Refactorization Policy : string = RepivotOnRefactorization   [default]
 Throw on Preconditioner Input : bool = 1   [default]
 VerboseObject -> 
  Output File : string = none   [default]
  Verbosity Level : string = default   [default]

Congratulations! All of the tests checked out!

Similar results are seen for other Amesos2 test matrices. This seems to indicate that SuperLU_Dist is having a problem with this particular matrix and that Amesos2 is not catching it during its tests. In order to make progress on #1090, the way forward could be to use the current set of test matrices from Amesos2 to test the Stratimikos interface, while we investigate precisely what is happening with the matrix here. Once we have figured that out, we could restore this matrix to the Stratimikos tests and add it to the Amesos2 tests.

ibaned commented 7 years ago

"In order to make progress on #1090, the way forward could be to use the current set of test matrices from Amesos2 to test the Stratimikos interface, while we investigate precisely what is happening with the matrix here."

That approach would be fine with me

ibaned commented 7 years ago

I just pushed this change as commit 066f244 to the branch for #1090. Indeed, the SuperLU_Dist test passes now. I'm not sure whether debugging the old matrix should be considered part of this issue, or if we should just close this one.

krcb commented 7 years ago

I just ran this matrix through the Stratimikos->Amesos->SuperLU_Dist test driver, and it appears to pass there. Could you comment on whether you are performing additional tests in the Stratimikos->Amesos2 driver compared with the Stratimikos->Amesos driver?

ibaned commented 7 years ago

Just to be clear: you mean that ML's A.mm matrix passes with the Stratimikos->Amesos->SuperLU_Dist code path? I tried to just copy that test in #1090; if it is doing something more, that wasn't my intention.

krcb commented 7 years ago

@ibaned that appears to be the case, yes. We'll have to compare the Amesos/Amesos2 SuperLU_Dist implementations to understand what is going on in more detail and whether there's something about how we call SuperLU_Dist from Amesos2 that causes an issue with this matrix. For now though, I think the fact that the Amesos2 SuperLU_Dist test matrices pass when run through the Stratimikos->Amesos2->SuperLU_Dist interface is sufficient to demonstrate the Stratimikos driver.

srajama1 commented 7 years ago

Can we close this now?

ibaned commented 7 years ago

Took me a while to recall where we were, but yes I think so.