hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

GMRES fails to converge #351

Closed DragonDlut closed 3 years ago

DragonDlut commented 3 years ago

Hello, I am developing a nonlinear CSD code with FEM. I tried to use GMRES in Hypre to solve the linear system. Before adding it to my CSD code, I added the subroutine to an FEM-based CFD code, where it works well for solving the pressure Poisson equation. When I use the same subroutine in my CSD code, GMRES fails to converge. Below are my subroutine and the convergence history:

subroutine fem_matrix_solve(mat_a,mat_x,mat_b,nc,op)
   type(fem_matrix)::mat_a
   type(fem_vector)::mat_x,mat_b
   integer nc
   integer,optional::op
!---- FROM EXAMPLE-5F
   integer HYPRE_PARCSR
   parameter (HYPRE_PARCSR=5555)
   integer ilower, iupper
   integer*8 parcsr_A
   integer*8 A
   integer*8 b
   integer*8 x
   integer*8 par_b
   integer*8 par_x
   integer*8 solver
   integer*8 precond
   double precision rhs
   integer num_iterations
   double precision final_res_norm
!----------------------
   INTEGER I,J,K,col,row
   real*8 t1,t2

!C LOCAL SIZE
   ilower=nc*(nod_s-1)+1-1
   iupper=nc*nod_e-1
   if(present(op)) then
     write(*,*) myid,nod_s,nod_e,ilower, iupper,nc
     stop
   end if

!c Create the matrix.
   call HYPRE_IJMatrixCreate(mpi_comm, ilower,iupper, ilower, iupper, A, ierr)
   call HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR, ierr)
   call HYPRE_IJMatrixInitialize(A, ierr)

!c Create the rhs and solution
   call HYPRE_IJVectorCreate(mpi_comm,ilower, iupper, b, ierr)
   call HYPRE_IJVectorSetObjectType(b, HYPRE_PARCSR, ierr)
   call HYPRE_IJVectorInitialize(b, ierr)

   call HYPRE_IJVectorCreate(mpi_comm,ilower, iupper, x, ierr)
   call HYPRE_IJVectorSetObjectType(x, HYPRE_PARCSR, ierr)
   call HYPRE_IJVectorInitialize(x, ierr)

!C Assemble
   do i=nc*(nod_s-1)+1,nc*nod_e
     col=i-1
     do j=mat_a%mat_entrance(i),mat_a%mat_entrance(i+1)-1
       row=mat_a%mat_location(j)-1
       call HYPRE_IJMatrixSetValues(A, 1, 1, col, row, mat_a%mat_value(j), ierr)
     end do
     call HYPRE_IJVectorSetValues(b, 1, col, mat_b%vec_value(i), ierr)
     call HYPRE_IJVectorSetValues(x, 1, col, 0.0d0, ierr)
   end do

  CALL MPI_BARRIER(mpi_comm,IERR)

  call HYPRE_IJMatrixAssemble(A, ierr)
  call HYPRE_IJMatrixGetObject(A, parcsr_A, ierr)
  call HYPRE_IJVectorAssemble(b, ierr)
  call HYPRE_IJVectorAssemble(x, ierr)
  call HYPRE_IJVectorGetObject(b, par_b, ierr)
  call HYPRE_IJVectorGetObject(x, par_x, ierr)

  if(myid==0) write(*,*)"     ------Hypre Matrix Setting Finished "
     t1=mpi_wtime()
    call HYPRE_ParCSRGMRESCreate(MPI_COMM, solver, ierr)
    call HYPRE_ParCSRGMRESSetKDim(solver,10, ierr)
    call HYPRE_ParCSRGMRESSetMaxIter(solver, 100, ierr)
    call HYPRE_ParCSRGMRESSetTol(solver,1.0d-7, ierr)
    call HYPRE_ParCSRGMRESSetPrintLevel(solver,2, ierr)  

    call HYPRE_ParCSRPilutCreate(MPI_COMM,precond, ierr) 
    call HYPRE_ParCSRPilutSetDropToleran(precond,1.0d-7, ierr)

    call HYPRE_ParCSRGMRESSetPrecond(solver, 3,precond, ierr)
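!    precond_id = 3 above selects Pilut in the Fortran SetPrecond interface;
!    2 would select BoomerAMG instead (see the discussion below)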

    call HYPRE_ParCSRGMRESSetup(solver, parcsr_A, par_b,par_x, ierr)
    call HYPRE_ParCSRGMRESSolve(solver, parcsr_A, par_b,par_x, ierr)
    call HYPRE_ParCSRGMRESGetNumIteratio(solver,num_iterations, ierr)
    call HYPRE_ParCSRGMRESGetFinalRelati(solver,final_res_norm, ierr)

     if(myid==0) write(*,*) " Final Relative Residual Norm = ", final_res_norm,"Iter=",num_iterations
     if(num_iterations==100) then
       if(myid==0) write(*,*) "Something Wrong with Hypre---Not Converge"
       call MPI_stop()
     end if
     t2=mpi_wtime()
     if(myid==0) write(*,*) "Time for Solve=",t2-t1

!c Destroy solver
     call HYPRE_ParCSRPilutDestroy(precond, ierr)
     call HYPRE_ParCSRGMRESDestroy(solver, ierr)

!c copy answer
     do i=nc*(nod_s-1)+1,nc*nod_e
       col=i-1
       call HYPRE_IJVectorGetValues(x, 1, col, mat_x%vec_value(i), ierr)
     end do

!c Clean up
   call HYPRE_IJMatrixDestroy(A, ierr)
   call HYPRE_IJVectorDestroy(b, ierr)
   call HYPRE_IJVectorDestroy(x, ierr)
   CALL MPI_BARRIER(mpi_comm,IERR)

end subroutine

[Screenshot: GMRES convergence history, 2021-05-01]

Actually, I have tried SuperLU to solve the system directly and the results look fine. I have no idea what causes this issue and I really need your kind help.

Thank you.

Cong

victorapm commented 3 years ago

Hi @DragonDlut,

Pilut might not be the right preconditioner for your problem. Does your discretization produce a symmetric matrix? If so, you could try using BoomerAMG as the preconditioner (option 2).
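For reference, a minimal sketch of plugging BoomerAMG into the GMRES setup of the subroutine above, following the pattern of hypre's ex5f example (the BoomerAMG parameter choices here are illustrative defaults, not tuned for your problem):

     call HYPRE_BoomerAMGCreate(precond, ierr)
!    print less solver info, since it is only used as a preconditioner
     call HYPRE_BoomerAMGSetPrintLevel(precond, 1, ierr)
!    Falgout coarsening
     call HYPRE_BoomerAMGSetCoarsenType(precond, 6, ierr)
!    symmetric Gauss-Seidel/Jacobi hybrid relaxation, one sweep per level
     call HYPRE_BoomerAMGSetRelaxType(precond, 6, ierr)
     call HYPRE_BoomerAMGSetNumSweeps(precond, 1, ierr)
!    run exactly one V-cycle per outer GMRES iteration
     call HYPRE_BoomerAMGSetTol(precond, 0.0d0, ierr)
     call HYPRE_BoomerAMGSetMaxIter(precond, 1, ierr)
!    precond_id = 2 selects BoomerAMG
     call HYPRE_ParCSRGMRESSetPrecond(solver, 2, precond, ierr)

(and call HYPRE_BoomerAMGDestroy(precond, ierr) instead of the Pilut destroy at the end).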

Best, Victor

DragonDlut commented 3 years ago

Hi @victorapm ,

Thank you for your kind reply. Actually, I have tried BoomerAMG by itself and BoomerAMG-GMRES as the solver; the failure to converge is what led me to try Pilut-GMRES.

My discretization produces a symmetric matrix, but it is not diagonally dominant. Could that characteristic cause the failure?

Cong

victorapm commented 3 years ago

Hi Cong,

I assume your matrix is also positive definite, right? BoomerAMG should work in that case. Do you apply any scaling to the matrix? (That could be a problem for BoomerAMG.) If you can print a small matrix to a file, I can take a look.
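For example, you could dump the assembled IJ matrix with hypre's print routine, assuming the Fortran interface in your build exposes HYPRE_IJMatrixPrint; it writes one file per MPI rank:

!    write the assembled matrix to disk, e.g. "dump.A.00000" on rank 0
     call HYPRE_IJMatrixPrint(A, "dump.A", ierr)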

Thanks, Victor

DragonDlut commented 3 years ago

Hi @victorapm ,

Thank you for your help; the matrix is attached. It is in the format (i, j, value).

Cong

MatrixID 0.txt

victorapm commented 3 years ago

Hi @DragonDlut,

Since you are solving a system of PDEs, I suggest looking at the "AMG for systems" section of our user manual. When variables are interleaved, the easiest way to improve multigrid convergence is to call HYPRE_BoomerAMGSetNumFunctions; in your case, num_functions=3. Moreover, since your problem is SPD, the PCG solver will perform better than GMRES.
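A minimal sketch of that combination (AMG-preconditioned PCG with num_functions = 3), reusing the variables from your subroutine and following hypre's ex5f example; parameter values other than the ones you already use are illustrative:

     call HYPRE_ParCSRPCGCreate(mpi_comm, solver, ierr)
     call HYPRE_ParCSRPCGSetMaxIter(solver, 100, ierr)
     call HYPRE_ParCSRPCGSetTol(solver, 1.0d-7, ierr)
     call HYPRE_ParCSRPCGSetTwoNorm(solver, 1, ierr)
     call HYPRE_ParCSRPCGSetPrintLevel(solver, 2, ierr)

     call HYPRE_BoomerAMGCreate(precond, ierr)
!    unknowns are interleaved in blocks of 3 (one displacement vector per node)
     call HYPRE_BoomerAMGSetNumFunctions(precond, 3, ierr)
!    run one V-cycle per outer PCG iteration
     call HYPRE_BoomerAMGSetTol(precond, 0.0d0, ierr)
     call HYPRE_BoomerAMGSetMaxIter(precond, 1, ierr)
!    precond_id = 2 selects BoomerAMG
     call HYPRE_ParCSRPCGSetPrecond(solver, 2, precond, ierr)

     call HYPRE_ParCSRPCGSetup(solver, parcsr_A, par_b, par_x, ierr)
     call HYPRE_ParCSRPCGSolve(solver, parcsr_A, par_b, par_x, ierr)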

Lastly, I see you impose boundary conditions by setting a very large value for the diagonal coefficient of BC rows. I recommend the following strategy instead: Let x_i and x_b refer to the interior and boundary unknowns from the solution vector x. If we split the matrix A as [A_ii A_ib; A_bi A_bb], then solve [A_ii 0; 0 I] [x_i ; x_b] = [b_i - A_ib u_0; u_0].
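A sketch of how that elimination could look inside the assembly loop of your subroutine; is_bc(i) (marking Dirichlet unknowns) and u0(i) (their prescribed values) are assumed helper arrays that do not exist in your code:

!c Assemble with Dirichlet BCs eliminated: submit [A_ii 0; 0 I] and move
!c the known boundary contribution A_ib*u_0 to the right-hand side
   do i=nc*(nod_s-1)+1,nc*nod_e
     row=i-1
     if(is_bc(i)) then
!      boundary row: identity on the diagonal, prescribed value in the RHS
       call HYPRE_IJMatrixSetValues(A, 1, 1, row, row, 1.0d0, ierr)
       call HYPRE_IJVectorSetValues(b, 1, row, u0(i), ierr)
     else
       rhs=mat_b%vec_value(i)
       do j=mat_a%mat_entrance(i),mat_a%mat_entrance(i+1)-1
         k=mat_a%mat_location(j)
         col=k-1
         if(is_bc(k)) then
!          known boundary unknown: b_i <- b_i - A_ib*u_0
           rhs=rhs-mat_a%mat_value(j)*u0(k)
         else
           call HYPRE_IJMatrixSetValues(A, 1, 1, row, col, mat_a%mat_value(j), ierr)
         end if
       end do
       call HYPRE_IJVectorSetValues(b, 1, row, rhs, ierr)
     end if
     call HYPRE_IJVectorSetValues(x, 1, row, 0.0d0, ierr)
   end do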

In your Matrix_ID_0.txt problem, I assumed that rows 1 to 75 are BCs; then using the above suggestions and default input parameters for BoomerAMG, I got the following result:

 Number of functions = 3 
Solver: AMG-PCG
HYPRE_ParCSRPCGGetPrecond got good precond

 Num MPI tasks = 1

 Num OpenMP threads = 1

BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 5

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 1.000000

 Coarsening Type = HMIS 
 measures are determined locally

 No global partition option chosen.

 Interpolation = extended+i interpolation

Operator Matrix Information:

             nonzero            entries/row          row sums
lev    rows  entries sparse   min  max     avg      min         max
======================================================================
  0    1500    86018  0.038    20   81    57.3  -8.129e-12   3.333e+03
  1     606    63792  0.174    35  152   105.3  -2.485e+01   2.286e+03
  2     170    15287  0.529    47  132    89.9  -5.183e+00   3.402e+03
  3      26      626  0.926    19   26    24.1  -4.323e+01   4.119e+03
  4       5       25  1.000     5    5     5.0   2.081e+02   4.039e+03

Interpolation Matrix Information:
                    entries/row        min        max            row sums
lev  rows x cols  min  max  avgW     weight      weight       min         max
================================================================================
  0  1500 x 606     1    4   4.0   9.398e-03   5.838e-01   4.462e-01   1.000e+00
  1   606 x 170     1    4   4.0   3.092e-02   5.662e-01   4.270e-01   1.002e+00
  2   170 x 26      1    4   3.5   2.413e-02   6.123e-01   3.061e-01   1.000e+00
  3    26 x 5       1    2   1.5   5.914e-02   1.000e+00   1.195e-01   1.000e+00

     Complexity:    grid = 1.538000
                operator = 1.926899
                memory = 2.004081

BoomerAMG SOLVER PARAMETERS:

  Maximum number of cycles:         1 
  Stopping Tolerance:               0.000000e+00 
  Cycle type (1 = V, 2 = W, etc.):  1

  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of sweeps:            1    1     1 
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:     13   14     9 
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   0
                   Post-CG relaxation (up):   0
                             Coarsest grid:   0

=============================================
Setup phase times:
=============================================
PCG Setup:
  wall clock time = 0.000000 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.006323 seconds
  cpu MFLOPS      = 0.000000

<b,b>: 1.500000e+03

Iters       ||r||_2     conv.rate  ||r||_2/||b||_2
-----    ------------   ---------  ------------ 
    1    5.539568e+02    14.303104    1.430310e+01
    2    5.877453e+02    1.060995    1.517552e+01
    3    5.044162e+02    0.858223    1.302397e+01
    4    7.508240e+02    1.488501    1.938619e+01
    5    6.131390e+02    0.816621    1.583118e+01
    6    2.071663e+02    0.337878    5.349011e+00
    7    8.476657e+01    0.409172    2.188664e+00
    8    1.651901e+01    0.194876    4.265190e-01
    9    9.762470e+00    0.590984    2.520659e-01
   10    9.801471e+00    1.003995    2.530729e-01
   11    4.797484e+00    0.489466    1.238705e-01
   12    1.026666e+01    2.140010    2.650841e-01
   13    7.900023e+00    0.769483    2.039777e-01
   14    3.201598e+00    0.405264    8.266491e-02
   15    1.300573e+00    0.406226    3.358066e-02
   16    6.667450e-01    0.512655    1.721528e-02
   17    5.216729e-01    0.782418    1.346954e-02
   18    1.267920e-01    0.243049    3.273755e-03
   19    2.254736e-02    0.177830    5.821703e-04
   20    6.290290e-03    0.278981    1.624146e-04
   21    5.511805e-03    0.876240    1.423142e-04
   22    1.529703e-02    2.775321    3.949677e-04
   23    6.728106e-03    0.439831    1.737189e-04
   24    1.565929e-03    0.232744    4.043212e-05
   25    2.931947e-04    0.187234    7.570255e-06
   26    8.707850e-05    0.296999    2.248357e-06
   27    3.427471e-05    0.393607    8.849691e-07
   28    1.000677e-05    0.291958    2.583738e-07
   29    1.684858e-06    0.168372    4.350284e-08
   30    2.409176e-07    0.142990    6.220464e-09

=============================================
Solve phase times:
=============================================
PCG Solve:
  wall clock time = 0.020000 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.011511 seconds
  cpu MFLOPS      = 0.000000

DragonDlut commented 3 years ago

Hi @victorapm ,

Thank you for your kind help and wish you all the best!

Cong