LA: allow to solve block linear systems efficiently

Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box. Following the common linear algebra definition, a block matrix is a matrix divided in a small, problem-size independent (two, three or so) number of very large blocks. These blocks arise naturally from the underlying physics or discretization of the problem, for example, they may be associated with different variables of a physical problem. Under a certain numbering of unknowns the matrix can be written as

                         [ A00  A01  A02 ]
                         [ A10  A11  A12 ]
                         [ A20  A21  A22 ]

where each A_ij is an entire block (see the paragraph "Solving Block Matrices" in the PETSc manual).

When assembling the matrix, a monolithic matrix should be provided. For example, assuming to group the elements of the matrix in five-by-five groups (here, five is the so-called block size of the matrix and usually rises when coupling variables with different meaning, for example pressure, the three components of the velocity and temperature) the assembler will provide the following system:

    [           |           |          |           ]   [           ]     [           ]
    [  (5 x 5)  |  (5 x 5)  |    ...   |  (5 x 5)  ]   [  (5 x 5)  ]     [  (5 x 5)  ]
    [           |           |          |           ]   [           ]     [           ]
    [----------------------------------------------]   [-----------]     [-----------]
    [           |           |          |           ]   [           ]     [           ]
    [    ...    |    ...    |    ...   |  (5 x 5)  ]   [    ...    ]  =  [    ...    ]
    [           |           |          |           ]   [           ]     [           ]
    [----------------------------------------------]   [-----------]     [-----------]
    [           |           |          |           ]   [           ]     [           ]
    [  (5 x 5)  |  (5 x 5)  |    ...   |  (5 x 5)  ]   [  (5 x 5)  ]     [  (5 x 5)  ]
    [           |           |          |           ]   [           ]     [           ]

Internally, the monolithic matrix will be split into blocks. For example, considering two splits, the first one that group together the first four variables and the second one that holds the last variable (i.e., split sizes equal to [4, 1]), the internal split system will be:

      [                     |                     ]  [           ]     [           ]
      [  (4 * N) x (4 * M)  |  (4 * N) x (1 * M)  ]  [  (4 x M)  ]     [  (4 x N)  ]
      [                     |                     ]  [           ]     [           ]
      [-------------------------------------------]  [-----------]  =  [-----------]
      [                     |                     ]  [           ]     [           ]
      [  (1 * N) x (4 * M)  |  (1 * N) x (1 * M)  ]  [  (1 x M)  ]     [  (1 x N)  ]
      [                     |                     ]  [           ]     [           ]

where M and N are the number of rows and columns respectively.

The PETSc PCFIELDSPLIT preconditioner is used to solve the split system. There are different split strategies available:

1) DIAGONAL: this strategy assumes that the only non-zero blocks are the diagonal ones.

Considering a two-by-two block block matrix

                           [ A00    0  ]
                           [  0    A11 ],

the preconditioned problem will look like

          [ KSPSolve(A00)        0         ]  [ A00    0  ]
          [      0           KSPSolve(A00) ]  [  0    A11 ],

in other words the preconditioner is:

                                 ( [ A00    0  ] )
             approximate inverse (               )
                                 ( [  0    A11 ] ).

The system is solved efficiently by solving each block independently from the others.

Blocks are solved using a flexible GMRES iterative method. If the system is partitioned each block is preconditioned using the (restricted) additive Schwarz method (ASM). On each block of the ASM preconditioner an incomplete LU factorization (ILU) is used. There is one ASM block per process. If the system is not partitioned it is preconditioned using the incomplete LU factorization (ILU).

2) LOWER: this strategy assumes that the only non-zero blocks are the ones on an below the diagonal.

Considering a two-by-two block block matrix

                           [ A00    0  ]
                           [ A01   A11 ],

the preconditioner is

                                 ( [ A00    0  ] )
             approximate inverse (               )
                                 ( [ A01   A11 ] ).

The system is solved efficiently by first solving with A00, then applying A01 to that solution, removing it from the right hand side of the second block and then solving with A11, in other words

              [ I      0   ]  [    I    0 ]  [  A00^-1  0 ]
              [ 0   A11^-1 ]  [ - A10   I ]  [   0      I ].

This strategy can be seen as a "block" Gauss-Seidel with the blocks being the splits.

3) FULL: this strategy doesn't make assumptions on the structure of the blocks.

Considering a two-by-two block block matrix

                           [ A00   A01 ]
                           [ A11   A11 ],

the preconditioned problem will look like

          [ KSPSolve(A00)        0         ]  [ A00   A01 ]
          [      0           KSPSolve(A00) ]  [ A11   A11 ],

in other words the preconditioner is:

                                 ( [ A00    0  ] )
             approximate inverse (               )
                                 ( [  0    A11 ] )

The preconditioner is evaluated considering only the diagonal blocks and then the full system is solved.

The system is solved using a flexible GMRES iterative method. If the system is partitioned each diagonal block is preconditioned using the (restricted) additive Schwarz method (ASM). On each block of the ASM preconditioner an incomplete LU factorization (ILU) is used. There is one ASM block per process. If the system is not partitioned it is preconditioned using the incomplete LU factorization (ILU).

optimad / bitpit

LA: allow to solve block linear systems efficiently #466