pmgbergen / porepy

Python Simulation Tool for Fractured and Deformable Porous Media

Reimplementation of Ad projection operators #1182

Open keileg opened 5 months ago

keileg commented 5 months ago

Background

In the Ad operator framework, the classes SubdomainProjection, MortarProjection, Trace and InvTrace all represent mappings between grid entities of various kinds. These are all implemented as projection matrices that, during parsing of an operator tree, are left multiplied with arrays and matrices. This is unsatisfactory for two reasons:

  1. Constructing the projection matrices is slow and adds significantly to the computational cost of working with the Ad operators. As a partial remedy, some if not all of the projection classes construct the projection matrices during initialization of the projection object, but this to a large degree just moves the computational burden. The ahead-of-time construction will also be hard to make compatible with the caching outlined in #1181.
  2. Projection operations are in reality slicing (for a global-to-local restriction) or filling in of selected rows in large arrays and matrices (for local-to-global prolongation). Representing this as matrix-matrix / matrix-vector products likely adds significantly to the computational cost.
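
To make point 2 concrete, here is a minimal, standalone sketch (made-up sizes and index arrays, not PorePy code) of the same restriction expressed as a matrix product and as direct indexing:

import numpy as np
import scipy.sparse as sps

n_global, n_local = 10, 4
local_rows = np.array([2, 3, 7, 8])  # hypothetical subset of the global dofs
v = np.random.rand(n_global)

# Current approach: build an explicit restriction matrix and multiply.
R = sps.csr_matrix(
    (np.ones(n_local), (np.arange(n_local), local_rows)), shape=(n_local, n_global)
)
restricted_by_matrix = R @ v

# Equivalent slicing: no matrix is ever formed.
restricted_by_slicing = v[local_rows]
assert np.allclose(restricted_by_matrix, restricted_by_slicing)

# Prolongation is the reverse: fill selected rows of a larger array.
prolonged = np.zeros(n_global)
prolonged[local_rows] = restricted_by_slicing
assert np.allclose(prolonged, R.T @ restricted_by_slicing)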

Suggested change

The projection methods, say SubdomainProjection.face_restriction(), currently return pp.ad.SparseArrays. Instead, they should return a new slicer object that roughly looks like this:

class RestrictionSlicer:

    _target_indices: np.ndarray
    # EK note to self: In the future, this may also be a PETSc IS object

    _weights: np.ndarray
    # Weights to be assigned to the different rows

    def __init__(self, dim: int, target_indices: np.ndarray, weights: np.ndarray) -> None:
        self._target_indices = target_indices
        self._weights = weights

    # We may need to use typing overload here to make sure mypy is happy: The return type
    # should always be the same as the input other.
    def __matmul__(self, other: pp.ad.AdArray | np.ndarray) -> pp.ad.AdArray | np.ndarray:
        # In its simplest form (if all of self._weights is one), this can be implemented as
        return other[self._target_indices]
        # However, for non-unitary weights, we need special treatment. For np.ndarrays, this
        # is something like
        return other[self._target_indices] * self._weights
        # AdArrays are a bit more complicated: In principle we can make a diagonal matrix of
        # the weights and multiply from the left, so
        scaling = sps.dia_array((self._weights, 0), shape=(self._weights.size, self._weights.size))
        return scaling @ other[self._target_indices]
        # This risks being slow, though (profiling is needed). The alternative is to work
        # directly with the sparse data structure in the AdArray, using
        # pp.matrix_operations.rldecode to expand the weight array to the right size
        # (can be different from one row to the next). A rough outline of the code is:
        tmp_mat = other[self._target_indices]
        data_ind = pp.matrix_operations.rldecode(
            np.arange(tmp_mat.jac.shape[0]), np.diff(tmp_mat.jac.indptr)
        )
        tmp_mat.jac.data *= self._weights[data_ind]
        return tmp_mat
        # A final question is whether slicing (all calls above to other[self._target_indices])
        # is sufficiently efficient, or if we should rather construct a new array by exploiting
        # the CSR format in scipy.sparse. Again, profiling will be needed.

    # NB: The other dunder methods (__add__ etc.) should possibly raise an error.

class ProjectionSlicer:

    _target_indices: np.ndarray
    _target_num_rows: int

    _weights: np.ndarray

    def __init__(self, dim: int, mdg: MixedDimensionalGrid, domain: list[pp.Grid]) -> None:
        # self._target_indices and self._target_num_rows are computed here
        ...

    def __matmul__(self, other: pp.ad.AdArray | np.ndarray) -> pp.ad.AdArray | np.ndarray:
        if isinstance(other, np.ndarray):
            result = np.zeros(self._target_num_rows, dtype=other.dtype)
            result[self._target_indices] = other
            return result
        else:  # pp.ad.AdArray
            res_val = np.zeros(self._target_num_rows, dtype=other.val.dtype)
            res_jac = sps.csr_matrix((self._target_num_rows, other.jac.shape[1]))
            res_val[self._target_indices] = other.val
            res_jac[self._target_indices] = other.jac
            return pp.ad.AdArray(res_val, res_jac)
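
As a side note to the row assignment res_jac[self._target_indices] = other.jac above (which scipy flags as inefficient for CSR matrices), prolongation of a sparse Jacobian could alternatively be done by constructing the CSR arrays directly. A standalone sketch (prolong_rows is a hypothetical helper, not PorePy API, and the target rows are assumed sorted):

import numpy as np
import scipy.sparse as sps

def prolong_rows(
    local_jac: sps.csr_matrix, target_rows: np.ndarray, num_global_rows: int
) -> sps.csr_matrix:
    # Place row i of local_jac at row target_rows[i] of a larger matrix; all other
    # rows are empty. Assumes target_rows is sorted in increasing order.
    local_jac = local_jac.tocsr()
    row_nnz_global = np.zeros(num_global_rows, dtype=local_jac.indptr.dtype)
    row_nnz_global[target_rows] = np.diff(local_jac.indptr)
    indptr = np.zeros(num_global_rows + 1, dtype=local_jac.indptr.dtype)
    indptr[1:] = np.cumsum(row_nnz_global)
    return sps.csr_matrix(
        (local_jac.data, local_jac.indices, indptr),
        shape=(num_global_rows, local_jac.shape[1]),
    )

# Quick check against the naive dense construction.
local = sps.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
target = np.array([1, 4])
dense = np.zeros((6, 3))
dense[target] = local.toarray()
assert np.allclose(prolong_rows(local, target, 6).toarray(), dense)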

Comments:

  1. It is not clear whether we need one slicer each for SubdomainProjection, MortarProjection etc.
  2. Similar thoughts may apply to the geometry basis functions in pp.models.geometry.basis; further investigation is needed.
  3. The implementation should change the Ad backend only; no changes to the computational grid.
  4. Testing will be needed.

Regarding the divergence class

The class pp.ad.Divergence represents a differential operator rather than a mapping. This cannot readily be implemented as slicing, and although improvements should be possible there as well, it will not be part of this issue.

Dependencies

This can be done independently of #1179 and #1181, but should not be done in parallel with them.

keileg commented 5 months ago

Some of the functionality in pp.models.geometry may also be suited for this approach.

IvarStefansson commented 5 months ago

Some of the functionality in pp.models.geometry may also be suited for this approach.

That's an interesting comment. Is there any chance this approach can help us with representation of tensors and their products?

keileg commented 5 months ago

Perhaps, it depends on what you mean. I suggest we discuss in person.

keileg commented 1 week ago

Implementing this will require three main steps, each split into separate tasks:

  1. Preparatory steps
  2. Implementation of the new projection operators
  3. Inclusion of the new slicers in the Ad operator system

keileg commented 1 week ago

Benchmarking (to be updated)

Observations from benchmarking using line_profiler (which, by the way, seems to have a substantial overhead with my setup; comparing relative performance rather than absolute numbers should still give some indication).
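
For reference, such a line-by-line profiling run can be set up roughly as in the sketch below (the driver function and the choice of profiled functions are placeholders):

from line_profiler import LineProfiler

def run_benchmark():
    # Placeholder: set up and run flow_benchmark_2d_case3 here.
    ...

profiler = LineProfiler()
# Register the functions whose lines should be timed, e.g. the projection constructors:
# profiler.add_function(SomeProjectionClass.__init__)
profiler(run_benchmark)()
profiler.print_stats()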

Running flow_benchmark_2d_case3, with a grid of 24K cells in the matrix and ~800 mortar cells:

  1. Substantial time is spent in the mortar projections. Less so in the subdomain projections (which is reasonable, since the physics is homogeneous between the subdomains).
  2. The time spent on defining the equations (which includes the projections) is roughly equal to that of the mpfa discretization. There are indications that the mortar projections account for the main part of this construction time, though this is not certain, given the noise in the profiling.
  3. The time for assembly is (for this case) considerably less than that for defining equations and discretizing.
  4. The time for the linear solve is also modest.

Conclusion so far: It seems the proposed projection update can make an impact.

IvarStefansson commented 1 week ago

This may be a stupid question: How will the slicing work for non-matching mortar grids?

keileg commented 1 week ago

This may be a stupid question: How will the slicing work for non-matching mortar grids?

It will be more complicated, involving not only slicing but also weighted sums over rows. I haven't worked out the details, but if it gets too complicated, we may end up keeping the old solution for non-matching grids, at least until we start using these in earnest.
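
Purely as an illustration of what such weighted sums over rows could look like for a plain vector (not worked out for the actual mortar data structures; the index and weight arrays below are made up):

import numpy as np

# Hypothetical non-matching mapping: target cell i receives sum_j w_ij * v[j],
# represented here as flat (target_row, source_col, weight) triplets.
target_rows = np.array([0, 0, 1, 1, 1])
source_cols = np.array([0, 1, 1, 2, 3])
weights = np.array([0.5, 0.5, 0.25, 0.5, 0.25])

v = np.array([1.0, 2.0, 3.0, 4.0])
result = np.zeros(2)
np.add.at(result, target_rows, weights * v[source_cols])
# result = [1.5, 3.0]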

keileg commented 1 week ago

Benchmarking of the actual slicing and reconstruction

There turned out to be two natural ways to implement these operations:

  1. Operating directly on the sparse data structures, as outlined in the initial posting of this issue. Referred to as 'Option 1' below.
  2. Explicitly constructing a restriction/prolongation matrix. Referred to as 'Option 2' below.
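
For concreteness, a standalone sketch of the two options applied to restriction of a sparse matrix (helper names are made up for this comparison; weights are taken as unitary):

import numpy as np
import scipy.sparse as sps

def restrict_option_1(mat: sps.csr_matrix, rows: np.ndarray) -> sps.csr_matrix:
    # Option 1: operate directly on the sparse data structure via row slicing.
    return mat[rows]

def restrict_option_2(mat: sps.csr_matrix, rows: np.ndarray) -> sps.csr_matrix:
    # Option 2: build an explicit restriction matrix and multiply.
    R = sps.csr_matrix(
        (np.ones(rows.size), (np.arange(rows.size), rows)),
        shape=(rows.size, mat.shape[0]),
    )
    return R @ mat

mat = sps.random(10, 10, density=0.3, format="csr")
rows = np.array([2, 5, 7])
assert np.allclose(
    restrict_option_1(mat, rows).toarray(), restrict_option_2(mat, rows).toarray()
)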


The benchmarks:

Two md geometries were considered:

  1. The regular geometry (case 1) from the 2d flow benchmark, with 400, 24K and 2400K cells in total.
  2. Case 3 from the 3d flow benchmarks, with 30K, 150K and 350K cells in total.

We test both restriction to and prolongation from each subdomain. This is representative of both subdomain and mortar projections, under the assumption that identifying the indices to project to and from is treated as a separate task from the mapping itself. By similar reasoning, mapping to sets of domains is also covered.

We comment below on the time for initialization and for the mapping of vectors, sparse matrices and AdArrays. In Option 2, initialization entails construction of a projection matrix, while Option 1 only stores the passed information during initialization. Thus, the relative importance of the initialization cost depends on how many times a restriction/prolongation object is used; currently this is often only once, but future caching (#1181) may change this.
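
A sketch of the kind of timing loop that separates initialization cost from application cost (sizes, repeat counts and the unit-weight restriction are illustrative only):

import time
import numpy as np
import scipy.sparse as sps

def best_time(fn, repeats: int = 10) -> float:
    # Smallest wall-clock time over a few repetitions.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

n_global, n_local = 24_000, 800  # of the order of the coarser 2d benchmark
rows = np.sort(np.random.choice(n_global, size=n_local, replace=False))
vec = np.random.rand(n_global)

# Option 2: initialization builds the restriction matrix, application is a matvec.
def build_matrix():
    return sps.csr_matrix(
        (np.ones(n_local), (np.arange(n_local), rows)), shape=(n_local, n_global)
    )

R = build_matrix()
print("Option 2 init:", best_time(build_matrix), "apply:", best_time(lambda: R @ vec))

# Option 1: initialization just stores the index array, application is a slice.
print("Option 1 init:", best_time(lambda: rows.copy()), "apply:", best_time(lambda: vec[rows]))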

Results

Summarized as follows:

  1. Generally, prolongation is more costly than restriction, reflecting the need to construct and operate on larger vectors and matrices. The difference increases with cell count.
  2. Initialization is fastest with Option 1, with a cost of roughly 2-20% of that for Option 2.
  3. Restricting and prolonging a vector with Option 1 takes 20-100% of the corresponding time for Option 2. For Option 1, the times for vector operations and initialization are of the same order of magnitude (though the operations scale more poorly than the initialization), while for Option 2, initialization can be 1-2 orders of magnitude more costly.
  4. For matrix operations, the scaling is roughly linear for Option 1, while for Option 2 it is in general slightly worse than linear. In total, the cost for Option 1 is about 20-100% of that for Option 2, with the difference increasing with matrix size. Testing on matrices with different numbers of nonzeros per row indicated that Option 1 can be considerably better than Option 2 in some respects (improved prolongation; no difference in behavior for restriction). The cost of initialization is negligible compared to the matrix operations.
  5. For Ad arrays, Option 1 takes 50-100% of the time for Option 2, with better scaling with cell count.

Conclusions

  1. Option 1 is significantly faster than Option 2, in particular for larger grids where the difference matters the most. Option 1 is therefore the preferred method.
  2. The need to stay compatible with non-conforming grids means we will also need Option 2. The projection objects will need to decide which restrictor / prolongator to use based on the provided weights.

Next steps

  1. Implement the new classes for restriction and prolongation as helpers in grid_operators.py.
  2. Refactor the SubdomainProjection to use the new classes. This will entail deciding which indices to use based on input to the methods cell_prolongation() etc. Note that for the SubdomainProjection, the weights will always be unitary. A rough sketch is given after this list.
  3. Introduce tests of the new classes.
  4. Do benchmarking to compare performance with the existing implementation.
  5. Start thinking about the MortarProjection.
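
A rough sketch of how step 2 could look. Class and attribute names beyond cell_restriction/cell_prolongation are made up, and the slicer is a simplified stand-in (unit weights only) for the classes outlined above:

import numpy as np

class _RestrictionSlicer:
    # Minimal stand-in for the RestrictionSlicer outlined earlier (unit weights only).
    def __init__(self, target_indices: np.ndarray) -> None:
        self._target_indices = target_indices

    def __matmul__(self, other: np.ndarray) -> np.ndarray:
        return other[self._target_indices]

class SubdomainProjectionSketch:
    # Hypothetical refactoring: cell_restriction() collects the index ranges of the
    # requested subdomains and hands them to a slicer instead of building a matrix.
    def __init__(self, cell_offsets: dict) -> None:
        # cell_offsets maps a subdomain label to its (start, end) range in the global
        # cell-wise ordering, computed once from the mixed-dimensional grid.
        self._cell_offsets = cell_offsets

    def cell_restriction(self, subdomains: list) -> _RestrictionSlicer:
        indices = np.concatenate(
            [np.arange(*self._cell_offsets[sd]) for sd in subdomains]
        )
        return _RestrictionSlicer(indices)

# Usage with made-up subdomain labels and offsets:
proj = SubdomainProjectionSketch({"matrix": (0, 6), "fracture": (6, 8)})
global_vec = np.arange(8.0)
print(proj.cell_restriction(["fracture"]) @ global_vec)  # [6. 7.]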