Efficient creation of PETSc matrices

wcdawn commented 3 years ago

I am building a pair of large PETSc matrices to pass to SLEPc for a generalized eigenvalue problem. To translate from the MFEM matrix to a PETSc matrix, my code currently is

a.Assemble(skip_zeros);
a.Finalize(skip_zeros);
OperatorHandle Ah{Operator::PETSC_MATAIJ};
a.ParallelAssemble(Ah);
PetscParMatrix* A{nullptr};
Ah.Get(A);
Ah.SetOperatorOwner(false);

I got this structure from ex11p. Is there a more efficient method for this?

My computation is extremely memory limited. I've already had to add an option in BilinearForm::Assemble() to treat skip_zeros here differently and skip all zeros regardless of the matrix sparsity pattern and this reduced my memory usage from 400GB to 200GB. I'll likely be submitting a PR to allow skip_zeros==2 to use this behavior (or use an enum).

Right now, parallel assembly is running out of memory. OperatorHandle::MakeSquareBlockDiag() here is using an additional 70GB for a 200GB matrix. The code finally runs out of memory and crashes during OperatorHandle::MakePtAP() here.

Is there some way to assemble directly into the PETSc format? If there isn't, how could I go about implementing such a method?

Thanks!

wcdawn commented 3 years ago

I've spent some time quantifying how this time & memory is being spent. It looks like an incredible amount of time & memory is being used in ParallelAssemble(), particularly in MakePtAP() during the prolongation operation. My problem has no hybridization, no static condensation, and no essential boundary conditions. Therefore, this step is simply rearranging rows/columns but doing so with a matmatmat operation. (If only one MPI rank is used, the prolongation matrix is the identity for these problems.)

I'm looking for a more efficient method to map the local matrices to a global PETSc matrix. I'm willing to write it myself but could use some help finding a starting direction. Basically, I'd like to perform the prolongation operation without a linear algebra operation.

Also, note that MakePtAP() is particularly inefficient for the PETSc matrix formats. The same operation using HypreParMatrix requires about half of the memory....

jandrej commented 3 years ago

Creating a PETSc matrix from a HypreParMatrix shouldn't allocate much memory because it is mainly just a wrapper around the low level hypre_ParCSRMatrix object - meaning there is an "implementation" for a Mat in PETSc.

stefanozampini commented 3 years ago

MFEM always uses the triple matrix product formulation to assemble the global operators. The reason you see twice the memory used with PETSc is because the operators need to be converted on the fly to the PETSc format to run the PtAP operation. It is probably better in your case to assemble in Hypre format, and then convert the resulting operator only. You can convert in MATHYPRE format and reuse the underlying ParCSR, or use PETSc MATAIJ format. In the latter case
HYPRE and PETSc use very similar storage for parallel matrices, but not identical. We cannot reuse the same memory for the IJ structure and the matrix values.

wcdawn commented 3 years ago

Thanks for the information! This turns out to have a pretty major impact.

ParBilinearForm Apbf(fespace);
/* populate Apbf */
HypreParMatrix* Ahypre = Apbf.ParallelAssemble();
PetscParMatrix Apetsc(Ahypre, Operator::PETSC_MATHYPRE);

The above code has much less memory overhead. Performing MakePtAP still requires an additional ~50% memory (i.e. for a 200GB matrix, this process requires an additional 100GB).

mfem / mfem

Efficient creation of PETSc matrices #2070