jcmgray / quimb

A python library for quantum information and many-body calculations including tensor networks.
http://quimb.readthedocs.io

Using MPI to parallelize SLEPc eigensolver #93

Open SamudraS opened 2 years ago

SamudraS commented 2 years ago

I am trying to use "quimb" to calculate the ground state of an extremely large scipy sparse LinearOperator, which is generated using the package "quspin". (We are using a LinearOperator instead of a matrix because it takes less time to build and has OpenMP support.)

We are using "quimb" to convert the linear operator to a PETSc matrix, which is then diagonalized using SLEPc.

However, I am having a little difficulty understanding how to run it in parallel. Let me share the relevant part of the Python code:

import numpy as np
from mpi4py import MPI
from petsc4py import PETSc
from slepc4py import SLEPc
from quimb.linalg.slepc_linalg import convert_mat_to_petsc

HH = H_Lin_operator()   ## HH is the scipy sparse LinearOperator
                        ## calculated using quspin in another part of the code

petsc_matH = convert_mat_to_petsc(HH, comm=PETSc.COMM_WORLD)   ### conversion to a PETSc matrix using quimb

print("Petsc matrix formed")

Print = PETSc.Sys.Print

########### Calling the Slepc eigen solver #######

E = SLEPc.EPS()
E.create(PETSc.COMM_WORLD)

E.setOperators(petsc_matH)
E.setProblemType(SLEPc.EPS.ProblemType.HEP)
E.setWhichEigenpairs(SLEPc.EPS.Which.SMALLEST_REAL)
E.setFromOptions()   # called last so command-line options can override the settings above

E.solve()

Print()
Print("******************************")
Print("*** SLEPc Solution Results ***")
Print("******************************")
Print()

its = E.getIterationNumber()
Print("Number of iterations of the method: %d" % its)

eps_type = E.getType()
Print("Solution method: %s" % eps_type)

nev, ncv, mpd = E.getDimensions()
Print("Number of requested eigenvalues: %d" % nev)

tol, maxit = E.getTolerances()
Print("Stopping condition: tol=%.4g, maxit=%d" % (tol, maxit))

nconv = E.getConverged()
Print("Number of converged eigenpairs %d" % nconv)

if nconv > 0:
    # Create the results vectors
    vr, wr = petsc_matH.getVecs()
    vi, wi = petsc_matH.getVecs()
    # Store eigenvalues and eigenvectors as an array
    evals=[]
    evecs=[]

    Print()
    Print("        k          ||Ax-kx||/||kx|| ")
    Print("----------------- ------------------")
    for i in range(nconv):
        k = E.getEigenpair(i, vr, vi)
        error = E.computeError(i)

        evals.append(k)
        evecs.append([complex(vr0, vi0) for vr0, vi0 in zip(vr.getArray(), vi.getArray())])

        if k.imag != 0.0:
            Print(" %9f%+9f j %12g" % (k.real, k.imag, error))
        else:
            Print(" %12f      %12g" % (k.real, error))

    Print()
    evals = np.asarray(evals)
    evecs = np.asarray(evecs).T

This runs successfully on 1 processor (for a smaller system size of HH), using the following command:

quimb-mpi-python file.py

But I wish to run it on 16 processors (say).

Then, according to the documentation of "quimb", I think the syntax to run it should be:

quimb-mpi-python --syncro -n 16 file.py

The splitting up of the job will be done automatically whenever a SLEPc call is encountered, and all the processors run the rest of the main script.

However, it shows the following error:

Traceback (most recent call last):
  File "main_diag.py", line 72, in <module>
    E.setOperators(petsc_matH)
  File "SLEPc/EPS.pyx", line 1012, in slepc4py.SLEPc.EPS.setOperators
petsc4py.PETSc.Error: error code 80
[ 9] EPSSetOperators() line 411 in /home/samudrasur/slepc-3.15.1/src/eps/interface/epssetup.c
[ 9] Arguments must have same communicators
[ 9] Different communicators in the two objects: Argument # 1 and 2 flag 3

How, then, can I do the eigenstate calculation of this large operator in parallel?

jcmgray commented 2 years ago

Yes, --syncro means the usual SPMD mode that MPI normally runs in: every process runs the same file, but, depending on calls to the MPI communicator and objects, each usually constructs a different range of the operators etc. Does the script run fine without MPI?

Part of the problem is most likely that quimb does not support MPI for LinearOperators ('shell' matrices) at the moment. It has no way to automatically infer how to perform the matrix-vector product in parallel across the distributed vector.

For that you might have to work out two things:

  1. Does quspin support acting on only a subrange of a vector, e.g. rows 1024-2048 out of an operator of size 4096, etc.? (How big is your vector space?) You can get this range from PETSc.Mat().getOwnershipRange (see the short sketch after this list).
  2. What interface (if any!) does petsc4py use for distributed linear operators? This is slightly tricky to work out; as I mentioned in the email, there is not a huge amount of documentation for petsc4py and slepc4py - the one example seems to be this https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/demo/poisson2d/poisson2d.py.
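
For reference, here's a minimal sketch of querying the ownership range (the matrix here is just a placeholder, and I'm assuming petsc4py is running under mpi4py):

from mpi4py import MPI
from petsc4py import PETSc

N = 4096   # global problem size (example value)
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setSizes((N, N))
A.setFromOptions()
A.setUp()

# each rank owns a contiguous block of rows, e.g. 1024-2048 on one of 4 ranks
ri, rf = A.getOwnershipRange()
print(f"rank {MPI.COMM_WORLD.Get_rank()} owns rows {ri} to {rf}")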

Currently quimb just calls:

class PetscLinearOperatorContext:
    def __init__(self, lo):
        self.lo = lo
        self.real = lo.dtype in (float, np.float_)

    def mult(self, _, x, y):
        y[:] = self.lo.matvec(x)

    def multHermitian(self, _, x, y):
        y[:] = self.lo.rmatvec(x)

def linear_operator_2_petsc_shell(lo, comm=None):
    PETSc, comm = get_petsc(comm=comm)
    context = PetscLinearOperatorContext(lo)
    A = PETSc.Mat().createPython(lo.shape, context, comm=comm)
    A.setUp()
    return A

Basically you would need to augment the mult method so that it works out the local vector range and only applies the operator to that part. It might be worth asking in the petsc4py or slepc4py repos as well.
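
For concreteness, a very rough sketch of what such an augmented context could look like. This is only an illustration, not working code: matvec_rows is a hypothetical method for applying just a block of rows of the operator (quspin or your own code would have to provide something equivalent), and gathering the whole input vector on every rank, as done here, is simple but not memory-scalable.

from petsc4py import PETSc

class DistributedShellContext:
    """Hypothetical shell-matrix context where each rank applies only
    its own block of rows of the operator to the full input vector."""

    def __init__(self, lo):
        self.lo = lo

    def mult(self, mat, x, y):
        # which rows of the output this rank owns
        ri, rf = mat.getOwnershipRange()
        # gather the full (distributed) input vector on every rank
        scatter, x_full = PETSc.Scatter.toAll(x)
        scatter.scatter(x, x_full, PETSc.InsertMode.INSERT, PETSc.ScatterMode.FORWARD)
        # hypothetical: apply only rows ri:rf of the operator
        y.setArray(self.lo.matvec_rows(x_full.getArray(), ri, rf))

def make_distributed_shell(lo, comm=PETSc.COMM_WORLD):
    n = lo.shape[0]
    # let PETSc choose the local row split: ((local, global), (local, global))
    A = PETSc.Mat().createPython(
        ((PETSc.DECIDE, n), (PETSc.DECIDE, n)),
        DistributedShellContext(lo),
        comm=comm,
    )
    A.setUp()
    return A

A properly scalable version would only communicate the parts of x each rank actually needs, rather than gathering the whole vector.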

SamudraS commented 2 years ago

The script runs absolutely fine without MPI, although only for a smaller system size.

Part of the problem is most likely that quimb does not support MPI for LinearOperators ('shell' matrices) at the moment. It has no way to automatically infer how to perform the matrix-vector product in parallel across the distributed vector.

Regarding this, I thought the command quimb.linalg.slepc_linalg.convert_mat_to_petsc(HH, comm=PETSc.COMM_WORLD) did the job of converting the LinearOperator to a PETSc.Mat() and also took care of distributing it to the different processors via the communicator comm=PETSc.COMM_WORLD, because of this part of the function quimb.linalg.slepc_linalg.convert_mat_to_petsc() in the main source code:

def convert_mat_to_petsc(mat, comm=None):
    """Convert a matrix to the relevant PETSc type, currently
    only supports csr, bsr and dense matrices formats.

    Parameters
    ----------
    mat : dense, sparse, LinearOperator or Lazy matrix.
        The operator to convert.
    comm : mpi4py.MPI.Comm instance
        The mpi communicator.

    Returns
    -------
    pmat : petsc4py.PETSc.Mat
        The matrix in petsc form - only the local part if running
        across several mpi processes.
    """
    if isinstance(mat, sp.linalg.LinearOperator):
        return linear_operator_2_petsc_shell(mat, comm=comm)

    PETSc, comm = get_petsc(comm=comm)
    mpi_sz = comm.Get_size()
    pmat = PETSc.Mat()

    # retrieve before mat is possibly built into sliced matrix
    shape = mat.shape

    pmat.create(comm=comm)
    pmat.setSizes(shape)
    pmat.setFromOptions()
    pmat.setUp()
    ri, rf = pmat.getOwnershipRange()

I think I am completely wrong here.

I have to do this distribution across the different processors myself, by looking at PETSc.Mat().getOwnershipRange().

Does quspin support acting on only a subrange of a vector, e.g. rows 1024-2048 out of an operator of size 4096, etc.? (How big is your vector space?) You can get this range from PETSc.Mat().getOwnershipRange.

I don't think quspin supports this.

What interface (if any!) does petsc4py use for distributed linear operators? This is slightly tricky to work out; as I mentioned in the email, there is not a huge amount of documentation for petsc4py and slepc4py - the one example seems to be this https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/demo/poisson2d/poisson2d.py.

If I understand correctly, it uses MPI.

I will look at this example and try to do the division of the matrices and vectors into different processors manually.

jcmgray commented 2 years ago

Yeah, the thing is that a LinearOperator by design doesn't expose any of its internal structure; it only defines its action on a vector, which means the distribution can't be done automatically by any library. A note about this could probably be added to the docs.
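
To illustrate the point (a toy example, nothing to do with your Hamiltonian): a scipy LinearOperator is basically just a shape and dtype wrapped around a matvec callable, so any library only ever sees a black box.

import numpy as np
from scipy.sparse.linalg import LinearOperator

def matvec(x):
    # some structured, matrix-free action; the matrix entries are never materialized
    return np.roll(x, 1) + 2 * x

H = LinearOperator(shape=(8, 8), matvec=matvec, dtype=float)
print(H.matvec(np.ones(8)))   # fine locally, but opaque to a distributed solver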

I don't think quspin supports this.

I think if quspin doesn't support this, or the functionality can't be hacked in, then this particular route (quspin linear operator + slepc + MPI) is going to be tricky!

If I understand correctly, it uses MPI. I will look at this example and try to do the division of the matrices and vectors into different processors manually.

Yes, sorry, to clarify: by 'interface' I meant how PETSc tells the local shell matrix / linear operator which range of the subvector to act on, and whether just implementing that is sufficient.

Are you totally sure building the actual sparse matrix in parallel (each process just builds its own local slice) is not possible? Do you have an estimate for the size and density (non-zero elements) of the operator?
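
If it is possible, the idea would be something like this rough sketch, where build_rows is a hypothetical function returning the scipy CSR slice of rows ri:rf of the Hamiltonian (the piece quspin or your own code would have to supply):

from petsc4py import PETSc

def build_distributed_sparse(n, build_rows, comm=PETSc.COMM_WORLD):
    """Each rank constructs only its own slice of rows of the sparse
    operator and hands it to PETSc, so no rank holds the full matrix."""
    A = PETSc.Mat().create(comm=comm)
    A.setSizes((n, n))
    A.setType('aij')   # distributed sparse (CSR-like) format
    A.setFromOptions()
    A.setUp()

    # rows this rank owns
    ri, rf = A.getOwnershipRange()
    local = build_rows(ri, rf).tocsr()   # hypothetical: scipy CSR of rows ri:rf

    # fill in the locally owned rows from the local CSR arrays
    A.setValuesCSR(local.indptr, local.indices, local.data)
    A.assemble()
    return A

This is roughly the route convert_mat_to_petsc already handles for genuine sparse input (rather than a LinearOperator), in which case SLEPc runs fully distributed without any shell-matrix machinery.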

SamudraS commented 2 years ago

Yes, sorry, to clarify: by 'interface' I meant how PETSc tells the local shell matrix / linear operator which range of the subvector to act on, and whether just implementing that is sufficient.

I think I really need to dig into the documentation of PETSc and SLEPc to know the interface.

Regarding the size of the sparse matrix, I can tell that the dimension of the Hilbert space is ~126 million, of which only a few thousand elements are non-zero.