jcmgray / quimb

A python library for quantum information and many-body calculations including tensor networks.
http://quimb.readthedocs.io

Get rid of MPI launcher #52

Open pulkin opened 4 years ago

pulkin commented 4 years ago

I have several complaints about the interaction with mpi4py. It could be worth splitting into separate issues.

  1. quimb-mpi-python and mpi_launcher.py do not really belong in this package. As far as I understand, they solve some issues specific to slepc and are not needed otherwise.
  2. Even then, they are not reliable. They completely ignore the MPI environment when run with SLURM's srun, because the latter does not set PMI_SIZE. This causes all sorts of unexpected outcomes.
  3. mpi_launcher.py is hard to understand. The get_mpi_pool function returns non-MPI pools as well, for example. Most of the code seems to deal with everything except mpi4py. Does this mean that quimb can be MPI-parallelized without mpi4py? Does this mean that non-MPI concurrent.futures parallelization is implemented?
  4. mpi_launcher.py is hard to debug. It is not clear how get_mpi_pool behaves and how it is expected to behave. The @CachedPoolWithShutdown decorator is ambiguous: it overrides num_workers, to the point where the only reliable way to understand what is going on at runtime is to check the type of the returned pool. The most ridiculous thing is that for production this means writing yet another wrapper over get_mpi_pool (a sketch of such a wrapper follows this list)!
  5. Finally, quimb-mpi-python forces itself on you. My original intention was to run multiple calculations on a grid (for example, a phase diagram of a spin model), so I do not necessarily need quimb's own parallelization. But I have to use it anyway because of that check in mpi_launcher.py! Or, again, I have to wrap scripts and invocations to clean up environment variables so that quimb acts as if there is no MPI interface at all.
  6. In addition, I do not think mpi_launcher.py serves its purpose well. It seems to be nailed firmly to COMM_WORLD while totally neglecting the possibility of nested parallelization, etc.
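To illustrate points 4 and 5, the kind of wrapper one ends up writing in practice might look like the sketch below (assuming get_mpi_pool lives in quimb.linalg.mpi_launcher and accepts num_workers; the wrapper itself is purely hypothetical):

```python
from concurrent.futures import ProcessPoolExecutor

from quimb.linalg.mpi_launcher import get_mpi_pool


def get_pool_explicitly(num_workers=None):
    # get_mpi_pool may silently override num_workers, so the only reliable
    # way to know what actually happened is to inspect the returned object
    pool = get_mpi_pool(num_workers)
    if isinstance(pool, ProcessPoolExecutor):
        print("got a plain local process pool, MPI is not in play")
    else:
        print("got an MPI pool:", type(pool).__name__)
    return pool
```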
pulkin commented 4 years ago

Regarding SLURM and srun, the variable to check is SLURM_NTASKS.
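For reference, a minimal sketch of what such an environment check could look like (OMPI_COMM_WORLD_SIZE is just the Open MPI convention, not something quimb necessarily reads today, and the function name is hypothetical):

```python
import os


def detect_world_size():
    """Best-effort guess of the MPI world size from the environment,
    falling back to 1 (serial) if no launcher variable is found."""
    for var in ("PMI_SIZE", "OMPI_COMM_WORLD_SIZE", "SLURM_NTASKS"):
        size = os.environ.get(var)
        if size is not None:
            return int(size)
    return 1
```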

jcmgray commented 4 years ago

I think these are generally pretty valid comments; this stuff worked well for my simple use case but could definitely be improved. Having said that, I'm not sure 'get rid of it' is super useful without some kind of vision for an alternative!

Possible improvements with regard to your points:

  1. SLURM_NTASKS could be added to the variables that are checked.
  2. For consistency, get_mpi_pool should probably always return an MPIPoolExecutor or SyncroMPIPool rather than a ProcessPoolExecutor. The motivation for returning the latter is just efficiency when only a single process is requested. mpi4py is always required for the slepc stuff, but its (potentially not so robust) MPIPoolExecutor can be avoided with the more traditional SyncroMPIPool.
  3. I'm not sure how this could be changed. The idea here is just to avoid spinning up pools over and over again.
  4. I agree that that is annoying and should change!

In general, this functionality is aimed at conveniently using slepc at small scale, rather than targeting big distributed MPI calculations. Obviously it would be good not to interfere with those (i.e. point 5)! But I guess the assumption is that at that point one has to take a more hands-on approach in any case. For example, you can call eigs_slepc with your own comm= if you want.
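For instance, a minimal sketch of that hands-on route, assuming eigs_slepc is importable from quimb.linalg.slepc_linalg, takes the number of eigenpairs as k, returns eigenvalues and eigenvectors by default, and accepts the comm= keyword mentioned above:

```python
# run under e.g. ``mpiexec -n 4 python script.py``
from mpi4py import MPI

import quimb as qu
from quimb.linalg.slepc_linalg import eigs_slepc

comm = MPI.COMM_WORLD
ham = qu.ham_heis(12, sparse=True)

# hand SLEPc an explicit communicator rather than relying on the launcher
lk, vk = eigs_slepc(ham, k=4, comm=comm)

if comm.rank == 0:
    print(lk)
```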

If you have some ideas/API of how this could look let me know.

pulkin commented 4 years ago

I thought a little bit about it: I think the first step is to get rid of quimb-mpi-python and to route the desired setup through python -m quimb.mpi_launcher. It would address the most annoying point 5, remove the bash scripting (possibly making it a bit more compatible with Windows) and accumulate all the MPI setup stuff in a single place.
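To make this concrete, a purely illustrative sketch of what such an entry point could look like (quimb does not currently ship a quimb.mpi_launcher __main__; the module layout and behaviour here are assumptions):

```python
# hypothetical quimb/mpi_launcher/__main__.py
# usage: python -m quimb.mpi_launcher my_script.py [args...]
import runpy
import sys


def main():
    # all MPI bootstrapping would be concentrated here instead of in a
    # bash wrapper, then the user's script is executed unchanged
    script, *args = sys.argv[1:]
    sys.argv = [script, *args]
    runpy.run_path(script, run_name="__main__")


if __name__ == "__main__":
    main()
```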

I'd see the next step as a careful description of how slepc should be run with MPI but without mpi_launcher.

Then, I'd prefer to dismantle mpi_launcher.py part by part.

The first part is determining the MPI setup strictly through the mpi4py interface: no psutil (because it is not user-controlled and not cross-platform) and no OMP_NUM_THREADS. FYI, the latter typically sits at the lower (thread) level of a parallel cluster setup: a lot of the stuff run on HPC is parallelized with both MPI and OpenMP. This presumably corresponds to the num_workers and num_threads arguments, but it looks like you break this correspondence. I also do not quite see how the custom QUIMB_* environment variables provide any new functionality.
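A sketch of what "strictly through the mpi4py interface" could mean, as opposed to consulting psutil or OMP_NUM_THREADS (the function name is hypothetical):

```python
from mpi4py import MPI


def mpi_layout(comm=None):
    """Report the parallel layout purely from the communicator itself."""
    comm = MPI.COMM_WORLD if comm is None else comm
    return {"rank": comm.Get_rank(), "num_workers": comm.Get_size()}
```

The number of OpenMP threads per process would then remain whatever the user or scheduler set, untouched by quimb.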

The second part is to make comm an argument to SyncroMPIPool: no need to import it inside the body and all that.
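Roughly, the constructor could then look like this (an illustrative signature only, not quimb's current one):

```python
from mpi4py import MPI


class SyncroMPIPool:
    # illustrative: the communicator is injected rather than imported
    # and resolved inside the class body
    def __init__(self, comm=None, num_threads=1):
        self.comm = MPI.COMM_WORLD if comm is None else comm
        self.num_threads = num_threads
        self.rank = self.comm.Get_rank()
        self.size = self.comm.Get_size()
```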

The third part is, probably, to get rid of the spawning code. As far as I understand, you provide a syncro gateway to a potentially non-syncro MPI setup. The non-syncro setup looks like legacy code that you started with and do not need any more. Unless there are solid performance reasons for it, I suggest firmly nailing quimb to SyncroMPIPool, which requires minimal maintenance. From my personal experience, non-syncro setups are very rare because they require a new API, code overhead, testing from HPC teams and some magic in Python. Finally, quimb is competing here with loads of infrastructure code that does the same thing: spawning MPI processes in the most correct way.

jcmgray commented 4 years ago

I thought a little bit about it: I think the first step is to get rid of quimb-mpi-python and to route the desired setup through python -m quimb.mpi_launcher. It would address the most annoying point 5, remove the bash scripting (possibly making it a bit more compatible with Windows) and accumulate all the MPI setup stuff in a single place.

Yes this seems like a sensible thing to do. And in general I agree that it would be worthwhile to have a more explicit way to control various aspects of the MPI stuff. Are you wanting just more control over n_procs and n_threads or also managing computations with various different COMMs as well?

Just to explain the current way things work:

  1. The spawning is for small local computations, where you just want to use SLEPc without thinking about MPI, but SLEPc is most efficient when parallelized with N MPI processes each with one thread (since PETSc/SLEPc was designed with MPI parallelism in mind).

This would be good to keep for ease of use in e.g. notebooks. No thinking about HPC, mpiexec, communicators etc.

  2. Inferring the COMM etc. is meant to make it super simple to write the MPI code without having to handle all the communicator stuff oneself, i.e. some sensible defaults such that when you run the same script under MPI it just uses the default global communicator (which is all I have generally ever needed).

I think it would be good to keep this design if possible (i.e. the script here doesn't need to be modified when moving to MPI).
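As a generic illustration of both points, using plain mpi4py.futures rather than quimb's internal wrapper, the same script can stay untouched whether it spawns its own workers or is handed already-running ranks:

```python
from mpi4py.futures import MPIPoolExecutor


def work(x):
    return x * x


if __name__ == "__main__":
    # ``python script.py``: workers are spawned on demand via MPI dynamic
    # process management, no mpiexec needed.
    # ``mpiexec -n 4 python -m mpi4py.futures script.py``: the already
    # launched ranks are reused as the workers instead.
    with MPIPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(work, range(8))))
```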

At least initially, some steps might be:

pulkin commented 4 years ago

Are you wanting just more control over n_procs and n_threads or also managing computations with various different COMMs as well?

I think these are two independent worthy changes.

This would be good to keep for ease of use in e.g. notebooks. No thinking about HPC, mpiexec, communicators etc.

Unfortunately, this does not really work: a number of things may break down along the way. I think it is better to run in serial by default, until a certain explicit import or setting. At least the user will then already know the entry point for MPI (which is currently not the case until some sort of exception is raised) and will be able to debug it.

I think you do a reasonable job with the choice of linear algebra solvers of "just making it work with what is available"; however, I am afraid MPI is a bit too fragile for this approach.