precice / micro-manager

A manager tool to facilitate two-scale coupling in multi-physics simulations using preCICE.
GNU Lesser General Public License v3.0

Call micro simulation solve routine with multiple CPUs #68

Status: open. kalupaika opened this issue 11 months ago.

kalupaika commented 11 months ago

In the case of a large micro simulation (on the order of millions of elements or degrees of freedom at each Gauss point of the macro problem), or even multiple scales within a single solve call, multiple CPUs will be required to carry out the analysis. Can we give each solve() call access to multiple CPUs?
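
To make the request concrete: if each micro simulation is to be solved by several MPI ranks, the global ranks first have to be split into groups, one group per micro simulation being solved at a time. The sketch below only computes that grouping in plain Python. `ranks_per_micro` is a hypothetical parameter, not an existing Micro Manager option. With mpi4py, the `color` returned here could be passed to `MPI.COMM_WORLD.Split(color, key=rank)` to obtain one sub-communicator per group, which the micro solver would then use internally.

```python
def micro_group_color(rank: int, world_size: int, ranks_per_micro: int) -> int:
    """Return the index of the rank group that `rank` belongs to.

    Consecutive ranks are grouped: ranks 0..k-1 form group 0, ranks k..2k-1
    form group 1, and so on (k = ranks_per_micro, a hypothetical setting).
    """
    if ranks_per_micro <= 0:
        raise ValueError("ranks_per_micro must be positive")
    if world_size % ranks_per_micro != 0:
        # Assumed policy for this sketch: only allow groups of equal size.
        raise ValueError("world_size must be divisible by ranks_per_micro")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return rank // ranks_per_micro


def rank_groups(world_size: int, ranks_per_micro: int) -> list:
    """List the global ranks in each group (one group per concurrent micro solve)."""
    groups: dict = {}
    for rank in range(world_size):
        color = micro_group_color(rank, world_size, ranks_per_micro)
        groups.setdefault(color, []).append(rank)
    return list(groups.values())


# Example: 8 ranks with 4 ranks per micro simulation -> 2 micro simulations at a time.
print(rank_groups(8, 4))
```

Contiguous grouping keeps each group on one node whenever ranks are placed node by node, which would match the idea of assigning a whole compute node to one micro simulation.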

IshaanDesai commented 9 months ago

One way to do this would be to use pympipool. The Micro Manager itself would then not be launched with mpi4py but would run in serial, and MPI would be used in subprocesses or functions that are run through the Executor instance. This would also allow greater freedom in controlling how different parts of the micro simulations are run.
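
pympipool's own API is not reproduced here. As a stdlib-only illustration of the same pattern (a serial manager that hands MPI-parallel micro solves to an executor), the sketch below launches one `mpiexec -n <ranks>` subprocess per micro simulation from a thread pool. The script name `micro_solve.py` and its `--sim-id` flag are hypothetical, and exchanging macro and micro data with the subprocesses is left out.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_mpi_command(sim_id: int, n_ranks: int, script: str = "micro_solve.py") -> list:
    """Build the command that runs one micro simulation on `n_ranks` MPI ranks.

    `micro_solve.py` and `--sim-id` are hypothetical; `mpiexec -n` is the
    launcher syntax described by the MPI standard.
    """
    return ["mpiexec", "-n", str(n_ranks), sys.executable, script, "--sim-id", str(sim_id)]


def run_micro_solves(sim_ids, n_ranks: int, max_concurrent: int) -> list:
    """Run MPI-parallel micro solves from a serial manager process.

    Threads suffice here: each thread only waits for its subprocess, while the
    actual computation happens in the separate MPI processes.
    """
    def run_one(sim_id: int) -> subprocess.CompletedProcess:
        # check=True raises CalledProcessError if a micro solve fails.
        return subprocess.run(build_mpi_command(sim_id, n_ranks), check=True)

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_one, sim_ids))
```

For instance, `run_micro_solves(range(4), n_ranks=8, max_concurrent=2)` would keep at most two 8-rank solves running at once, so the allocation would need room for 16 ranks.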

steffenger commented 6 months ago

I would also appreciate such a feature. In theory, it should be doable to do some multiprocessing on the solve loop, since the micro simulations are independent of each other in the time step itself. The question is how to handle this with the adaptivity in a meaningful way. If I could achieve perfect multiprocessing, wouldn't that make adaptivity obsolete in terms of wall time?
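
An idealized estimate can help frame the question. It assumes an equal solve time $t_s$ per micro simulation and ignores communication, scheduling, and the cost of the adaptivity computation itself. With $N_a \le N$ of the $N$ micro simulations on a rank active under adaptivity, and $P$ workers on that rank:

$$
T_{\text{wall}} \approx \left\lceil \frac{N_a}{P} \right\rceil t_s, \qquad T_{\text{CPU}} \approx N_a \, t_s .
$$

Under this model, adding workers reduces the wall time only until $P \ge N_a$, while adaptivity lowers $N_a$ and with it the total compute time $T_{\text{CPU}}$. Perfect parallelism would make adaptivity irrelevant for wall time only if $P \ge N$, that is, with one worker per micro simulation.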

uekerman commented 5 months ago

> In theory, it should be doable to do some multiprocessing on the solve loop, since the micro simulations are independent of each other in the time step itself.

I think there is some misunderstanding. As I understood this issue, each micro simulation should be run by multiple ranks, not that the solve loop should be parallelized with multiprocessing. The latter you can kind of already achieve now via the domain decomposition.
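
To illustrate the distinction: with domain decomposition, the micro simulations are distributed over the ranks and each rank solves its own subset, so the solve loop as a whole runs in parallel even though every single micro simulation runs on one rank. The function below is a simple block distribution chosen for illustration only; the Micro Manager's actual assignment is based on the partitioning of the macro mesh and is not reproduced here.

```python
def local_micro_ids(n_micro: int, rank: int, size: int) -> range:
    """Return the global IDs of the micro simulations owned by `rank`.

    Block distribution: the first `n_micro % size` ranks get one extra
    simulation, so the per-rank counts differ by at most one.
    """
    base, extra = divmod(n_micro, size)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)


# 10 micro simulations on 3 ranks -> 4, 3 and 3 simulations per rank.
for r in range(3):
    print(r, list(local_micro_ids(10, r, 3)))
```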

steffenger commented 5 months ago

Yes, it would probably have been smarter to open a new issue :D I had something like hybrid MPI/OpenMP parallelization in mind (although I'm not sure if this makes sense in this context). Maybe you want to have a certain number of micro simulations on an MPI process, so that local adaptivity can still be used to keep memory requirements low, and multiprocessing to make the solve loop as fast as possible.
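
A rough sketch of this hybrid idea within one MPI rank: only the active micro simulations are solved, concurrently through a `concurrent.futures` executor, and each inactive one reuses the output of an associated active one, similar in spirit to how adaptivity avoids solves. `ToyMicroSimulation`, its `solve(macro_data, dt)` method, and the `active_of` mapping are hypothetical stand-ins, not the Micro Manager's internals. A thread pool only gives a speedup if the solver releases the GIL (for example inside NumPy or compiled code); a process pool avoids the GIL but would not carry state changes made inside `solve()` back to the parent process.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


class ToyMicroSimulation:
    """Hypothetical stand-in for a user's micro simulation."""

    def __init__(self, sim_id: int):
        self.sim_id = sim_id

    def solve(self, macro_data: dict, dt: float) -> dict:
        # Placeholder physics: scale the macro input by the time step.
        return {"stress": macro_data["strain"] * dt}


def solve_step(sims, macro_data, dt, active_of, executor: Executor) -> list:
    """Solve one time step for the micro simulations owned by this rank.

    sims:       micro simulation objects on this rank
    macro_data: input dicts, one per micro simulation
    active_of:  active_of[i] is the index of the active simulation whose output
                simulation i reuses (active_of[i] == i if i itself is active)
    """
    active = [i for i in range(len(sims)) if active_of[i] == i]
    # Submit all active solves at once so the executor can run them concurrently.
    futures = {i: executor.submit(sims[i].solve, macro_data[i], dt) for i in active}
    results = {i: f.result() for i, f in futures.items()}
    # Inactive simulations get a copy of their associated active simulation's output.
    return [dict(results[active_of[i]]) for i in range(len(sims))]


sims = [ToyMicroSimulation(i) for i in range(4)]
inputs = [{"strain": float(i)} for i in range(4)]
active_of = [0, 0, 2, 2]  # simulations 1 and 3 are inactive
with ThreadPoolExecutor(max_workers=2) as pool:
    print(solve_step(sims, inputs, 0.5, active_of, pool))
```

Passing the executor in keeps the solve loop independent of the parallelization backend, so the same function could be tried with different pool types or worker counts.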

IshaanDesai commented 5 months ago

> The latter you can kind of already achieve now via the domain decomposition.

This is correct. In principle, the user is free to do multiprocessing (shared-memory parallelization) within the micro simulation file that they write. However, the effect of this would be limited if the micro simulation runs on one rank. This issue is about distributed-memory parallelization of each micro simulation. On a large machine, this could also mean assigning a single compute node to one micro simulation.
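
For contrast, shared-memory parallelism inside one micro simulation could look like the sketch below: the user's `solve()` splits its own element loop across a worker pool. `ElementwiseMicroSimulation`, `assemble_element`, and the `n_threads` argument are hypothetical. All workers share the memory of one process, so this approach is bounded by the resources available to that single rank, and in CPython the threads only run truly in parallel if the per-element work releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_element(e: int, load: float) -> float:
    """Hypothetical per-element work; a real solver would do far more here."""
    return load * (e + 1)


class ElementwiseMicroSimulation:
    """Hypothetical micro simulation whose solve() uses several threads internally."""

    def __init__(self, n_elements: int, n_threads: int = 4):
        self.n_elements = n_elements
        self.n_threads = n_threads

    def solve(self, macro_data: dict, dt: float) -> dict:
        load = macro_data["load"] * dt
        # Split the element loop across threads that share this process's memory.
        with ThreadPoolExecutor(max_workers=self.n_threads) as pool:
            contributions = pool.map(
                assemble_element, range(self.n_elements), [load] * self.n_elements
            )
            total = sum(contributions)
        return {"response": total}


sim = ElementwiseMicroSimulation(n_elements=4, n_threads=2)
print(sim.solve({"load": 2.0}, dt=0.5))
```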

> Maybe you want to have a certain number of micro simulations on an MPI process, so that local adaptivity can still be used to keep memory requirements low, and multiprocessing to make the solve loop as fast as possible.

That is also a separate discussion. Let's open separate issues for this.