lettucecfd / lettuce

Computational Fluid Dynamics based on PyTorch and the Lattice Boltzmann Method
MIT License

MPI parallelization #5

Open Olllom opened 4 years ago

Olllom commented 4 years ago

Can we do Multi-GPU over MPI?

Here is a summary of PyTorch's distributed computing capabilities: https://pytorch.org/docs/stable/distributed.html

AFAIK there are no "distributed tensors" or anything like that, but PyTorch can send and receive tensors between processes via MPI. This means that we would have to set up the ghost layers in a distributed simulation by hand. Only the streaming step and the grid generation should be affected, since they are the parts that depend on neighboring grid points.
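To make the ghost-layer idea concrete, here is a minimal single-process sketch of a 1D periodic domain split into slabs. It is not lettuce code: plain Python lists stand in for tensors, the list copies in `exchange_ghost_layers` stand in for the MPI send/receive calls, and all function names are hypothetical. It only illustrates that exchanging one ghost cell per side before streaming reproduces the result of streaming on the undivided domain.

```python
# Single-process emulation of a 1D slab decomposition with ghost layers.
# Assumption: lists stand in for tensors, list copies stand in for MPI messages.

def split_domain(field, n_ranks):
    """Cut a periodic 1D field into equal slabs, each padded with one ghost cell per side."""
    assert len(field) % n_ranks == 0, "field length must be divisible by n_ranks"
    width = len(field) // n_ranks
    return [[None] + field[r * width:(r + 1) * width] + [None] for r in range(n_ranks)]

def exchange_ghost_layers(slabs):
    """Fill each slab's ghost cells from its periodic neighbors (the hand-written part)."""
    n = len(slabs)
    for rank, slab in enumerate(slabs):
        left = slabs[(rank - 1) % n]
        right = slabs[(rank + 1) % n]
        slab[0] = left[-2]    # last interior cell of the left neighbor
        slab[-1] = right[1]   # first interior cell of the right neighbor

def stream(slabs, velocity):
    """Shift interior values by `velocity` (+1 or -1) cells, reading ghost cells at the edges."""
    streamed = []
    for slab in slabs:
        width = len(slab) - 2
        interior = [slab[j - velocity] for j in range(1, width + 1)]
        streamed.append([None] + interior + [None])
    return streamed

def gather(slabs):
    """Concatenate the interior cells of all slabs back into one global field."""
    return [value for slab in slabs for value in slab[1:-1]]

def distributed_stream(field, velocity, n_ranks):
    """Split, exchange ghost layers, stream locally, and reassemble."""
    slabs = split_domain(field, n_ranks)
    exchange_ghost_layers(slabs)
    return gather(stream(slabs, velocity))

if __name__ == "__main__":
    field = list(range(8))
    print(distributed_stream(field, velocity=1, n_ranks=4))
```

In an actual multi-process run, each rank would hold only its own slab, and the two assignments in `exchange_ghost_layers` would presumably become point-to-point calls such as `torch.distributed.send` and `torch.distributed.recv` (or their non-blocking variants).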

This requires a version of PyTorch that is built with MPI support (I don't think such builds are available on conda, so we would need to build PyTorch from source).

Here is a list of things that would be required on the way to a usable multinode implementation:

Here is an example of distributed computing in PyTorch: https://www.glue.umd.edu/hpcc/help/software/pytorch.html

At this point, I am not sure how efficient and feasible such an MPI implementation would be. We should try this on a mini example outside lettuce first before proceeding.

Olllom commented 3 years ago

This has been partly done by @MartinKliemank -- PR is hopefully coming soon