yt-project / libyt

In-situ analysis with yt
https://libyt.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Support Dask #59

Open cindytsai opened 2 years ago

cindytsai commented 2 years ago

Support Dask

Dask is a flexible library for parallel computing in Python, and its popularity across the Python ecosystem keeps growing. Because libyt does in-situ analysis by running a Python script, it is important to support this feature as well.

Current libyt structure

Each MPI rank initializes a Python interpreter, and they work together through mpi4py.

| MPI 0 ~ (N-1) |
| --- |
| Python |
| libyt Python Module |
| libyt C/C++ library |
| Simulation |
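
For context, a minimal sketch of what an inline script sees under this structure; the function name `yt_inline` and the reduction are placeholders, since the point is just that every rank runs the same script and coordinates through mpi4py:

```python
# Minimal sketch of the current structure: every MPI rank runs the same
# script inside its embedded Python, and the ranks coordinate via mpi4py.
from mpi4py import MPI

def yt_inline():  # placeholder name for a function the simulation invokes
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    # Each rank only holds its local simulation data, so any step that
    # touches remote data becomes a collective call all ranks must enter.
    local_result = rank          # stand-in for analysis on local grids
    total = comm.allreduce(local_result, op=MPI.SUM)
    if rank == 0:
        print(f"combined result from {comm.Get_size()} ranks: {total}")
```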

How should dask be set up inside embedded Python?

We can dedicate two additional ranks to the scheduler and the client (not necessarily MPI 0 and 1), and use the rest of the MPI ranks as workers. The simulation also runs inside each worker. By following how dask-mpi's initialize() sets up the scheduler, client, and workers, it should be possible to wrap this inside libyt (a rough sketch follows the table below).

| MPI 0 | MPI 1 | MPI 2 | ... | MPI (N-1) |
| --- | --- | --- | --- | --- |
| Scheduler | Client | Worker | Worker | Worker |
| libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module |
| libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library |
| Empty | Empty | Simulation | Simulation | Simulation |
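
Here is a rough sketch of that layout using dask-mpi's standard behavior: rank 0 becomes the scheduler, rank 1 keeps running the script as the client, and the remaining ranks turn into workers. Whether libyt would call dask_mpi.initialize() directly or re-implement its logic inside the embedded interpreters is an open question; the snippet only shows the pattern.

```python
# Standard dask-mpi setup: rank 0 -> scheduler, rank 1 -> client (this
# script continues), ranks 2..(N-1) -> workers. initialize() only returns
# on the client rank; the scheduler and worker ranks stay inside it.
from dask_mpi import initialize
from dask.distributed import Client

initialize()

client = Client()  # on the client rank, connects to the rank-0 scheduler

# From here on, only the client rank executes; tasks are farmed out to the
# worker ranks, which in the libyt case would also host the simulation.
futures = client.map(lambda x: x ** 2, range(8))
print(client.gather(futures))
```

Launching this with e.g. `mpirun -n 5 python script.py` would match the five-column layout above, minus the simulation, which libyt would attach to the worker ranks.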

Solve the data exchange problem

Because we use Remote Memory Access (one-sided MPI) with settings that require every rank to participate in the procedure (#26), libyt suffers in the data exchange process between MPI ranks: every time yt reads data, all ranks have to wait for each other and synchronize. However, if we move this data exchange from the C/C++ side to the Python side, then it is possible to exchange data with more flexibility using dask, and to exchange it asynchronously. By encoding what each MPI rank should get into a dask graph, asking workers to prepare their local grid data, and exchanging data between workers, it will be much easier. _(At least much easier than doing it in C/C++. :sweat_smile:)_
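
A hypothetical sketch of that idea, assuming the dask-mpi cluster from the previous snippet is already running; `prepare_local_grid` and the grid-to-worker mapping are stand-ins for libyt internals, not an existing API:

```python
# Hypothetical sketch: pin a task that prepares each rank's local grid data
# onto the worker that owns it, and let dask move results between workers
# asynchronously instead of forcing a global synchronization.
from dask.distributed import Client

def prepare_local_grid(grid_id):
    # In libyt this would wrap the grid data (e.g. as a NumPy array) held
    # by the rank that owns grid `grid_id`; here it returns a dummy payload.
    return {"grid_id": grid_id, "data": [grid_id] * 4}

client = Client()  # assumes the running dask-mpi cluster set up above

# Hypothetical ownership map: which worker address holds which grid.
grid_owner = {0: "tcp://node-a:4001", 1: "tcp://node-b:4001"}

futures = {
    gid: client.submit(prepare_local_grid, gid, workers=[addr])
    for gid, addr in grid_owner.items()
}
# Only the grids the analysis actually needs are gathered; workers can also
# pass futures among themselves without every rank having to participate.
needed = client.gather([futures[0], futures[1]])
```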