Open yizhang-yiz opened 6 years ago
Sure, we can make MPI work together with other libs. What concerns me, though, is how your proposed change interferes with map_rect in its current form. I would be worried that we would break map_rect if we merged your PR.
I wanted to refactor the mpi_cluster object anyway and turn it into a proper Meyers singleton. Thus, we should probably do that in a single change. What do you think?
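For readers unfamiliar with the term, a Meyers singleton keeps its single instance as a function-local static. Below is a minimal sketch of the pattern only; `cluster_handle`, `instance`, `touch` and `use_count` are hypothetical names for illustration and are not taken from the actual mpi_cluster code.

```cpp
#include <cassert>

// Hypothetical stand-in for a cluster object; not Stan's actual mpi_cluster.
class cluster_handle {
 public:
  // Meyers singleton: the function-local static is constructed on the first
  // call (thread-safe since C++11) and destroyed at program exit.
  static cluster_handle& instance() {
    static cluster_handle handle;
    return handle;
  }

  // Copying would defeat the single-instance guarantee.
  cluster_handle(const cluster_handle&) = delete;
  cluster_handle& operator=(const cluster_handle&) = delete;

  void touch() { ++use_count_; }
  int use_count() const { return use_count_; }

 private:
  cluster_handle() = default;
  int use_count_ = 0;
};

// Every call site sees the same object.
inline bool same_instance() {
  return &cluster_handle::instance() == &cluster_handle::instance();
}
```

Any code that calls `cluster_handle::instance()` shares the same state, which is the property under discussion here.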
Frankly, I don't see the need for a singleton at all, since MPI_COMM_WORLD is already global and one can access it without touching mpi_cluster, which makes a singleton pointless.
In its current form mpi_cluster is only implicitly a singleton. This is because there can only ever be a single instance of the boost::mpi::environment variable in any given program. Because of that you cannot have two instances of mpi_cluster in any Stan program. At least this is what I recall.
Another problem is that I use boost::mpi::communicator world objects in a lot of places in the code. To make things work, one would have to change those instances to use the one in mpi_cluster. To me this sounds like a need for a singleton.
This is because there can only ever be a single instance of the boost::mpi::environment variable in any given program.
This is not true. boost::mpi::environment just manages MPI sessions. You can have multiple instances of it, and it will check MPI_Initialized to make sure it'll work. Try running the following
TEST(mpi_env, multiple) {
  boost::mpi::environment a;
  boost::mpi::environment b;
  boost::mpi::environment c;
}
in mpi_cluster_test.cpp and you'll see what I mean.
What is also a problem is that I am using a lot in the code boost::mpi::communicator world objects. To make things work one would have to change those instances to use the one in mpi_cluster.
This is not true. As long as MPI is initialized, boost::mpi::communicator world always points to the default communicator, no matter where it's used. Check the Boost.MPI manual. You can also check by adding multiple instances of boost::mpi::communicator world to the above example and examining their contents.
To demonstrate my point, try running the following
#include <boost/mpi/communicator.hpp>
#include <boost/mpi/environment.hpp>
#include <iostream>

void foo() {
  // A further environment and communicator created inside a function.
  boost::mpi::environment c;
  boost::mpi::communicator world;
  std::cout << "world size in foo: " << world.size() << "\n";
}

int main() {
  boost::mpi::environment a;
  boost::mpi::environment b;
  // Both are default-constructed communicators.
  boost::mpi::communicator world1;
  boost::mpi::communicator world2;
  foo();
  std::cout << "world size in main: " << world1.size() << "\n";
  std::cout << "world size in main: " << world2.size() << "\n";
  foo();
  return 0;
}
Run with 3 processes, the output will be
world size in foo: 3
world size in main: 3
world size in main: 3
world size in foo: 3
world size in foo: 3
world size in main: 3
world size in main: 3
world size in foo: 3
world size in foo: 3
world size in main: 3
world size in main: 3
world size in foo: 3
Because world, world1, and world2 are all attached to the global communicator.
Ah, ok. And how does this change when we use the boost::mpi::comm_create_kind argument? Do we then need a singleton so that we always construct the world object using this specific kind of world communicator?
Not sure how to subscribe best, so just commenting to do that.
Do we then need a singleton so that we always construct world object using this specific kind of world communicator?
As long as it's constructed using the default constructor, it'll be attached to the global communicator. So no. The bottom line is that environment is just a wrapper around MPI_Init, and its status is always global.
Ok. I need to read up a bit.
One more thing: the boost::mpi::environment object going out of scope is also important in order to ensure proper shutdown of the MPI resource. Do we need to take special considerations for that?
The only boost::mpi::environment object that matters is the one created first. When it goes out of scope, MPI gets finalized. Whether or not this object is in a singleton class is irrelevant: if it's in a singleton, MPI gets finalized when the singleton object goes out of scope; if it's in a regular class, MPI gets finalized when the first object of that class goes out of scope. In the second case, one can create as many boost::mpi::environment objects after the first one as one wants; they just won't have any effect.
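The ownership rule described here can be modelled without MPI at all. The sketch below is a hypothetical illustration: `fake_mpi_state` and `env_guard` are made-up names, the two flags merely stand in for what MPI_Initialized and MPI_Finalized would report, and none of this is Boost's actual implementation.

```cpp
#include <cassert>

// Hypothetical model of the behaviour described above; it does not call MPI.
struct fake_mpi_state {
  bool initialized = false;
  bool finalized = false;
};

class env_guard {
 public:
  explicit env_guard(fake_mpi_state& state) : state_(state) {
    if (!state_.initialized) {  // only the first guard initializes
      state_.initialized = true;
      owner_ = true;
    }
  }
  ~env_guard() {
    if (owner_ && !state_.finalized) {  // only the initializing guard finalizes
      state_.finalized = true;
    }
  }
  env_guard(const env_guard&) = delete;
  env_guard& operator=(const env_guard&) = delete;

  bool owner() const { return owner_; }

 private:
  fake_mpi_state& state_;
  bool owner_ = false;
};

// Returns true if the state is still live after a later guard is destroyed.
inline bool later_guard_is_inert() {
  fake_mpi_state state;
  env_guard first(state);
  {
    env_guard second(state);  // created after the first: no effect
  }                           // second is destroyed here
  return state.initialized && !state.finalized;
}
```

In this model, guards created after the first one neither initialize nor finalize anything, so it makes no difference whether the first guard lives in a singleton or in an ordinary object.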
Is there an example somewhere online with code that does this kind of thing (and the stuff in #948) (or some other reference material) that you know offhand?
Cool, thanks.
Summary:
Allow other types of MPI communicator to be used in mpi_cluster.
Description:
Currently we only allow MPI_COMM_WORLD in MPI jobs. This is not enough if we want to mix Stan with other MPI-enabled libraries when Stan only utilizes part of the whole cluster, or to allow Stan to dispatch part of the cluster to a specific task. One application is use with a PDE solver, where there may be multiple groups of nodes, each running in its own communicator for part of the PDE problem.
Reproducible Steps:
n/a, new feature
Current Output:
n/a
Expected Output:
mpi_cluster uses a user-supplied communicator.
Additional Information:
Provide any additional information here.
Current Version:
v2.18.0