open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

MPI internal error in MPI_Dist_graph_create_adjacent #8425

rajicon opened this issue 3 years ago

rajicon commented 3 years ago

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v5.0.0a1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

4a43c39c89037f52b4e25927e58caf08f3707c33 opal/mca/hwloc/hwloc2/hwloc (hwloc-2.1.0rc2-53-g4a43c39c)
8283e81d1c0fd078e0f7fa85a383b633c328254b opal/mca/pmix/pmix4x/openpmix (v1.1.3-2431-g8283e81d)
66c73f74cc4afd4ead5454771d98b5f199b7fe0e prrte (dev-30650-g66c73f74cc)

Please describe the system on which you are running


Details of the problem

I get the following error:

[MacBook-Pro:00000] *** An error occurred in MPI_Dist_graph_create_adjacent
[MacBook-Pro:00000] *** reported by process [1631191041,0]
[MacBook-Pro:00000] *** on communicator MPI_COMM_WORLD
[MacBook-Pro:00000] *** MPI_ERR_INTERN: internal error
[MacBook-Pro:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[MacBook-Pro:00000] ***    and potentially your MPI job)

I'm using Java, and this appears after running my program for a while (so it doesn't fail on the first call). Do you have any suggestions on where to start looking when an internal error like the above occurs? Any insight or tips would be appreciated!

devreal commented 3 years ago

@rajicon It would be helpful to a) know your setup (system configuration, network, configure command for OMPI) and b) have a small reproducer. Can you reproduce the issue by calling MPI_Dist_graph_create_adjacent in a loop, for example? What does the application do leading up to the error?
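Something along the lines of the C sketch below would already be a useful starting point. (This is just a minimal illustration with a placeholder ring neighborhood and iteration count, not your actual topology or Java code.)

```c
/* Minimal sketch: repeatedly create (and free) an adjacent dist-graph
 * communicator on a simple ring topology. The neighbor layout and the
 * iteration count are placeholders for illustration only. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank talks to its left and right neighbor on a ring. */
    int neighbors[2] = { (rank + size - 1) % size, (rank + 1) % size };

    for (int i = 0; i < 100000; i++) {
        MPI_Comm graph_comm;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       2, neighbors, MPI_UNWEIGHTED,
                                       2, neighbors, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &graph_comm);
        /* Comment out the free to mimic a leak of graph communicators. */
        MPI_Comm_free(&graph_comm);
    }

    MPI_Finalize();
    return 0;
}
```

If something this simple already triggers the error after enough iterations, that would narrow the problem down considerably.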

rajicon commented 3 years ago

Thanks for the reply! Our code is quite complex, so I will try to figure out a small reproducer and get back to you. In the meantime, do you have any suggestions as to what the internal error could mean? Is there anything to double-check first?

devreal commented 3 years ago

Unfortunately, I am neither familiar with the Java interface nor with the Dist-graph implementation. Maybe someone else can chime in. But since this is a fairly generic error, I'm afraid it's hard to make sense of it without more details.

rajicon commented 3 years ago

While I try to find a simple version of the problem, I will explain the issue in more detail. We are working on an agent simulation project in which different CPUs handle different portions of the environment. The decomposition is managed by a QuadTree, and the partitions are rebalanced to keep the amount of work each partition does roughly equal. After rebalancing, MPI_Dist_graph_create_adjacent is called. Here is the code:

https://github.com/eclab/mason/blob/distributed-3.0/contrib/distributed/src/main/java/sim/field/partitioning/QuadTreePartition.java

specifically the createMPITopo() call on line 581, which calls createDistGraphAdjacent() on line 87.

To replicate this issue, run the dflockers module; the error appears after it has been running for a while.

This is quite involved, so I will try to isolate the issue more, but does this suggest any potential problem?

ggouaillardet commented 3 years ago

@rajicon one thing you can do is monitor the memory usage and check whether there is a correlation between the MPI error and the nodes (or a given Java Virtual Machine) running out of memory.

you can also try blacklisting the topo/treematch module and see if it helps

mpirun --mca topo ^treematch ...

rajicon commented 3 years ago

Unfortunately, blacklisting topo/treematch did not solve the issue. I have been looking into the problem some more, and it does seem to be a memory issue. I've noticed that the error always occurs after a specific number of rebalancing calls (MPI_Dist_graph_create_adjacent calls). I now suspect that the old MPI graphs are not getting deleted, but I'm still not sure why. Have you seen anything like this before?

ggouaillardet commented 3 years ago

Running out of communicator ids (CIDs) could explain this behavior.

rajicon commented 3 years ago

Hi, can you clarify this some more? How can I check CIDs and verify whether we are running out?

ggouaillardet commented 3 years ago

In C, you can use MPI_Comm_c2f(MPI_Comm) to get the CID (an int) of a given communicator.

If there is no communicator leak (i.e., every communicator creation is matched by an MPI_Comm_free()), the CID of any newly created communicator should remain low during the application lifetime.
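For example, a small check along these lines could be dropped in after each topology rebuild (a C sketch; report_cid is a hypothetical helper, not something from your code):

```c
/* Sketch: print the CID of a newly created communicator so its growth
 * can be watched across rebalance steps. In Open MPI the Fortran handle
 * returned by MPI_Comm_c2f() is the CID. */
#include <mpi.h>
#include <stdio.h>

static void report_cid(MPI_Comm comm, const char *label)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("%s: CID = %d\n", label, (int)MPI_Comm_c2f(comm));
    }
}
```

If every graph communicator is eventually freed, the printed value should stay low instead of steadily increasing with every rebalance.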

jsquyres commented 3 years ago

Just a quick comment: what @ggouaillardet says is correct, but note that it is a feature of how Open MPI works.

The MPI standard itself does not define a "communicator ID" entity, nor how to portably obtain it. What @ggouaillardet is stating is that Open MPI has a finite number of CIDs (i.e., effectively the number of concurrent communicators that can exist in an Open MPI MPI process). Using MPI_Comm_c2f() will effectively get you the CID because Open MPI's implementation of a Fortran communicator handle is the same thing as Open MPI's CID value.

I just didn't want you to think that this method is guaranteed to work in other MPI implementations.