changliu777 opened 1 year ago
You should rely on proven benchmarks such as IMB from Intel or the OSU micro-benchmark suite from Ohio State University to evaluate MPI performance.
The very first collective on any communicator might require some connections to be established, and hence be much slower than the following ones, so you should run at least a few warmup iterations to hide these one-time costs.
Since the simpler MPI_Allgather() can do the trick here, did you compare MPI_Allgather() vs MPI_Allgatherv() performance?
The default algorithm used here might also not be the fastest, so consider evaluating coll/tuned vs coll/han and try changing the algorithms used by these modules.
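As an illustration of the suggestion above, and assuming a recent Open MPI with the usual MCA parameter names, algorithm selection can be steered from the mpirun command line (the binary name and the algorithm number N below are placeholders):

```
# List the allgatherv algorithms the tuned module knows about
ompi_info --param coll tuned --level 9 | grep allgatherv

# Force a specific tuned algorithm (N comes from the listing above)
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_allgatherv_algorithm N ./my_benchmark

# Raise the priority of coll/han so it is preferred over coll/tuned
mpirun --mca coll_han_priority 100 ./my_benchmark
```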
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
openmpi-v5.0.x-202306140342-9260266
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from a source/distribution tarball
Details of the problem
I am trying to understand an Open MPI MPI_Allgatherv performance issue and compare it with the MPICH shipped by Cray. Here is the test code I used, and here are the results of running the Open MPI build on 32 nodes:
For comparison, here are the results using Cray MPICH:
So there is a big slowdown on the Open MPI side. These results were obtained using the daily tarball openmpi-v5.0.x-202306140342-9260266. I have tested the master branch and got similar results. For the openmpi-v4.0.x branch the performance is 3x slower.
I also found another interesting issue. The above test was done by allocating one process per node. If I put all 32 processes on one node instead, I get the following results for Open MPI
and for Cray MPICH
so the performance is close when the data transfer happens within a single node.
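For reference, the two placements compared above can be reproduced explicitly with Open MPI's --map-by option (the benchmark binary name is a placeholder):

```
# One process per node across 32 nodes (inter-node traffic)
mpirun -np 32 --map-by ppr:1:node ./my_benchmark

# All 32 processes on a single node (shared-memory traffic only)
mpirun -np 32 --map-by ppr:32:node ./my_benchmark
```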