Open k202077 opened 9 months ago
This is a request for a performance improvement of `MPI_Group_difference()`. It is unlikely that we'll take such an improvement back to the v4.1.x series; that series is (slowly) being retired in favor of the v5.0.x series. I.e., we're still actively taking bug fixes, but not necessarily new features or overhauls of existing algorithms.
In a setup (using Open MPI 4.1.3) with >14,000 processes, we noticed an unusually long initialization time. While investigating this, we found that ~60 consecutive calls to `MPI_Group_difference` involving a group that contained all processes of the run took several minutes. I suspect that the implementation of `ompi_group_dense_overlap` (used by `MPI_Group_difference`) is suboptimal for such cases, because it seems to use an algorithm with a time complexity of O(n²).

We could replicate similar functionality using a collective `MPI_Allreduce`, which was many times faster, even though `MPI_Group_difference` is a local operation.

A more sophisticated algorithm (for example, one using sorted lists of the processes of each group) should be able to improve the performance significantly.