open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Is there any Scientific or Engineering Application dominated by the performance of LARGE message MPI_Allreduce? #10417

Open pengjintao opened 2 years ago

pengjintao commented 2 years ago


Background information

I have developed a new Allreduce algorithm for large messages and integrated it into Open MPI, but I cannot find a set of applications whose performance is dominated by large-message allreduce (except distributed deep learning and the Mini-AMR application).

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

OMPI 4.1.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From a release source tarball, configured with UCX, on CentOS 7.6.

Please describe the system on which you are running


Details of the problem

Apart from distributed deep learning and the Mini-AMR application, which applications are dominated by LARGE message MPI_Allreduce at node counts of 16-256 with ppn=24?
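For context, a minimal sketch of the kind of microbenchmark that can be used to check whether an application's reductions fall into this regime (plain MPI in C; the message sizes and iteration count are arbitrary choices, not from this issue):

```c
/* Sketch: time MPI_Allreduce on large buffers (64 KB .. 16 MB).
 * Compile: mpicc -O2 allreduce_bench.c -o allreduce_bench
 * Run:     mpirun -np <N> ./allreduce_bench
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (size_t bytes = 64 * 1024; bytes <= 16 * 1024 * 1024; bytes *= 4) {
        size_t count = bytes / sizeof(double);
        double *src = malloc(bytes), *dst = malloc(bytes);
        for (size_t i = 0; i < count; i++)
            src[i] = (double)rank;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        const int iters = 20;
        for (int it = 0; it < iters; it++)
            MPI_Allreduce(src, dst, (int)count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%8zu KB : %.3f ms per MPI_Allreduce\n",
                   bytes / 1024, 1e3 * (t1 - t0) / iters);

        free(src);
        free(dst);
    }

    MPI_Finalize();
    return 0;
}
```

Run at the node counts in question (e.g. 16 and 256 nodes with ppn=24), the per-call times can be compared against an application's per-iteration compute time to judge whether its allreduce phase is really dominant.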

jsquyres commented 2 years ago

@bosilca @janjust Can you guys chime in here?

bosilca commented 2 years ago

Nothing to add, these are the apps we are using for testing allreduce.

ggouaillardet commented 2 years ago

What about HPCG?

My understanding is that the communication part (which is generally not very significant compared to the compute part) is essentially based on MPI_Allreduce(). Does the typical message size qualify as "large"?

edgargabriel commented 2 years ago

@bosilca If I recall correctly, some of the applications doing in-memory checkpointing and using encoding approaches for reliability purposes were pretty heavy on reductions (both for the encoding and the decoding part). Not sure whether they are still in use or not.

pengjintao commented 2 years ago

> Nothing to add, these are the apps we are using for testing allreduce.

Where can I find these applications?

pengjintao commented 2 years ago

> What about HPCG?
>
> My understanding is that the communication part (which is generally not very significant compared to the compute part) is essentially based on MPI_Allreduce(). Does the typical message size qualify as "large"?

In my implementation, the new algorithm is used once the message size exceeds 64 KB.
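For illustration only: the integration described here goes through Open MPI's collective framework, but the same "switch to the new algorithm at 64 KB" rule can be sketched outside the library with the standard PMPI profiling interface. `my_large_allreduce()` below is a hypothetical placeholder for the custom algorithm, not part of Open MPI or of this issue.

```c
#include <mpi.h>
#include <stddef.h>

#define LARGE_MSG_THRESHOLD (64 * 1024)   /* bytes, per the comment above */

/* Hypothetical custom large-message allreduce; falls back to PMPI here. */
static int my_large_allreduce(const void *sendbuf, void *recvbuf, int count,
                              MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    return PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
}

/* Intercept MPI_Allreduce and dispatch on the message size in bytes. */
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    int type_size;
    PMPI_Type_size(datatype, &type_size);

    if ((size_t)count * (size_t)type_size >= LARGE_MSG_THRESHOLD)
        return my_large_allreduce(sendbuf, recvbuf, count, datatype, op, comm);

    return PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
}
```

In stock Open MPI, comparable message-size cutoffs between the built-in allreduce algorithms can also be selected at run time through the coll_tuned MCA parameters (e.g. coll_tuned_use_dynamic_rules with a coll_tuned_dynamic_rules_filename rules file).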

wzamazon commented 2 years ago

HPCG is at https://www.hpcg-benchmark.org/

bosilca commented 2 years ago

I meant we are using the same apps that you mentioned in your message.

HPCG could be an interesting case; in order to increase the share of the reduction in the execution time you will need 1) a large problem size and 2) a large number of participants. Not sure what exactly you are targeting, but you might need more than 128 participants.

@edgargabriel is right, the reduction was a cornerstone in some of the reliability approaches. However, they can hardly be considered benchmarks (they are mostly ignored by the community), and they would also test a similar case to ML (which is an accepted benchmark).