Introduction

The PGMPITuneLib library is a tool that relies on self-consistent performance guidelines to automatically tune the performance of MPI libraries.

Performance guidelines require that specialized MPI collective functions are not slower than semantically equivalent implementations using less-specialized functions, which we call mock-up versions. For example, MPI_Allgather should provide a better latency than the (semantically equivalent) call to MPI_Gather followed by an MPI_Broadcast of the results.

PGMPITuneLib is designed to transparently replace the default implementation of an MPI collective function with one of its mock-up implementations, if the corresponding performance guideline is violated.

More details can be found in:

Sascha Hunold and Alexandra Carpen-Amarie. 2018. Autotuning MPI Collectives using Performance Guidelines. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018). Association for Computing Machinery, New York, NY, USA, 64–74. DOI: https://doi.org/10.1145/3149457.3149461

Quick Start

Prerequisites

an MPI library
CMake

Building the Code

The code can be built as follows:

cd ${PGMPITUNELIB_PATH}
cmake ./
make

Using PGMPITuneLib

PGMPITuneLib provides two different libraries:

PGMPITuneCLI enables the user to select a specific mock-up function implementation for each MPI collective, and can be used to benchmark the performance of MPI applications
PGMPITuneD uses performance profiles to automatically tune applications by redirecting MPI calls to the mock-up implementation that achieved the best performance

PGMPITuneCLI

The user code has to be linked against the PGMPITuneCLI library and then the selected mock-up is transparently used instead of the default implementation.

Example

replace calls to MPI_Allgather with a semantically equivalent function that uses MPI_Gather followed by an MPI_Bcast to obtain the same results

mpicc *.c -o mympicode -lpgmpitunecli -lmpi 

mpirun -np 2 ./mympicode --module=allgather=alg:allgather_as_gather_bcast

If the command line arguments are interfering with the paramater parsing of the actual binary, you can pass the options to the library by an environment variable called PGMI_PARAMS:

export PGMPI_PARAMS="--module=allgather=alg:allgather_as_gather_bcast"
mpirun -np 2 ./mympicode

PGMPITuneD

The user code has to be linked against the PGMPITuneD library.

To inform the library which MPI collectives should be replaced with mock-up implementations, the user needs to provide the path to a directory containing performance profiles as a command-line argument to the application call.

A performance profile records the MPI collective name, the number of processes for which the tuning was performed, and a list of message size ranges for which the function should be replaced with a different algorithm. An example is provided in ${PGMPITUNELIB_PATH}/test/perfmodels/models1/p_allgather.prf.

# test profile
#
MPI_Allgather # collective name
4 # profile for p=4 procs
3
1 allgather_as_allreduce 
2 allgather_as_alltoall 
3 allgather_as_gather_bcast
4 # nb of (msg size range + alg id)
16 16 1
32 32 2
64 128 1
1024 2048 3

Example

use the provided test profile to tune MPI_Allgather

mpicc *.c -o mympicode -lpgmpituned -lmpi 

mpirun -np 2 ./mympicode --ppath=${PGMPITUNELIB_PATH}/test/perfmodels/models1

Similarly, you can also pass this option by an environment varrable:

PGMPI_PARAMS="--ppath=${PGMPITUNELIB_PATH}/test/perfmodels/models1" mpirun -np 2 ./mympicode

Use a configuration file for memory requirements

Add the --config command-line argument to specify the path to a configuration file. The configuration file should contain a list of key-value pairs, one per line, separated by a single space character. Comment lines (starting with =#=) are also accepted.

The configuration file is useful to modify the default amount of memory that can be used in the implementation of mock-up functions. If a mock-up requires more memory than the limit imposed by the configuration file, the default MPI collective will be used instead.

Configuration file example

# Size limit for the additional data buffers used by mock-up functions
size_msg_buffer_bytes 100000

# Size limit for the additional counts arrays used by mock-up functions
size_int_buffer_bytes 10000

List the mock-up functions implemented for each MPI collective

${PGMPITUNELIB_PATH}/bin/pgmpi_info

hunsa / pgmpitunelib

readme

Introduction

Quick Start

Prerequisites

Building the Code

Using PGMPITuneLib

PGMPITuneCLI

Example

PGMPITuneD

Example

Use a configuration file for memory requirements

Configuration file example

List the mock-up functions implemented for each MPI collective