hunsa / pgmpitunelib

Autotuning MPI Collectives using Performance Guidelines
GNU Lesser General Public License v2.1
2 stars 1 forks source link

Introduction

The PGMPITuneLib library is a tool that relies on self-consistent performance guidelines to automatically tune the performance of MPI libraries.

Performance guidelines require that specialized MPI collective functions are not slower than semantically equivalent implementations using less-specialized functions, which we call mock-up versions. For example, MPI_Allgather should provide a better latency than the (semantically equivalent) call to MPI_Gather followed by an MPI_Broadcast of the results.

PGMPITuneLib is designed to transparently replace the default implementation of an MPI collective function with one of its mock-up implementations, if the corresponding performance guideline is violated.

More details can be found in:

Quick Start

Prerequisites

Building the Code

The code can be built as follows:

cd ${PGMPITUNELIB_PATH}
cmake ./
make

Using PGMPITuneLib

PGMPITuneLib provides two different libraries:

  1. PGMPITuneCLI enables the user to select a specific mock-up function implementation for each MPI collective, and can be used to benchmark the performance of MPI applications
  2. PGMPITuneD uses performance profiles to automatically tune applications by redirecting MPI calls to the mock-up implementation that achieved the best performance

PGMPITuneCLI

The user code has to be linked against the PGMPITuneCLI library and then the selected mock-up is transparently used instead of the default implementation.

Example

mpicc *.c -o mympicode -lpgmpitunecli -lmpi 

mpirun -np 2 ./mympicode --module=allgather=alg:allgather_as_gather_bcast

If the command line arguments are interfering with the paramater parsing of the actual binary, you can pass the options to the library by an environment variable called PGMI_PARAMS:

export PGMPI_PARAMS="--module=allgather=alg:allgather_as_gather_bcast"
mpirun -np 2 ./mympicode

PGMPITuneD

The user code has to be linked against the PGMPITuneD library.

To inform the library which MPI collectives should be replaced with mock-up implementations, the user needs to provide the path to a directory containing performance profiles as a command-line argument to the application call.

A performance profile records the MPI collective name, the number of processes for which the tuning was performed, and a list of message size ranges for which the function should be replaced with a different algorithm. An example is provided in ${PGMPITUNELIB_PATH}/test/perfmodels/models1/p_allgather.prf.

# test profile
#
MPI_Allgather # collective name
4 # profile for p=4 procs
3
1 allgather_as_allreduce 
2 allgather_as_alltoall 
3 allgather_as_gather_bcast
4 # nb of (msg size range + alg id)
16 16 1
32 32 2
64 128 1
1024 2048 3

Example

Similarly, you can also pass this option by an environment varrable:

PGMPI_PARAMS="--ppath=${PGMPITUNELIB_PATH}/test/perfmodels/models1" mpirun -np 2 ./mympicode 

Use a configuration file for memory requirements

Add the --config command-line argument to specify the path to a configuration file. The configuration file should contain a list of key-value pairs, one per line, separated by a single space character. Comment lines (starting with =#=) are also accepted.

The configuration file is useful to modify the default amount of memory that can be used in the implementation of mock-up functions. If a mock-up requires more memory than the limit imposed by the configuration file, the default MPI collective will be used instead.

Configuration file example

# Size limit for the additional data buffers used by mock-up functions
size_msg_buffer_bytes 100000

# Size limit for the additional counts arrays used by mock-up functions
size_int_buffer_bytes 10000

List the mock-up functions implemented for each MPI collective

${PGMPITUNELIB_PATH}/bin/pgmpi_info