                             Casper Notes

Process-based Asynchronous Progress Model for MPI Communication
https://pmodels.github.io/casper-www/

==================================== Introduction

This project provides a portable and flexible process-based asynchronous progress model for MPI remote memory access (RMA) communication and two-sided (point-to-point) communication.

==================================== Getting Started

  1. Installation

    The simplest way to configure Casper is with the following command line:

    $ ./configure --prefix=/your/casper/installation/dir CC=/path/to/mpicc

    If you have an MPI installation in your path (i.e., which mpicc finds it), you do not need to explicitly provide the CC variable:

    $ ./configure --prefix=/your/casper/installation/dir

    Once you are done configuring, you can build the source and install it using:

    $ make
    $ make install
  2. To check if Casper built correctly, you can do the following:

    $ make check
  3. To start using Casper, you need to let Casper intercept your calls to MPI before they reach the MPI library.

    • If your application is using a dynamic MPI library (common case), you can simply preload the Casper library while executing the application:

      $ mpiexec -genv LD_PRELOAD=/your/casper/installation/dir/lib/libcasper.so -n 4 ./test

    • If your application is using a static MPI library, you'll need to recompile (actually just relink) your application by placing the Casper symbols before the MPI symbols. You can do so using:

      $ mpicc -o test test.c -lcasper -lmpi

      Once you have linked with Casper, you can run as normal:

      $ mpiexec -n 4 ./test

  4. Execution

    • You can set the number of ghost processes per node through the environment variable CSP_NG; it is set to 1 by default. (A minimal sketch of a ./test program follows at the end of this section.)

      [Example 1] Running on 2 nodes, with 2 ghost processes and 6 user processes on each node.

      $ mpiexec -genv CSP_NG=2 -np 16 -ppn 8 ./test

      [Example 2] Running on 4 nodes, with 4 ghost processes and 20 user processes on each node.

      $ mpiexec -genv CSP_NG=4 -np 96 -ppn 24 ./test
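
The ./test program used throughout this section can be any MPI application; no Casper-specific code is required. As an illustration only, here is a minimal sketch of an RMA program whose one-sided communication would benefit from Casper's asynchronous progress (the file name test.c and the communication pattern are assumptions made for this example):

/* test.c - minimal MPI RMA sketch; illustrative only. Any MPI
 * application can be used with Casper unchanged. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    double *buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Expose one double per process through an RMA window. */
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    buf[0] = -1.0;

    /* Each process writes its rank into its right neighbor's window.
     * With Casper preloaded, the target side can make progress even
     * while the target process is busy computing. */
    int peer = (rank + 1) % nprocs;
    double val = (double)rank;
    MPI_Win_lock(MPI_LOCK_SHARED, peer, 0, win);
    MPI_Put(&val, 1, MPI_DOUBLE, peer, 0, 1, MPI_DOUBLE, win);
    MPI_Win_unlock(peer, win);

    MPI_Barrier(MPI_COMM_WORLD);

    /* Read the locally exposed window memory under a lock. */
    MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
    printf("rank %d: window value %.0f\n", rank, buf[0]);
    MPI_Win_unlock(rank, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Compile it with the mpicc wrapper ($ mpicc -o test test.c) and launch it with any of the mpiexec command lines shown above.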

==================================== Support

If you have problems or need assistance with Casper installation and usage, please contact the casper-users@lists.mpich.org mailing list.

==================================== Bug Reporting

If you have found a bug in Casper, please contact the casper-users@lists.mpich.org mailing list. If possible, please try to reproduce the error with a smaller application or benchmark and send that along in your bug report.

==================================== Casper Test Suite

Casper includes a set of test programs located under test/. These programs can be compiled and run via:

$ cd test
$ make
$ make testing MPIEXEC="<your mpiexec> -n"

For more options to run the test suite, please check:

$ ./test/runtest -h

==================================== Environment Variables

  1. Basic Variables

    CSP_NG (integer, default 1)
      Specifies the number of ghost processes per node.

  2. Advanced Variables (only specify these if you know what they mean)

    CSP_RUNTIME_LOAD_OPT (random|op|byte, default random)
      Specifies how operations are distributed when runtime load balancing is enabled.

    CSP_RUNTIME_LOAD_LOCK (nature|force, default nature)
      Specifies how the lock is granted when runtime load balancing is enabled.
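
For example, assuming runtime load balancing is enabled in your Casper build, the advanced variables can be passed on the mpiexec command line the same way as CSP_NG (the values here are purely illustrative):

$ mpiexec -genv CSP_NG=2 -genv CSP_RUNTIME_LOAD_OPT=byte -np 16 -ppn 8 ./test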

==================================== Debugging Options

Enable debugging messages by compiling with "-DCSP_DEBUG -DCSPG_DEBUG":

$ ./configure --prefix=/your/casper/installation/dir CC=/path/to/mpicc \
      CFLAGS="-DCSP_DEBUG -DCSPG_DEBUG"
$ make && make install

==================================== Known Issues

  1. Some MPI features are unsupported or only partially supported in Casper.

    a. MPI Fortran Binding

       Some MPI Fortran routines are not translated into MPI standard functions and thus cannot be correctly wrapped by Casper. For example, MPICH uses internal MPIR_* functions to implement the following Fortran routines:

         mpi_comm_getattr   mpi_comm_setattr
         mpi_type_getattr   mpi_type_setattr
         mpi_win_getattr    mpi_win_setattr
         mpi_attrget        mpi_attrput

    b. Dynamic Process Routines are not supported.

    c. No support for derived datatypes in point-to-point message offloading routines. Users must explicitly set the "datatype_used=predefined" info at communicator creation time to enable offloading (see the sketch at the end of this section).

    d. No support for special wildcard messages (use of MPI_ANY_SOURCE with distinct tag matching) in point-to-point message offloading routines. Users must explicitly set "wildcard_used=none" or "wildcard_used=any_src|any_tag_same_tag" to enable offloading (see the sketch at the end of this section).

  2. Casper currently disables the user info "alloc_shared_noncontig=true" in MPI_Win_allocate, to avoid complex management of shared-segment displacements on ghost processes.

  3. Casper does not support MPI error handler callbacks in the C++ and Fortran bindings.

  4. The Point-to-Point asynchronous progress routines are not thread-safe.
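
As a sketch of how the info keys from issues 1.c and 1.d above might be supplied at communicator creation time: the key names and values come from the notes above, while the use of MPI_Comm_dup_with_info and the surrounding program are assumptions made for illustration.

/* Sketch: attaching Casper's point-to-point offloading hints to a
 * communicator. Key names are taken from the notes above; everything
 * else is illustrative. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Info info;
    MPI_Comm offload_comm;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* Promise that only predefined datatypes are used ... */
    MPI_Info_set(info, "datatype_used", "predefined");
    /* ... and that no wildcard receives are posted. */
    MPI_Info_set(info, "wildcard_used", "none");

    /* Duplicate MPI_COMM_WORLD with the hints attached. */
    MPI_Comm_dup_with_info(MPI_COMM_WORLD, info, &offload_comm);
    MPI_Info_free(&info);

    /* ... point-to-point communication on offload_comm ... */

    MPI_Comm_free(&offload_comm);
    MPI_Finalize();
    return 0;
}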