About
Building
Portals uses the Autoconf/Automake/Libtool build system. The standard GNU configure script and make system are used to build Portals.
To build:
% autoreconf (only once, or when the build system has been updated)
% ./configure
The "make check" step is not strictly necessary, but is a good idea.
Options to configure include:
--prefix=
Portals 4 Implementation Notes
There is currently one Portals 4 implementation, ib. The ib implementation supports multiple transports, including InfiniBand, UDP/Reliable UDP (experimental), and SHMEM.
InfiniBand: this transport has multiple options, including not actually using IB:
InfiniBand, selected by default, can be disabled with --disable-transport-ib
shared memory, not selected by default, can be enabled with --enable-transport-shmem. For large transfers it can either use KNEM (used if --with-knem=xxxx is present) or fall back to a slower internal shared memory protocol.
Both InfiniBand and shmem can be used at the same time. In that case, Portals uses InfiniBand between nodes and shmem within a node. If only shmem is used, Portals will not run between nodes.
On top of that, two different environments can be generated:
the "fat library", which contains all of the Portals API. Each process has its own progress thread, which can lead to too many cores being used on a machine. This is the default.
the Portals Progress Engine (PPE) and the light library. The PPE is a daemon that groups the progress threads (one by default) for all the Portals processes on the host. The Portals application links to the "light library", which is only a pass-through between the application and the PPE. Very little processing is done in the light library; most of it is done inside the PPE. This environment is selected with --enable-ib-ppe. The XPMEM driver must be present, and so it is mutually exclusive with the shmem transport. The daemon must be started prior to running a Portals4 application:
The fat and light libraries have the same name, so only one type can exist on a machine at a time; use LD_LIBRARY_PATH and/or LD_PRELOAD to work around this.
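The LD_LIBRARY_PATH workaround could look like the sketch below. It assumes the fat and light builds were installed under two separate prefixes chosen at configure time; the paths /opt/portals4-fat and /opt/portals4-ppe and the helper name use_portals are hypothetical, and the library directory is assumed to be "lib" under the prefix (some distributions use "lib64").

```shell
# Hypothetical install prefixes; substitute whatever was passed to --prefix.
FAT_PREFIX=/opt/portals4-fat
PPE_PREFIX=/opt/portals4-ppe

# Put the chosen build's library directory first in the runtime search path.
# Assumes libraries live in <prefix>/lib.
use_portals() {
    LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Select the light (PPE) library for commands started from this shell.
use_portals "$PPE_PREFIX"
echo "$LD_LIBRARY_PATH"
```

Calling use_portals "$FAT_PREFIX" in a different shell session selects the fat library instead, so both builds can coexist on the same machine.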
Pre-requisites
libev-4.0.4 (or similar), available at http://software.schmorp.de/pkg/libev.html
The KNEM driver from http://runtime.bordeaux.inria.fr/knem/. Will be used for shared memory if --with-knem=
The XPMEM driver from https://code.google.com/p/xpmem/. Mandatory if the PPE is selected, unused otherwise. The current driver has crashing issues and doesn't compile on recent kernel versions.
The ummunotify driver from https://github.com/Portals4/ummunotify/ (required for correctness with the IB transport). Ensure that /dev/ummunotify is readable/writable by the user running the Portals software. WARNING: Not using ummunotify may result in program incorrectness.
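The permission requirement on /dev/ummunotify can be checked before running an application. The following is a minimal sketch; the helper name check_ummunotify is an assumption for illustration and is not part of Portals.

```shell
# Return 0 if the device node exists and is readable and writable
# by the current user. The path can be overridden for testing.
check_ummunotify() {
    dev="${1:-/dev/ummunotify}"
    [ -r "$dev" ] && [ -w "$dev" ]
}

if check_ummunotify; then
    echo "ummunotify: ok"
else
    echo "ummunotify: /dev/ummunotify is missing or not readable/writable" >&2
fi
```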
Build: Note that the paths to MPI may not be necessary or may need to be changed depending on your Linux distribution.
minimal, no shared memory: ./configure
optimized, with shared memory and KNEM support: ./configure --enable-transport-shmem --with-knem=/opt/knem --enable-fast
PPE, no IB transport (so local node only): ./configure --enable-ib-ppe --disable-transport-ib
Then type "make".
Test:
Environment variables:
PTL_DISABLE_MEM_REG_CACHE=[0|1] deactivates/activates the IB memory registration cache. When the cache is disabled, ummunotify is no longer required and the implementation does not keep a registered memory cache.
For instance: PTL_LOG_LEVEL=3 PTL_DEBUG=1 yod -n 1 ./spam
Building an RPM:
Limitations:
Why Portals may not run properly:
The user cannot pin enough memory. By default Linux systems allow about 32KB. Check with "ulimit -l". This is for IB only. Edit /etc/security/limits.conf and change/add the following 2 lines (including the stars):
IB is not up and running (check with "ibv_devinfo") or the default IPoIB interface, ib0, doesn't have an IP address. All interfaces should be on the same IP range.
Slow performance can be caused by cpuspeed. Disable it ("service cpuspeed stop").
Uneven performance between repeated runs can be caused by the threads being on different cores. Pin them.
When shmem is used, under the wrong circumstances, the files /dev/shm/portals4-shmem-* are not removed. If a subsequent run tries to re-use the same name, it will fail and the application may exit.
If you get a mr_lookup: Assertion `res == ((void *)0)' failed error, you need to configure Portals 4 with --enable-zero-mrs to enable zero-length MR support.
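Several of the checks above can be scripted. The sketch below is only illustrative: the helper names are hypothetical, the 65536 KB threshold is an arbitrary example rather than a Portals requirement, and the "ip" utility is assumed to be available for the ib0 check.

```shell
# Classify a value printed by "ulimit -l" (kilobytes, or "unlimited").
# The 65536 KB threshold is an arbitrary illustration.
classify_memlock() {
    case "$1" in
        unlimited) echo ok ;;
        ''|*[!0-9]*) echo unknown ;;
        *) if [ "$1" -ge 65536 ]; then echo ok; else echo low; fi ;;
    esac
}

# Succeed if the text on stdin (for example the output of
# "ip -4 addr show dev ib0") lists an IPv4 address.
has_ipv4() {
    grep -q 'inet '
}

# Remove leftover shmem files. The directory is a parameter so the
# helper can be tried on a scratch directory first.
clean_stale_shmem() {
    dir="${1:-/dev/shm}"
    rm -f "$dir"/portals4-shmem-*
}

echo "memlock limit: $(classify_memlock "$(ulimit -l 2>/dev/null)")"
if command -v ip >/dev/null 2>&1; then
    if ip -4 addr show dev ib0 2>/dev/null | has_ipv4; then
        echo "ib0: has an IPv4 address"
    else
        echo "ib0: no IPv4 address found"
    fi
fi
```

clean_stale_shmem is deliberately not called by the script; run it by hand, and only when no Portals application is running on the node, since it removes every /dev/shm/portals4-shmem-* file.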