Little OpenMP* Runtime

LOMP, short for Little OpenMP (runtime), is a small OpenMP runtime implementation that can be used for educational or prototyping purposes. It currently only implements a rather small subset of the OpenMP Application Programming Interface for CPUs (i.e., it has no support for offload to target devices and is also missing many CPU-only features).

LOMP was written to demonstrate the design principles outlined in the book High-Performance Parallel Runtimes.

The library uses the same binary interface as clang/LLVM*, and thus is compatible with several compilers that use that interface. Unless you use a feature that LOMP does not currently support, LOMP can serve as a drop-in replacement for the native OpenMP runtime library of a compatible compiler without requiring re-compilation of your application.

The runtime is mostly written in C++14, though it uses some features of C++17, and can be compiled for a variety of different architectures. There are no assembler files in the runtime, and the use of inline assembly is restricted to a few features (such as reading the high-resolution clock). For architectures that do not have a code path to access the high-resolution clock via inline assembly, we rely on C++ features to measure time. Atomic operations are all accessed through std::atomic.

In addition to the source for the runtime, there are a few micro-benchmarks and some (extremely minimal) sanity tests.

As its name suggests, the LOMP library is significantly smaller than the production LLVM OpenMP runtime library. At the time of the initial check-in, the cloc utility in the source directory shows under 6,000 lines of C++ code (and no assembly code). For comparison, the production LLVM OpenMP runtime has around 63,500 lines of C/C++ code and 1,400 lines of assembly code in the CPU part of the library. Of course, this is an unfair comparison, since LOMP is missing many features which are supported by the production runtime. However, it does make the point that if you want somewhere to experiment, or an environment in which to set a student project, LOMP may be an easier codebase to work with!

Supported Target Platforms

The LOMP runtime supports the following target architectures (in parentheses we show the architecture name as reported by the uname command):

The library works with Arm 64-bit processors running macOS (reported as arm64 by uname there), but an LLVM compiler from at least April 2021 is required, because a compiler bug (unrelated to the runtime) broke OpenMP tasks there as a result of incorrect assumptions about varargs argument passing on that platform. (See this bug for details.)

Supported OpenMP Features

The language supported by LOMP is restricted to a small subset of the OpenMP API for shared-memory multi-threading. The supported OpenMP features are:

The supported OpenMP runtime routines are:

Important Things Which Are Not Yet Supported

Since this is a small and relatively simple runtime (at least for now), there are a few restrictions and many things which have not yet been implemented. A (possibly incomplete) list is:

The runtime is also, of course, limited in the language it can support by the compiler. There are therefore some OpenMP API version 5.1 features which are not yet implemented since there is no compiler support for them yet.

If you would like to contribute any features to LOMP, please see below.

How to Build (and Install)

Here are some, hopefully useful, remarks about how you can set up LOMP on your system. The instructions come without any warranty, and may be wrong, or incomplete.

Software Versions

To build the LOMP library, you need the following software environment:

Other versions of software tools may work, but we have not tested them with our code.

While the LOMP library itself compiles fine with the GNU Compiler Collection (GCC), we have not implemented all of the entry points that GCC requires for its OpenMP support. So, while you can compile the LOMP runtime code with GCC, you will need a clang-compatible compiler to generate code to exercise the LOMP library that you have built.

The micro-benchmarks (in the directory microBM) should work with any OpenMP implementation.

Building LOMP

Building LOMP follows the usual process of building a CMake-based project. Here are the steps needed:

The default build configuration is "Release" mode, which enables compiler optimizations. Please see below for how to change this default.

To use LOMP with an existing executable, once you have built the library, you should be able to use LD_LIBRARY_PATH on Linux (or DYLD_LIBRARY_PATH on macOS) to place its directory before the system directory where the production OpenMP library lives. LOMP is then used without needing to recompile your executable. If you also set LOMP_DEBUG=1, you should see some output that proves that you are using the library you expect. (Of course, the ldd command on Linux can also show you that.)

If you compiled the Hello World example that comes with LOMP using one of the supported OpenMP compilers and if LOMP has been compiled in $HOME/build_lomp, the following will dynamically bind LOMP to your compiled code:

    $ export LD_LIBRARY_PATH=$HOME/build_lomp/src/:$LD_LIBRARY_PATH
    $ LOMP_DEBUG=1 OMP_NUM_THREADS=4 ./a.out
    Before parallel region
    =======================================
    LOMP:runtime version 0.1 (SO version 1) compiled at 19:26:59 on Jan 28 2021
    from Git commit 0abcdef for x86_64 by LLVM:11:0:0
    LOMP:with configuration -mrtm;-mcx16;DEBUG=10;LOMP_GNU_SUPPORT=1;LOMP_HAVE_RTM=1;LOMP_HAVE_CMPXCHG16B=1
    Hello World: I am thread 1, and my secrets are 42.000000 and 21
    Hello World: I am thread 2, and my secrets are 42.000000 and 21
    Hello World: I am thread 0, and my secrets are 42.000000 and 21
    Hello World: I am thread 3, and my secrets are 42.000000 and 21
    =======================================
    After parallel region
    $

By using the export statement you will make LOMP your default OpenMP runtime for all processes started from this shell. If you do not want that, remember to reset LD_LIBRARY_PATH afterwards.

CMake Configuration Options

The following options can be set using the cmake command line interface:

Installing LOMP

LOMP can be installed using the install target. The location is determined via the -DCMAKE_INSTALL_PREFIX=<path> configuration option for CMake. After a successful build, the following will install LOMP:

Environment Variables

The LOMP runtime library supports various environment variables that control its behavior:

Note that when debugging the library it is often convenient to change the order of the debug tags so that only information from the subsystem of interest is printed. The values for LOMP_DEBUG may therefore change over time.

Micro-Benchmarks

The micro-benchmarks are in the microBM directory. These were used to measure hardware properties shown in the book. You can use them to measure the properties of your own machines. Each benchmark can be invoked by an appropriate Python script which will run either all of the available measurements or only the ones which you request, and write the results to appropriately named files.

To use the scripts, ensure that your current directory is the microBM directory in the appropriate build, then execute the relevant Python script from the microBM source directory, e.g.

    $ cd lomp_build/microBM
    $ python ~/lomp/microBM/runAtomics.py
    Arch (may be wrong!):
    Model:
    Cores:  8
    Running  OMP_NUM_THREADS=8 KMP_HW_SUBSET=1T KMP_AFFINITY='compact,granularity=fine' ./atomics Ie > AtomicsIe_Mac-mini_2021-01-28_1.res
    ./atomics Ie
    ........
    Running  OMP_NUM_THREADS=8 KMP_HW_SUBSET=1T KMP_AFFINITY='compact,granularity=fine' ./atomics If > AtomicsIf_Mac-mini_2021-01-28_1.res
    ./atomics If
    ........
    Running  OMP_NUM_THREADS=8 KMP_HW_SUBSET=1T KMP_AFFINITY='compact,granularity=fine' ./atomics Ii > AtomicsIi_Mac-mini_2021-01-28_1.res
    ./atomics Ii
    ........
    Running  OMP_NUM_THREADS=8 KMP_HW_SUBSET=1T KMP_AFFINITY='compact,granularity=fine' ./atomics It > AtomicsIt_Mac-mini_2021-01-28_1.res
    ./atomics It
    ........
    $

The output files should be easy to read as they are. They can also be converted into web pages containing tabulated results and plots by feeding the various files for a specific measurement to the plot.py script in the scripts directory. If you feel the desperate need to use a spreadsheet, the toCSV.py script in the scripts directory will convert them into a comma-separated file format which can be read into whichever spreadsheet you suffer.

Bugs

Please submit bug reports or other feedback via GitHub issues by filling in the issue templates that are provided.

Contributions

Contributions are welcome, as is other feedback. We hope that this runtime will provide a useful environment for experimentation with new OpenMP features (such as different loop schedules, or barrier implementations), while remaining simple enough to be easy to use in a university course.

If you want to contribute (for fame and glory), please fork the repository and submit a pull request with the changes you would like to make.

License

The LOMP runtime is licensed under the "Apache 2.0 License with LLVM exceptions". Any contributions must use that license to be acceptable.

Reference

If you use LOMP in your research and publish a paper, please use the following in your citation:

Jim Cownie and Michael Klemm, Little OpenMP Runtime, https://github.com/parallel-runtimes/lomp, February 2021.

Contributors

Trademarks

Trademarks and registered names are marked with an asterisk (*) at their first use where we have recognized them. Other names and trademarks may be the property of others.