multicore-locks / litl

LiTL: Library for Transparent Lock Interposition
MIT License
79 stars 21 forks source link

LiTL: Library for Transparent Lock interposition

LiTL is a library that allows executing a program based on Pthread mutex locks with another locking algorithm.

Building

make

Dependencies

Execution

Launch your application using one of the provided scripts. The name of the script to you use depends on the chosen lock.

Examples:

Details

Usage

The algorithms are named according to the following schema: lib{algo}_{waiting_policy}.sh.

The waiting policy can either be spinlock, spin_then_park or original.

Some algorithms come with different waiting policies (Malthusian, MCS, C-BO-MCS, CLH). For example, if you want to execute your application with the MCS lock, using a spin-then-park waiting policy, use libmcs_spinlock.sh.

For the other algorithms, the name of the script to use is of the form lib{algo}_original.sh. In this case, the waiting policy depends on the one used in the original design of the corresponding lock (spin in most cases).

The library uses LD_PRELOAD to intercept calls to most of the pthread_mutex_* functions.

Supported algorithms

Name Ref Waiting Policy Supported Name in the Paper [LOC] Notes and acknowledgments
ALOCK-EPFL [AND] original (spin) alock-ls From libslock
Backoff [MCS] original (spin) backoff From concurrencykit
C-BO-MCS [COH] spinlock or spin-then-park c-bo-mcs_spin and c-bo-mcs_stp
CLH [SYN] spinlock of spin-then-park clh_spin and clh_stp
CLH-EPFL [SYN] original (spin) clh-ls From libslock
C-PTL-TKT [COH] original (spin) c-ptl-tkt
C-TKT-TKT [COH] original (spin) c-tkt-tkt
HMCS [HMC] original (spin) hmcs
HT-LOCK-EPFL [EVR] original (spin) hticket-ls From libslock
HYS-HMCS [HYS] original (spin) ahmcs
Malthusian [MAL] spinlock or spin-then-park malth_spin and malth_stp This is the Malthusian-MCS version.
Mutexee [MUT] original (spin_then_park) mutexee From lockin
MCS [MCS] spinlock or spin-then-park mcs_spin and mcs_stp From RCL
MCS-EPFL [MCS] original (spin) mcs-ls From libslock
MCS-TP [PRE] original (spin hybrid) mcs-timepub From RCL
Partitioned [PAR] original (spin) partitioned
Pthread-Adaptive [ADP] original (spin_then_park) pthreadadapt Wrapper around pthread lock with adaptive policy
Pthread-Interpose - original (park) pthread Wrapper around classic pthread lock
Spinlock [SYN] original (spin) spinlock
Spinlock-EPFL [SYN] original (spin) spinlock-ls From libslock
Ticket [MCS] original (spin) ticket From lockless
Ticket-EPFL [MCS] original (spin) ticket-ls From libslock
TTAS [AND] original (spin) ttas
TTAS-EPFL [AND] original (spin) ttas-ls From libslock

Note that the pthread-adaptive and pthread-interpose wrappers are provided only for fair comparison with the other algorithms (i.e., to introduce the same library interposition overhead).

Support for condition variables

Summary of the approach

As explained in [LOC], we rely on classic Pthread condition variables to implement condition variables. Here is some pseudo-code to summarize the approach:

// return values and error checks
// omitted for simplification

pthread_mutex_lock(pthread_mutex_t *m) {
    optimized_mutex_t *om;
    om = get_optimized_mutex(m);
    if (om == null) {
        om = create_and_store_optim_mutex(m);
    }
    optimized_mutex_lock(om);
    real_pthread_mutex_lock(m);
}

pthread_mutex_unlock(pthread_mutex_t *m) {
    optimized_mutex_t *om;
    om = get_optimized_mutex(m);
    optimized_mutex_unlock(om);
    real_pthread_mutex_unlock(m);
}

pthread_cond_wait(pthread_cond_t *c,
                  pthread_mutex_t *m) {
    optimized_mutex_t *om;
    om = get_optimized_mutex(m);
    optimized_mutex_unlock(om);
    real_pthread_cond_wait(c, m);
    real_pthread_mutex_unlock(m);
    optimized_mutex_lock(om);
    real_pthread_mutex_lock(m);
}

// Note that the pthread_cond_signal and
// pthread_cond_broadcast primitives
// do not need to be interposed

This strategy does not introduce contention on the Pthread lock, as the latter is only requested by the holder of the optimized lock associated with the critical section.

Some people have raised concerns about the possibility that several threads contended on the Pthread lock, especially for workloads using pthread_cond_broadcast. However, on Glibc/Linux, pthread_cond_broadcast is implemented (via the futex syscall) in a way such that it does not wake up several threads at the same time. Indeed, according to the source code of the pthread_cond_broadcast implementation, the broadcast function simply wakes up a single thread from the wait queue of the condition variable and transfers the remaining threads to the wait queue of the Pthread lock. This is also confirmed in the futex(2) man page. So, overall, for workloads that use pthread_cond_broadcast and/or pthread_cond_signal, it is unlikely to have more than two threads contending for the Pthread lock at the same time.

Disabling support for condition variables

By default, the locks are built with the above-described support for condition variables. However, if you know that the target application does not use condition variables, you can build the locks without it. This allows optimizing the critical path for lock acquisition/release by removing the need to acquire/release the underlying Pthread lock. To do so, use make no_cond_var.

Trylock primitives

Implementation

Some algorithms do not come officially with a trylock primitive (i.e., non blocking lock request). Here is how we implemented trylock for them :

Unsupported locks

To the best of our knowledge, the design of these algorithms does not support trylock semantics:

Adding a new lock

The library is designed to be extensible. In order to introduce a new lock algorithm, consider the following steps:

  1. Edit Makefile.config: add one line with the format algo_waitingstrategy (algo without spaces in lowercase, waitingstrategy is original, spinlock or spin_then_park)
  2. Add a file include/algo.h: take spinlock as an example
  3. Add a file src/algo.c: take spinlock as an example
  4. Modify the top of the interpose.c file to add a #ifdef for your algorithm (the Makefile takes care of everything)

Remarks:

Cascading interposition libraries

You may want to capture statistics for different locks. For example, if you want to capture the concurrency of the MCS algorithm, you can do the following:

./libconcurrency_original.sh ./libmcs_spinlock.sh my_program

Details

In order to be able to chain interposition libraries, we must add versions to the symbols we export. This is done using a symbol map (see src/interpose.map) and by adding a symver asm symbol after the function declaration (see src/interpose.c). Without that, the library is not able to get the function pointer address of the next function using dlvsym.

References and acknowledgments

Lock algorithms

Implementations

Some lock implementations are borrowed (fully or partially) from source code repositories developed by other people. While we try to cite the authors of the original implementation (and the corresponding license) at the beginning of each file, we may have made mistakes and omissions. Please contact us if you notice any issue.

Sources:

LiTL also uses the CLHT hashtable. This hashtable is used to link a pthread_mutex_lock with the underlying data structure of the interposed lock (e.g., MCS).