This repository contains the prototype of the Memory Stall Software Harvester (MSH), the first system designed to transparently and efficiently harvest memory-bound CPU stall cycles in software. Why harvest memory-bound stalls through a software mechanism when there are well-known hardware harvesting mechanisms like Intel Hyperthreads? The answer is that hardware mechanisms are inflexible: they cannot differentiate between latency-sensitive applications and others, and they only provide limited concurrency (e.g., 2 threads), often harvesting too much or too little. MSH allows for adjusting the length and frequency of cycle harvesting more precisely, providing a unique opportunity to utilize stalled cycles of latency-sensitive applications while meeting different latency SLOs. For more details about MSH, please take a look at our OSDI'24 paper.
Our prototype makes a few assumptions about primary and scavenger applications to simplify the implementation: use of the pthread library, and compilation with the -fno-omit-frame-pointer and -mno-red-zone flags enabled.
Using MSH involves three pieces of software: a profiler (scripts based on perf), binary instrumentation (llvm-bolt), and the MSH runtime (libmsh). We assume that a user has a primary application and a set of scavenger applications that will run when there are stalled cycles in the primary application. One can use MSH in the following way.
We'll show you how this workflow works with one simple primary (ptrchase) and one scavenger (compute.so).
MSH requires a scavenger to define the following symbols in the file containing its main function.
extern "C" {
int crt_pos = 0;
int argc = 0;
char **argv = 0;
}
Then, rename the main function to entry, as shown below.
extern "C" int
entry(void) {
...
}
Lastly, compile the scavenger to a shared object file.
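Put together, the steps above give a source file shaped like the following. This is a minimal sketch, not the source of the repository's compute.so: the workload (`lcg_sum`), the `sink` variable, and the iteration count are hypothetical placeholders for whatever the scavenger actually computes.

```cpp
// Hypothetical scavenger source; only the extern "C" symbols and the
// entry() signature come from the MSH requirements described above.
#include <cassert>
#include <cstdint>

extern "C" {
// Symbols MSH expects in the file that would otherwise contain main().
int crt_pos = 0;
int argc = 0;
char **argv = 0;
}

// Placeholder compute-bound kernel (assumption): sums the high bits of a
// 64-bit linear congruential sequence. Unsigned overflow wraps by design.
std::uint64_t lcg_sum(std::uint64_t iters) {
  assert(iters > 0);
  std::uint64_t x = 0, sum = 0;
  for (std::uint64_t i = 0; i < iters; ++i) {
    x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    sum += x >> 33;
  }
  return sum;
}

// Keeps the result observable so the compiler cannot drop the loop.
volatile std::uint64_t sink = 0;

// Former main(), renamed to entry() as MSH requires.
extern "C" int entry(void) {
  sink = lcg_sum(1000000);
  return 0;
}
```

A file like this would be built as a shared object with the flags listed in the assumptions above; for compute.so, the rules in the apps Makefile take care of the build.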
We implemented all the binary-level instrumentation in BOLT. To apply our patch, run:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 30c1f31
patch -p1 < msh_bolt.diff
Then, compile BOLT by following the instructions on the BOLT page. We'll assume that llvm-bolt is in PATH from now on.
# Compile ptrchase
cd apps
mkdir build
make primary
# usage: ./do_prof_primary.sh [binary] [args]
# Pointer chase 50MB array
cd ${HOME}
./do_prof_primary.sh ./apps/build/ptrchase 13107200
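For readers unfamiliar with pointer chasing, the sketch below shows why such a workload is memory-bound: every load depends on the previous one, so the CPU stalls on each cache miss. It is an illustration only, not the repository's ptrchase source. The function names are hypothetical, and the 4-byte element type is an assumption inferred from the argument above (13107200 elements × 4 bytes = 50 MiB).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Assumption: 4-byte elements, so 13107200 of them occupy exactly 50 MiB.
static_assert(13107200ull * sizeof(std::uint32_t) == 50ull * 1024 * 1024,
              "13107200 four-byte elements should be 50 MiB");

// Build a single random cycle over n slots (Sattolo's algorithm), so that
// following next[i] from slot 0 visits every slot once before returning.
std::vector<std::uint32_t> make_cycle(std::size_t n, std::uint64_t seed) {
  assert(n > 0);
  std::vector<std::uint32_t> next(n);
  std::iota(next.begin(), next.end(), 0u);
  std::mt19937_64 rng(seed);
  for (std::size_t i = n - 1; i > 0; --i) {
    std::uniform_int_distribution<std::size_t> pick(0, i - 1);
    std::swap(next[i], next[pick(rng)]);
  }
  return next;
}

// Follow the chain for `steps` hops starting at slot 0. Each load address
// depends on the previous load, which serializes the cache misses.
std::uint32_t chase(const std::vector<std::uint32_t>& next, std::size_t steps) {
  std::uint32_t pos = 0;
  for (std::size_t s = 0; s < steps; ++s) {
    pos = next[pos];
  }
  return pos;
}
```

Using one random cycle rather than an arbitrary permutation avoids short loops that would fit in cache and hide the stalls that MSH is meant to harvest.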
The Makefile in apps has rules that use BOLT to perform binary instrumentation. We assume that llvm-bolt is in PATH.
cd apps
make build/ptrchase.bolt
# Compile compute.so
cd apps
make scavenger
cd ${HOME}
echo "$(pwd)/apps/build/compute.so" > scav.txt
./do_prof_scavenger.sh
You can instrument the scavenger in a similar way. However, you must also specify the average yield distance (in nanoseconds) for the scavenger, which bounds how long it runs between yields.
cd apps
YIELD_DISTANCE=100 make build/compute.so.bolt
We use the LD_PRELOAD trick to attach the runtime to the primary without recompilation.
cd libmsh
make all
cd ..
export LD_PRELOAD=$(pwd)/build/libmsh.so:${LD_PRELOAD}
export LD_LIBRARY_PATH=$(pwd)/build:${LD_LIBRARY_PATH}
echo "$(pwd)/apps/build/compute.so.bolt" > scav.txt
export MSH_SCAV_POOL_PATH=$(pwd)/scav.txt
export SKIP_FIRST_THREAD=1
./apps/build/ptrchase.bolt 13107200
Note: don't forget to reset LD_PRELOAD when you finish testing MSH. Otherwise, the MSH runtime will keep intercepting pthread functions in every program you run afterwards.
Sam Son and Zhihong Luo