Ring-buffer Inter-Process Interface

CUPTI PC Sampling (see #294) can only be performed from within the process that executes the CUDA kernels.

This means that CUPTI support in lo2s can only be implemented by creating a separate CUPTI sampling support library and using LD_PRELOAD to inject it into the application under measurement.

This in turn requires a mechanism for the injected library to communicate with lo2s itself, most likely a ring buffer in shared memory.

As such a foreign interface might also be useful outside of the CUPTI use case, I think this inter-process interface warrants its own discussion.

There are two direct questions:

What should the technical solution look like? shm_open + mmap with our own ring buffer implementation, or is there already a turnkey solution for this?
How much genericity should we bake into the design?
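
To make the first option concrete, here is a minimal sketch of what "shm_open + mmap + own ring buffer" could look like. It is only an illustration under stated assumptions, not a proposed lo2s API: the names `RingHeader`, `Ring`, `ring_open`, `ring_write`, `ring_read` and `ring_close` are hypothetical, the layout (a small header followed by a byte array) is invented for this example, and it assumes exactly one producer (the injected library) and one consumer (lo2s) that agree on the name and capacity out of band. On older glibc versions it may need to be linked with `-lrt`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical shared layout: this header, followed by `capacity` data bytes.
struct RingHeader
{
    std::atomic<std::uint64_t> head; // total bytes written, only the producer stores to it
    std::atomic<std::uint64_t> tail; // total bytes read, only the consumer stores to it
    std::uint64_t capacity;          // size of the data area in bytes
};
// Atomics shared between processes must not fall back to a process-local lock.
static_assert(std::atomic<std::uint64_t>::is_always_lock_free, "need lock-free 64-bit atomics");

struct Ring
{
    RingHeader* hdr = nullptr;
    unsigned char* data = nullptr;
    std::size_t map_size = 0;
};

// Maps the region `name`. With create == true the caller creates and initializes
// it; with create == false the caller attaches to an already existing region.
inline bool ring_open(Ring& r, const char* name, std::uint64_t capacity, bool create)
{
    assert(capacity > 0);
    const std::size_t size = sizeof(RingHeader) + capacity;
    const int fd = shm_open(name, create ? (O_CREAT | O_EXCL | O_RDWR) : O_RDWR, 0600);
    if (fd < 0)
        return false;
    if (create && ftruncate(fd, static_cast<off_t>(size)) != 0)
    {
        close(fd);
        return false;
    }
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED)
        return false;
    r.hdr = create ? new (mem) RingHeader : static_cast<RingHeader*>(mem);
    r.data = static_cast<unsigned char*>(mem) + sizeof(RingHeader);
    r.map_size = size;
    if (create)
    {
        r.hdr->head.store(0);
        r.hdr->tail.store(0);
        r.hdr->capacity = capacity;
    }
    else if (r.hdr->capacity != capacity) // both sides must agree on the size
    {
        munmap(mem, size);
        r = Ring{};
        return false;
    }
    return true;
}

// Producer side: copies `len` bytes in, or returns false if they do not fit.
inline bool ring_write(Ring& r, const void* src, std::uint64_t len)
{
    const std::uint64_t cap = r.hdr->capacity;
    const std::uint64_t head = r.hdr->head.load(std::memory_order_relaxed);
    const std::uint64_t tail = r.hdr->tail.load(std::memory_order_acquire);
    if (len > cap - (head - tail))
        return false;
    const std::uint64_t off = head % cap;
    const std::uint64_t first = std::min(len, cap - off); // part before the wrap
    std::memcpy(r.data + off, src, first);
    std::memcpy(r.data, static_cast<const unsigned char*>(src) + first, len - first);
    r.hdr->head.store(head + len, std::memory_order_release); // publish the data
    return true;
}

// Consumer side: copies `len` bytes out, or returns false if fewer are available.
inline bool ring_read(Ring& r, void* dst, std::uint64_t len)
{
    const std::uint64_t cap = r.hdr->capacity;
    const std::uint64_t tail = r.hdr->tail.load(std::memory_order_relaxed);
    const std::uint64_t head = r.hdr->head.load(std::memory_order_acquire);
    if (len > head - tail)
        return false;
    const std::uint64_t off = tail % cap;
    const std::uint64_t first = std::min(len, cap - off);
    std::memcpy(dst, r.data + off, first);
    std::memcpy(static_cast<unsigned char*>(dst) + first, r.data, len - first);
    r.hdr->tail.store(tail + len, std::memory_order_release); // free the space
    return true;
}

inline void ring_close(Ring& r)
{
    if (r.hdr != nullptr)
        munmap(r.hdr, r.map_size);
    r = Ring{};
}
```

In this sketch, lo2s would call `ring_open(..., true)` before launching the application and the preloaded library would call `ring_open(..., false)` on the same name. Everything beyond that is deliberately left out: blocking or notification when the buffer is empty or full, message framing, multiple producers, and cleanup via `shm_unlink`. Those are exactly the points where the genericity question above matters.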
