stephenrkell / liballocs

Meta-level run-time services for Unix processes... a.k.a. dragging Unix into the 1980s
http://humprog.org/~stephen/research/liballocs

Temporal safety feature: mirroring mmaps #68

Open stephenrkell opened 2 years ago

stephenrkell commented 2 years ago

In a hypothetical dynamically safe C, we want to implement a temporally safe malloc using virtual address rotation techniques. Basically, each heap arena is mapped N times (say 64). In effect, each heap chunk has one or more dedicated PTEs, allowing access rights to be taken away. Although doing this on every free() would be slow, this can be mitigated: by batching, by refraining from reusing vaddrs until a background GC-alike has shown them unreachable, and so on. (A performance hazard is that PTs get much bigger, costing memory, and VAS utilisation gets much more sparse, costing additional TLB pressure.)
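To make the "mapped N times" idea concrete, here is a minimal sketch, assuming a POSIX system and an unlinked temporary file as the shared backing object (on Linux, `memfd_create` would be a natural alternative). The helper name `map_mirrored` is hypothetical and is not a liballocs API. It reserves one contiguous range and then maps the same backing pages into it `n` times, so each alias has its own PTEs and can be protected independently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of liballocs): map `len` bytes of one
 * backing object `n` times, back to back. Returns the lowest alias, or
 * NULL on failure. `len` must be a nonzero multiple of the page size. */
void *map_mirrored(size_t len, unsigned n)
{
    assert(len != 0 && len % (size_t) sysconf(_SC_PAGESIZE) == 0);

    /* An unlinked temporary file serves as the shared backing store. */
    char path[] = "/tmp/mirror-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return NULL;
    unlink(path);
    if (ftruncate(fd, (off_t) len) != 0) { close(fd); return NULL; }

    /* Reserve n*len bytes of contiguous, inaccessible address space. */
    size_t total = (size_t) n * len;
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return NULL; }

    /* Overlay each slot with a shared mapping of the same file pages. */
    for (unsigned i = 0; i < n; ++i) {
        void *p = mmap(base + (size_t) i * len, len,
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, 0);
        if (p == MAP_FAILED) { munmap(base, total); close(fd); return NULL; }
    }
    close(fd); /* the mappings keep the backing object alive */
    return base;
}
```

A write through one alias is visible through all the others, and calling `mprotect(..., PROT_NONE)` on one alias leaves the rest accessible, which is the property the rotation scheme relies on.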

In liballocs, we don't implement this funky malloc but we can provide some building blocks that make it possible. In fact, going further, we can possibly provide primitives that make it work for (almost) any allocator. One primitive would be to nudge the underlying arena mmap so that it is aligned to a bigger boundary (say, 64 times the arena size). Next, we can somehow support mapping 64 copies of it contiguously, transparently to the allocator. Another would be to nudge it from MAP_PRIVATE to MAP_SHARED, again transparently to the allocator. If we can mess with the allocator from the outside like this, we start to be able to support custom allocators at reasonable cost. A final flourish would be to do the bit-twiddling in our generated wrapper code. So, it would be us that's responsible for spreading the issued pointers over the 64-way-mapped VAS. It's ambitious but it'd be a pretty cool demo of our allocator-awareness powers -- enabling a temporal safety trick not just for a single malloc but for any allocator that fits the prerequisite properties (probably, backing directly onto mmap and not allocating super-small super-packed chunks).
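The bit-twiddling could look roughly like the sketch below. It assumes (these are assumptions, not settled design) that the arena size is a power of two and that the mirrored region starts at an address aligned to 64 times the arena size, so that the bits just above the in-arena offset select the mirror. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { N_MIRRORS = 64 }; /* assumed mirror count, per the "say 64" above */

/* Start of the whole 64-way mirrored region containing p. Assumes the
 * region is aligned to N_MIRRORS * arena_size (a power of two). */
uintptr_t mirror_region_base(uintptr_t p, uintptr_t arena_size)
{
    return p & ~(N_MIRRORS * arena_size - 1);
}

/* Which of the N_MIRRORS aliases does p point into? */
unsigned mirror_index(uintptr_t p, uintptr_t arena_size)
{
    return (unsigned) ((p / arena_size) % N_MIRRORS);
}

/* Re-express p as a pointer into mirror k, keeping its in-arena offset. */
uintptr_t to_mirror(uintptr_t p, uintptr_t arena_size, unsigned k)
{
    assert(k < N_MIRRORS);
    assert(arena_size != 0 && (arena_size & (arena_size - 1)) == 0);
    uintptr_t offset = p & (arena_size - 1);
    return mirror_region_base(p, arena_size) + (uintptr_t) k * arena_size + offset;
}

/* The address the underlying allocator itself knows about (mirror 0). */
uintptr_t canonical(uintptr_t p, uintptr_t arena_size)
{
    return to_mirror(p, arena_size, 0);
}
```

A generated malloc wrapper would apply `to_mirror` to the pointer the allocator returns, and the free wrapper would apply `canonical` before handing the pointer back, so the allocator only ever sees mirror-0 addresses.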

stephenrkell commented 2 years ago

A very simple low-effort version would work like this: we batch free-derived mprotect requests. By default, we do them every N calls to free. Wart: if a malloc wants to reuse a region of VAS that is mprotect-pending, we just do the mprotect up-front. This probably has some bad patterns which we'd need to investigate and use to refine the approach.

stephenrkell commented 2 years ago

We need to think about how to re-add permissions. In fact that exposes a flaw in the above: if a malloc wants to re-use that VAS region, it's too late to protect it -- by definition, it needs to be not-protected so that the malloc's client can use it! So it's our job to prevent this from happening. We twiddle the bits, so we need to twiddle them such that they land on a VAS region that is not only free but also neither mprotected nor mprotect-pending.
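One way to picture the bookkeeping this implies is a pair of 64-bit masks per chunk, one bit per mirror. This is only a sketch under that assumption; the struct and function names are hypothetical. The wrapper picks the first mirror that is neither protected nor pending, and a mirror becomes usable again only once the GC-like analysis has shown it unreachable and its permissions have been restored.

```c
#include <assert.h>
#include <stdint.h>

enum { N_MIRRORS = 64 };

/* Hypothetical per-chunk state: bit k describes mirror k of the chunk. */
struct chunk_state {
    uint64_t protected_mask; /* mirror k has been mprotected */
    uint64_t pending_mask;   /* an mprotect of mirror k is queued */
};

/* Pick a mirror that is neither protected nor mprotect-pending,
 * or return -1 if every mirror of this chunk is used up. */
int pick_mirror(const struct chunk_state *s)
{
    uint64_t unusable = s->protected_mask | s->pending_mask;
    for (int k = 0; k < N_MIRRORS; ++k)
        if (!(unusable & (UINT64_C(1) << k))) return k;
    return -1;
}

/* free() of the pointer issued via mirror k: queue its protection. */
void on_free(struct chunk_state *s, int k)
{
    assert(k >= 0 && k < N_MIRRORS);
    s->pending_mask |= UINT64_C(1) << k;
}

/* The batched mprotect calls have been carried out. */
void on_flush(struct chunk_state *s)
{
    s->protected_mask |= s->pending_mask;
    s->pending_mask = 0;
}

/* Mirror k was shown unreachable and its permissions were re-added. */
void on_unprotected(struct chunk_state *s, int k)
{
    assert(k >= 0 && k < N_MIRRORS);
    s->protected_mask &= ~(UINT64_C(1) << k);
}
```

When `pick_mirror` returns -1 the chunk cannot be safely reissued until some mirror is reclaimed, which is where the decline in available VAS shows up.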

Obviously, the available VAS ranges will decline over time unless we can unprotect pages, based on a GC-like analysis.

stephenrkell commented 2 years ago

This segues into our idea of providing coarse-grained (bigalloc-level) may-reach and must-reach approximations. When doing our GC sweep, we only need to consider bigallocs (roots) that may-reach our heap.

Do we want to conceptualise the mmap-mirroring stage as an interposing allocator? i.e. instead of directly targeting mmap we logically rebase it onto our mmap-mirroring layer that itself targets mmap. We have the choice of whether we do this by dispatching intercepted mmap calls or by link-time-rewriting the code to use a magic mmap-like function.
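A minimal sketch of the "magic mmap-like function" option follows. The names `nudge_flags` and `mirroring_mmap` are hypothetical, and the sketch only performs the MAP_PRIVATE to MAP_SHARED nudge; a real layer would also handle the over-alignment and the extra mappings. It assumes only anonymous mappings are safe to nudge, since making a file-backed private mapping shared would write through to the file. With GNU ld, `--wrap=mmap` is one possible way to redirect an allocator's mmap calls to such a function at link time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Rewrite MAP_PRIVATE to MAP_SHARED, but only for anonymous mappings.
 * Note that this also changes behaviour across fork(): a shared
 * anonymous mapping is not copy-on-write in the child. */
int nudge_flags(int flags)
{
    int is_private = (flags & (MAP_SHARED | MAP_PRIVATE)) == MAP_PRIVATE;
    if (is_private && (flags & MAP_ANONYMOUS))
        flags = (flags & ~MAP_PRIVATE) | MAP_SHARED;
    return flags;
}

/* Same signature as mmap, so an allocator can be rebased onto it. */
void *mirroring_mmap(void *addr, size_t len, int prot, int flags,
                     int fd, off_t off)
{
    assert(len != 0);
    return mmap(addr, len, prot, nudge_flags(flags), fd, off);
}
```

Keeping the flag rewrite in its own pure function means the same policy could be reused whether the calls arrive by run-time interception or by link-time rewriting.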

stephenrkell commented 2 years ago

The may-reach relation between data and text segments is populated by (1) fixed-offset knowledge within a DSO, and (2) symbol binding info available from ld.so audit upcalls. Note there is no 'relocation' case because that is subsumed by symbol binding (in the case of relocs with symbols) or fixed-offset (in the case of RELATIVE relocs).

Also remember we proposed to keep approximations for may-reach-immediately and may-reach-transitively....
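To illustrate how the two approximations relate, here is a sketch that assumes at most 64 bigallocs, each identified by a small index, with one 64-bit row per bigalloc. The representation and the names are hypothetical, not liballocs data structures. The transitive relation is computed from the immediate one by a Warshall-style closure, and the sweep then only visits bigallocs whose row includes the heap.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_BIGALLOCS = 64 };

/* Row i, bit j set: bigalloc i may hold a pointer into bigalloc j. */
typedef uint64_t reach_row;

/* Derive may-reach-transitively from may-reach-immediately. */
void close_transitively(const reach_row imm[], reach_row trans[], unsigned n)
{
    assert(n <= MAX_BIGALLOCS);
    for (unsigned i = 0; i < n; ++i) trans[i] = imm[i];
    for (unsigned k = 0; k < n; ++k)
        for (unsigned i = 0; i < n; ++i)
            if (trans[i] & (UINT64_C(1) << k))
                trans[i] |= trans[k]; /* i reaches k, so also all k reaches */
}

/* Bitmask of bigallocs the GC-like sweep must scan for pointers
 * that may (transitively) lead into bigalloc `heap`. */
uint64_t roots_to_scan(const reach_row trans[], unsigned n, unsigned heap)
{
    assert(n <= MAX_BIGALLOCS && heap < n);
    uint64_t mask = 0;
    for (unsigned i = 0; i < n; ++i)
        if (trans[i] & (UINT64_C(1) << heap))
            mask |= UINT64_C(1) << i;
    return mask;
}
```

For example, if bigalloc 0 may point into 1 and 1 may point into 2 (the heap), then both 0 and 1 must be scanned, while an unrelated bigalloc 3 can be skipped.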