stephenrkell / liballocs

Meta-level run-time services for Unix processes... a.k.a. dragging Unix into the 1980s
http://humprog.org/~stephen/research/liballocs
Other
213 stars 24 forks source link

Union tracking and pointer stuffing #79

Open stephenrkell opened 9 months ago

stephenrkell commented 9 months ago

Arguably this is more of a libcrunch issue than a liballocs issue, but one way or another it becomes necessary to track additional metadata for when pointers or pointer-derived values are hiding in longs or the like. My imagined answer has been to have some "shadow memory"-esque structure which remains all-zeroes for well-behaved programs that don't do this, but contains some sparse non-zero data in cases where pointers get stuffed.

I think the same structure should probably deal for this and for write-tracking of (non-simultaneous, non-data-discriminable) unions... it's conceptually similar. We really don't want too many shadow memory structures around, and we want them to be easily swappable-out e.g. to enable Valgrind-capable builds (until it does Umbra or something similarly sensible -- does it already? it's been years...).

For unions whose arms are not address-taken, maybe a local metadata trick could be made to work, by effectively changing the ABI to add at least a byte (say) of metadata nearby if no byte can be salvaged internally from alignment. This relates to the 'ABI polymorphism' approach envisaged for fat pointers inside structs/arrays.

For the shadow thing, a first-level structure of one bit per word could be enough, then maybe some kind of splay tree would do -- but this will be slow for workloads that use it heavily. Maybe I need to dust off my "virtual memory data structures" thinking cap (i.e. also revisiting generic_small.c).

stephenrkell commented 9 months ago

It also becomes necessary to handle 'non-pointer stuffing', i.e. putting an integer in pointer-typed storage... the same structure would have to deal with both.

As a possible alternative, in liballocs we have some precedent for dynamically updating the type of some storage to reflect what is actually stored there. This is easy enough for malloc chunks but harder for others. It is worth bearing in mind for this use case also... I concluded elsewhere that make_precise was the wrong solution and that some allocator-level operations should replace it (#53).

stephenrkell commented 9 months ago

It feels like this facility is basically recording "any exception to the declared type", where for unions, a single valid arm is considered a deviation, but also (say) a long stuffed into a pointer would also be an exception, or a pointer into a long.

If we use such an overlay on the "main" type information, we would consult it frequently. I need to think carefully about whether this is wise.

The alternative is for every allocator to support stuffing separately, by varying the types that it exposes for a given allocation. So, say I stuff a long into a stack-allocated pointer. I'd need to ask the stack allocator to vary the type information for that field. We can do this, in principle, although some allocators might decline to support it.

This does not seem great. For void/long issue, we can perhaps sidestep this by deeming them almost alike, i.e. stripping void of its pointerness and more-or-less declaring it to be an opaque value. That is fine for libcrunch because we can check its actual pointerness on a cast to an actually dereferenceable pointe type. But this matters in a GC context where a void* might be keeping an object alive. (So might a pointer-stuffed long!)

stephenrkell commented 6 months ago

The overlay idea is basically an explicit "timely revocation" facility, which must be explicitly checked frequently by (instrumented) code. It could catch use-after-free, non-simultaneous union changes and stuffing (an undeclared version of the non-simultaneous union). In other words there are union-style revocations as well as 'free'-style revocations.

Having an overlay is a bit like having a software null-pointer check.

This feels more expensive than we want. For a long time I've been collecting alternative approaches that can cover some cases or other.

One partial alternative to "sometimes revoke" is "never usably issue". This was our plan for address-taken union arms... if you want a pointer to the union member, you get a trapping one not a usable one. The cost could be high, though. You could save the trapping if you're prepared to "lock" the union at a given member, i.e. later writes to other members will fail.

Another partial alternative is "disambiguable stuffing". If we stuff, we may have enough value-space (e.g. in a 64-bit word) that we can still tell an integer from a pointer, say (if we know the integer is small... NaN-boxing-esque tricks also apply here). This is what I've called a "self-discriminating union" or "data-discriminable union", elsewhere.

"Always reuse compatibly" is another partial approach -- if the type of a given piece of memory is always the same, maybe we don't care about client code confusing lifetimes/epochs because it doesn't break too disastrously. I'm less sure about the wisdom of this... it is a weakened property. It is sometimes called an "iso-heap" approach. It is essentially the same idea to the "locking" trick, but transcends allocation lifetime.

I think a mixture of approaches is necessary. But I do think that avoiding an overlay or (equivalently) a software null/revoked check is a good target. E.g. we can avoid the cost of "always check before use" if we've arranged that a hardware trap will occur and we can avoid this happening too frequently (e.g. my "synchronous depth-bounded scan on free" idea).

A particularly nasty case in Ruby was described to me by Kunshang Wang... could become a useful case study.

stephenrkell commented 6 months ago

This is all tied up with #53 of course.

stephenrkell commented 6 months ago

And for moving GC, I was imagining tracking where "pointer-derived values" escaping to non-pointer-typed storage, with a special instrumentation/analysis for C code to catch these. How would I track them at run time? Using an overlay! So I can't get rid of the idea that easily. That overlay was by analogy with relocation records.

One could do it instead just by allocator-level operations, perhaps. But then we presumably need to capture the pointer-derivedness in the uniqtype (how do I get back the pointer? or know how the representation bytes are functionally related to the pointer?). That doesn't feel quite right.

stephenrkell commented 6 months ago

"Use after free" is actually a nice case: the free is a hint that a re-use is coming, but it gives us a window of time to act: "use after re-use" is really the dangerous bug. We can even delay re-use until we know it's safe (and without leaving memory underutilised, e.g. by using virtual address rotation techniques). Is this an equal cop-out to the iso-heap? No, I think it's less. E.g. if a given data type has an "epoch invariant" (e.g. imagine a generation number that keeps going up) we'll never let a client see the wrong epoch value.

Unions don't have this property: the type-changing write is both the free and the re-use, together. That said, if we have to tackle this using a write barrier, that's still much cheaper than having a read barrier a.k.a. an overlay checked on every dereference including reads.

stephenrkell commented 2 months ago

A possible way to deal with union arms is issuing multiple virtual addresses mapping the same memory, with only one active per type/member at a time. We'd have to update the page tables somehow when a type-changing write occurs... so maybe "one read-write, others no-access" (of course we can take a trap and update PTs). A drawback is that even if the type-changing write is anticipated in software, this necessitates a trap via the kernel... it is not suitable for frequently changing types.

This "multiple mappings" idea can be seen as a generalisation of the "set to trapping pointer" approach to address-taken union arms.

stephenrkell commented 1 month ago

There could be a fun paper in enumerating the different invariants and the checking strategies that they (complementarily) require, then figuring out rules for how different invariants may be composed. I am thinking of how references flow from one bigalloc to another, e.g. during linking... we might know that text segment A is "barriered" with a software validity/null check, whereas text segment B relies on an "only issue valid, only de-issue provably dead" invariant. It's safe for data to flow from segment B to segment A, because A will always to extra checks. But it's not safe to let data flow from segment A to segment B, because segment A's data doesn't respect the validity invariant.

It will be necessary to track bindings between segments. These are formed during linking, but also as pointers are written. Some kind of bootstrapping property is needed to keep track of extant bindings: I think we always need to be able to catch the first write of a pointer pointing into some bigalloc, written into some bigalloc. What are the ways in which a bigalloc acquire knowledge of a pointer? A write instruction is one, if its read operand value "is" (has the taint of) the known pointer. But munging an instruction byte sequence is clearly another way, and there may be others (e.g. acquiring an offset, whose base was acquired earlier?).

And then "written to" is not enough. This chimes with the trickiness of defining "pointer", as encountered during Alex M-B's project. Performing linking is not directly writing a "pointer" as a word-sized stored value, but it is munging instructions (e.g. a jump instruction) such that an address may later be issued.

stephenrkell commented 1 month ago

Can type-changing writes be instrumented to cause a non-kernel-trapping permission switcheroo, using something like Intel's "Memory Protection Keys"? In short, no. There are only 16 keys and each page is optionally labelled with one of them. So we could issue each of the distinct union arms its own key, up to a 16-element union, and then (on an instrumented write) switch which one was allowed to be read. We would want to keep a unique write-allowing mapping in reserve, accessible only from the instrumentation logic, so we'd tolerate only 15 elements. But that's just for a single union-typed allocation! Each allocated union has its own last-written state, to which current access must be controlled independently. That's way, way more state than 32 x 2 read/write-protection bits, which is the state we can change cheaply. The 4 bits per page is closer to the amount of state we need, but that is not cheap to change; we'd be changing the page tables so it's just like an old-fashioned change to read/write access.

Could we bootstrappingly apply this to page tables themselves? No because even if we have a trusted "process supervisor", the ability to write to a page table entry is catastrophic to security. Ditto for page directories and up, because the next level down is addressed physically, so we could create a fake PT.

stephenrkell commented 1 month ago

Popping up a level... one takeaway from the thoughts above is that the issue of tracking changes to memory's "current type" via unions or pointer stuffing is a temporality issue, just like catching use-after-free is. There are many ways to deal with it, analogous with the same issues in GC (e.g. read-barriered or no).

At the liballocs level we should try to accommodate multiple approaches, rather than mandate any one approach. One way to do that might be to (1) keep track of the contract on pointers may be issued to a given allocation (e.g. whether they should be write-invalid according to the MMU, vs whether the recipient is assumed to be software-write-barriered), and keep track of how pointers flow around (so that specific applications of liballocs could check or enforce safety properties -- not that the core liballocs would do so).

That should perhaps say "how inter-allocation pointers flow around inter allocations".