Open stephenrkell opened 3 years ago
Also, my idea of "synchronous pointer poisoning on free()" may be interesting. On a free(), do some amount of synchronous work by reflecting on the call site and finding (up to depth M, say) pointers to the now-quarantined allocation. Poison them. This would let more use-very-soon-after-free bugs trap synchronously, where they would otherwise be missed.
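A rough sketch of the poisoning step, to make the idea concrete. Everything here is hypothetical: `POISON_PTR`, `poison_pointers_into` and the `locations` array (which stands in for whatever reflecting on the call site would find) are made-up names, and the sketch only handles depth 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poison value: any address guaranteed to fault on dereference. */
#define POISON_PTR ((void *) (uintptr_t) 0x1)

/* Overwrite every located pointer that points into [base, base + size)
 * with POISON_PTR. `locations` holds the addresses of candidate pointer
 * variables (assumed to come from reflection on the freeing call site).
 * Returns the number of pointers poisoned. */
size_t poison_pointers_into(void **locations[], size_t nlocations,
    void *base, size_t size)
{
    assert(locations != NULL || nlocations == 0);
    uintptr_t lo = (uintptr_t) base;
    uintptr_t hi = lo + size;
    size_t poisoned = 0;
    for (size_t i = 0; i < nlocations; ++i)
    {
        uintptr_t value = (uintptr_t) *locations[i];
        if (value >= lo && value < hi)
        {
            *locations[i] = POISON_PTR;
            ++poisoned;
        }
    }
    return poisoned;
}
```

Interior pointers are poisoned too, since the test is on the whole range rather than on the base address only.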
In CHERI, if I recall, access to the tag bits lets the sweep zoom in on real pointers very quickly. Indeed this uses CLoadTags to get at tag bits in a nice packed form. My idea of adding a "pointer map word" to uniqtypes might be useful here. It may even do better, e.g. because a huge array of ints can be skipped in one go.
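To illustrate what a "pointer map word" might buy, here is a minimal sketch. The `struct type_desc` below is invented for illustration and is not the real uniqtype layout; it assumes elements of at most 64 words, with one map bit per word. `words_examined` is just instrumentation to make the skipping visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor, NOT the real uniqtype layout. */
struct type_desc
{
    size_t size_words;     /* element size in machine words, at most 64 here */
    uint64_t pointer_map;  /* bit i set => word i of an element holds a pointer */
};

static size_t words_examined;  /* instrumentation only */

/* Count pointer-typed words in an array of `nelems` elements of type `t`
 * whose value lies in [lo, hi). Words the map marks as non-pointers are
 * never looked at. */
size_t count_pointers_into(const uintptr_t *array, size_t nelems,
    const struct type_desc *t, uintptr_t lo, uintptr_t hi)
{
    assert(t->size_words >= 1 && t->size_words <= 64);
    if (t->pointer_map == 0) return 0;  /* e.g. a huge array of ints: skip in one go */
    size_t hits = 0;
    for (size_t e = 0; e < nelems; ++e)
    {
        const uintptr_t *elem = array + e * t->size_words;
        for (size_t w = 0; w < t->size_words; ++w)
        {
            if (!((t->pointer_map >> w) & 1u)) continue;
            ++words_examined;
            if (elem[w] >= lo && elem[w] < hi) ++hits;
        }
    }
    return hits;
}
```

Unlike a tag-bit sweep, this also ignores integers that merely look like pointers, because the map says which words are pointer-typed.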
Was there an equivalent "huge array" short-circuit optimisation in the Cornucopia stuff? Indeed there is: it uses "trap on capability write" MMU protections to maintain a per-page "sweep clean" bit. The sweep-clean and full-clean bits are what enable a short-circuit "skip lots of non-pointers" optimisation: "if a revoker encounters a page that is both full-clean and sweep-clean, that page is guaranteed to be devoid of capabilities and can be bypassed". I.e. two per-page flags are enough.
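A toy model of just the skip rule quoted above. The flag names follow the quote, but how and when each bit is really set or cleared is the paper's business; in this sketch both are simply set and cleared together, and `note_pointer_store` is a made-up stand-in for the MMU write trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8u  /* hypothetical toy address space */

struct page_state
{
    bool full_clean;
    bool sweep_clean;
};

static struct page_state pages[NPAGES];  /* all pages start out not clean */

/* Stand-in for "trap on capability write": a pointer store dirties the page. */
void note_pointer_store(size_t page)
{
    assert(page < NPAGES);
    pages[page].full_clean = false;
    pages[page].sweep_clean = false;
}

/* Record that a page has been checked and found to hold no pointers. */
void mark_pointer_free(size_t page)
{
    assert(page < NPAGES);
    pages[page].full_clean = true;
    pages[page].sweep_clean = true;
}

/* One sweep pass. Returns how many pages actually had to be scanned. */
size_t sweep_pages(void)
{
    size_t scanned = 0;
    for (size_t p = 0; p < NPAGES; ++p)
    {
        if (pages[p].full_clean && pages[p].sweep_clean) continue;  /* bypass */
        /* ... scan the words of page p for pointers into quarantine ... */
        ++scanned;
    }
    return scanned;
}
```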
Also, in CHERI quarantine is recorded at word granularity ("is this word in quarantine?")... we will do it at alloc granularity, which potentially brings some wins. But then variable-size allocations are less convenient in other ways.
This issue collects thoughts about how to provide temporal safety in libcrunch.
One idea is to start with a quarantining malloc. Wes has a version of dlmalloc that we could perhaps use for this.
The basic idea is that a "free" event causes memory to be quarantined, and it is not reused until it is known to have no live inbound pointers. This already trades away some bug-finding power (those live pointers may still be used, albeit illegally), but it seems a good trade and preserves a sensible notion of safety.
The "no live inbound pointers" is the hard part. The most obvious way is to use vaguely GC-like sweep techniques.
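A minimal sketch of the quarantine-then-sweep shape, layered over the ordinary malloc/free rather than inside dlmalloc. All names are hypothetical. The "sweep" here is just a conservative scan of an explicit root array, and the free is sized because this toy has no way to look up allocation sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct q_entry
{
    void *base;
    size_t size;
    struct q_entry *next;
};

static struct q_entry *quarantine_head;
static size_t quarantine_len;

size_t quarantine_count(void) { return quarantine_len; }

/* "Free": park the chunk in quarantine instead of returning it for reuse. */
void quarantining_free(void *p, size_t size)
{
    if (p == NULL) return;
    struct q_entry *e = malloc(sizeof *e);
    assert(e != NULL);  /* sketch only: no graceful out-of-memory handling */
    e->base = p;
    e->size = size;
    e->next = quarantine_head;
    quarantine_head = e;
    ++quarantine_len;
}

static int has_inbound_pointer(const struct q_entry *e,
    void *const *roots, size_t nroots)
{
    uintptr_t lo = (uintptr_t) e->base;
    uintptr_t hi = lo + e->size;
    for (size_t i = 0; i < nroots; ++i)
    {
        uintptr_t r = (uintptr_t) roots[i];
        if (r >= lo && r < hi) return 1;
    }
    return 0;
}

/* Really free every quarantined chunk that no root points into.
 * Returns the number of chunks released. */
size_t release_unreferenced(void *const *roots, size_t nroots)
{
    size_t released = 0;
    struct q_entry **link = &quarantine_head;
    while (*link != NULL)
    {
        struct q_entry *e = *link;
        if (has_inbound_pointer(e, roots, nroots))
        {
            link = &e->next;  /* still referenced: stays in quarantine */
            continue;
        }
        *link = e->next;
        free(e->base);
        free(e);
        --quarantine_len;
        ++released;
    }
    return released;
}
```

A chunk with a lingering interior pointer stays quarantined across sweeps and is only released once that pointer has gone away.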
What about custom allocators? Want there to be a "quarantine protocol" supported by liballocs. An allocator that follows this guarantees not to re-use memory that the system has not attested is no longer reachable. That means the sweep can be written at the liballocs level. Also raises my often-mooted idea of tracking both over- and under-approximations of reachability at the bigalloc level.
Unlike the CHERI work, we probably need the sweep to proceed allocationwise, rather than by page and cacheline. Need to remind myself exactly how this works. How was the "does this point to quarantine?" test done? Brief answer: a per-page quarantine bitmap, one bit per word.
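For reference, a one-bit-per-word "does this point to quarantine?" test might look roughly like the following. This is a simplification under stated assumptions: a single flat shadow bitmap over a toy heap rather than anything per-page, and every name here is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 4096u  /* hypothetical toy heap size, in words */
#define BITS_PER_ENTRY (8u * sizeof (unsigned long))

static uintptr_t heap[HEAP_WORDS];  /* stands in for the real heap */
static unsigned long shadow[HEAP_WORDS / BITS_PER_ENTRY];  /* one bit per heap word */

static int in_heap(uintptr_t addr)
{
    return addr >= (uintptr_t) heap && addr < (uintptr_t) (heap + HEAP_WORDS);
}

static size_t word_index(uintptr_t addr)
{
    assert(in_heap(addr));
    return (addr - (uintptr_t) heap) / sizeof (uintptr_t);
}

/* Set (on != 0) or clear the quarantine bit of every word in the range. */
void quarantine_range(void *start, size_t nbytes, int on)
{
    size_t first = word_index((uintptr_t) start);
    size_t nwords = (nbytes + sizeof (uintptr_t) - 1) / sizeof (uintptr_t);
    assert(first + nwords <= HEAP_WORDS);
    for (size_t i = first; i < first + nwords; ++i)
    {
        unsigned long bit = 1ul << (i % BITS_PER_ENTRY);
        if (on) shadow[i / BITS_PER_ENTRY] |= bit;
        else shadow[i / BITS_PER_ENTRY] &= ~bit;
    }
}

/* The sweep's per-candidate test: a shift, a mask and one load. */
int points_to_quarantine(uintptr_t candidate)
{
    if (!in_heap(candidate)) return 0;
    size_t i = word_index(candidate);
    return (int) ((shadow[i / BITS_PER_ENTRY] >> (i % BITS_PER_ENTRY)) & 1ul);
}
```

An allocationwise sweep could instead ask "which allocation does this point into, and is that allocation quarantined?", trading the bitmap for a lookup in the allocator's metadata.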
Also, the usual tricks: can use mprotect if there will be no, or likely few, valid uses of page-overlapping objects. Using this trick lets us decouple virtual address reuse from physical memory reuse. Using virtual address rotation lets us reduce the frequency of page-overlapping objects. These are all "per-allocator" tricks, but maybe some plumbing in liballocs would help, e.g. to convert segfaults into allocator upcalls.
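A sketch of the mprotect trick for the simplest case, one object per page, assuming Linux (anonymous mmap, `MADV_DONTNEED`). The function names are made up, and the segfault-to-upcall plumbing is not shown.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS and madvise under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
    long s = sysconf(_SC_PAGESIZE);
    assert(s > 0);
    return (size_t) s;
}

/* Allocate one page-sized, page-aligned object. Returns NULL on failure. */
void *page_alloc(void)
{
    void *p = mmap(NULL, page_size(), PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Free": keep the virtual range reserved so stale pointers fault on use,
 * but hand the physical page back to the kernel straight away. */
int page_quarantine(void *p)
{
    if (mprotect(p, page_size(), PROT_NONE) != 0) return -1;
    return madvise(p, page_size(), MADV_DONTNEED);
}

/* Once a sweep has found no inbound pointers, give up the virtual range too. */
int page_release(void *p)
{
    return munmap(p, page_size());
}
```

Between `page_quarantine` and `page_release`, the address is unusable but costs no physical memory, which is the decoupling mentioned above.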