For discussion: MSR's CHERI+MTE composition

As mentioned in the meeting yesterday, Microsoft has spent some time considering the composition of CHERI and MTE, specifically as part of our work on CHERI heap temporal safety. Our design is geared towards supporting Cornucopia Reloaded's sweeping revocation while offering, among other things,

- decreased rate of quarantine buildup (inversely proportional to the number of MTE tag values),
- stronger security (closing the UAF/UAR distinction, as we did in CHERIoT), and
- lower software complexity (permitting safe in-band allocator metadata, also as on CHERIoT).
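
To put rough numbers on the first point, here is a back-of-the-envelope sketch. It rests on an assumption that is not spelled out above: that a freed chunk is recolored and reused immediately, and only has to wait in quarantine for a revocation sweep once it has cycled through every non-reserved color. The function names are hypothetical.

```python
def usable_colors(tag_bits: int) -> int:
    """Monochromatic colors available with `tag_bits` MTE tag bits.

    One of the 2**tag_bits values is reserved to mean "polychromatic".
    """
    if tag_bits < 1:
        raise ValueError("need at least one MTE tag bit")
    return 2 ** tag_bits - 1


def quarantine_entries(frees: int, tag_bits: int) -> int:
    """How often one chunk lands in quarantine over `frees` free() calls.

    Assumption (ours, for illustration): the chunk is recolored on every
    free and is quarantined only when it has run out of fresh colors.
    """
    return frees // usable_colors(tag_bits)


# With 4 tag bits there are 15 usable colors, so 1500 frees of the same
# chunk lead to 100 quarantine entries; with 1 tag bit, to 1500.
print(quarantine_entries(1500, 4), quarantine_entries(1500, 1))
```

Under that assumption, even a single tag bit keeps the security properties, and more bits only reduce how often the revoker is needed.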

The key bits of this have been presented publicly before, but it'd be good to have it all here, too. We consider these proposed architectural semantics to be Capability Essential IP.

Please find attached some slideware (pptx with animations and extensive slide notes, or pdf without animations) with most of the details and (hopefully pretty) pictures. But, in quick summary:

- Memory metadata tags expand to fit both CHERI and MTE tags. Any positive number of MTE tag bits provides all the security benefits; additional MTE tag bits "just" improve software performance.
- In-capability MTE tags are protected, dedicated fields, not merely part of the address field. New accessor instructions are added. [0]
- Capabilities are differentiated into "polychromatic" (or "rainbow") and "monochromatic" by reserving one MTE tag value for the former state (and all the rest for the latter).
  - Polychromatic capabilities can be attenuated to monochromatic progeny with the setter instruction.
  - Monochromatic capabilities' MTE tags are fixed; attempting to modify one clears the capability's validity tag.
  - Memory capability roots are polychromatic.
- Memory metadata tags may be manipulated only with polychromatic authority. There are new accessor instructions to manipulate memory metadata tags. An exciting atomic instruction on memory metadata tags is also included to allow lockless fast paths in allocators like snmalloc to detect attempts at double-free.
- The rule for dereference is "the authority is polychromatic or the authority's and memory's tags match".
  - As a consequence, "polychromatic" memory is accessible only to polychromatic authorities, making it perfect for in-band allocator state.
  - Mismatching loads trap precisely. This is not a significant microarchitectural burden: the load instruction cannot retire without its data, and the memory metadata tags can come along for the ride.
  - Mismatching stores "fizzle" rather than trap: the store instruction may commit prior to knowing the memory metadata tag value, and mismatch results in the store being silently dropped without altering memory.
- The rule for capability revocation is that any capability whose MTE tag mismatches the memory metadata tag of its base is subject to tag-clearing at any point.
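
For readers who prefer code to prose, the rules above can be sketched as a toy Python model. This is only an illustration, not the proposed ISA: the encoding `POLY = 0`, the `Trap` exception, all function names, and the choice to trap when a non-polychromatic authority tries to recolor memory are assumptions. Bounds and permissions are ignored, and the revocation predicate is read literally from the last rule.

```python
from dataclasses import dataclass, replace

POLY = 0  # assumption: which MTE tag value is reserved is not stated above


class Trap(Exception):
    """Stands in for a precise architectural trap."""


@dataclass(frozen=True)
class Cap:
    base: int           # granule index of the capability's base
    color: int = POLY   # in-capability MTE tag (protected field)
    valid: bool = True  # CHERI validity tag


class Memory:
    def __init__(self, granules: int):
        self.color = [POLY] * granules  # memory metadata MTE tag per granule
        self.data = [0] * granules


def set_cap_color(cap: Cap, color: int) -> Cap:
    """Setter: polychromatic attenuates; changing a monochromatic color invalidates."""
    if cap.color == POLY:
        return replace(cap, color=color)
    return cap if color == cap.color else replace(cap, valid=False)


def colors_match(cap: Cap, mem: Memory, addr: int) -> bool:
    """Dereference rule: authority is polychromatic, or the tags match."""
    return cap.color == POLY or cap.color == mem.color[addr]


def load(cap: Cap, mem: Memory, addr: int) -> int:
    if not cap.valid or not colors_match(cap, mem, addr):
        raise Trap("load")  # mismatching loads trap precisely
    return mem.data[addr]


def store(cap: Cap, mem: Memory, addr: int, value: int) -> None:
    if not cap.valid:
        raise Trap("store via untagged capability")
    if colors_match(cap, mem, addr):
        mem.data[addr] = value
    # else: the store fizzles, leaving memory untouched


def set_mem_color(auth: Cap, mem: Memory, addr: int, color: int) -> None:
    if not auth.valid or auth.color != POLY:
        raise Trap("recolor")  # assumption: the failure mode is not stated above
    mem.color[addr] = color


def revocable(cap: Cap, mem: Memory) -> bool:
    """Literal reading; whether polychromatic capabilities are exempt is not stated."""
    return cap.color != mem.color[cap.base]


mem, root = Memory(4), Cap(base=0)
set_mem_color(root, mem, 1, 3)            # allocator colors granule 1
obj = set_cap_color(Cap(base=1), 3)       # and hands out a matching capability
store(obj, mem, 1, 42)
set_mem_color(root, mem, 1, 4)            # free(): recolor the granule
store(obj, mem, 1, 99)                    # use-after-free store fizzles
print(mem.data[1], revocable(obj, mem))   # 42 True
```

Note that granule 0, which stays polychromatic in this example, is unreachable through `obj`, which is the property that makes such memory usable for in-band allocator state.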

[0] And, assuming we steal address bits for MTE tags, CSetAddr will need to clear the capability's tag if it tries to change the MTE tag. More general alternatives like #341 are probably not palatable.
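
A minimal sketch of that CSetAddr behaviour, assuming (purely for illustration) a 4-bit MTE tag stored in address bits 59:56. The bit positions, the width, and the function names are hypothetical and are not taken from the proposal.

```python
TAG_SHIFT = 56  # hypothetical position of the stolen address bits
TAG_BITS = 4    # hypothetical MTE tag width
TAG_MASK = ((1 << TAG_BITS) - 1) << TAG_SHIFT


def mte_tag(addr: int) -> int:
    """Extract the in-address MTE tag field."""
    return (addr & TAG_MASK) >> TAG_SHIFT


def csetaddr(old_addr: int, valid: bool, new_addr: int) -> tuple[int, bool]:
    """Return (address, validity tag) after a CSetAddr-like operation.

    The validity tag survives only if the MTE tag bits are unchanged.
    """
    return new_addr, valid and mte_tag(new_addr) == mte_tag(old_addr)


old = (3 << TAG_SHIFT) | 0x1000
print(csetaddr(old, True, (3 << TAG_SHIFT) | 0x2000)[1])  # True: same color
print(csetaddr(old, True, (4 << TAG_SHIFT) | 0x1000)[1])  # False: color changed
```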

> Mismatching stores "fizzle" rather than trap: the store instruction may commit prior to knowing the memory metadata tag value, and mismatch results in the store being silently dropped without altering memory.

I think that we proposed this as an optional extension. Ideally, a core would support both stores-fizzle and stores-trap modes and would expose a counter of the number of fizzled stores. Running in stores-trap mode would come with a noticeable performance penalty (possibly as high as 20%), so it would typically not be the default. A fizzled store nonetheless means that you have a store-after-free bug, which you probably want to fix at some point. Typically, you'd want to use stores-trap mode during debugging. Systems with good telemetry infrastructure would want to run stores-trap as a sampling mode after detecting some threshold number of fizzled stores (could be one): if an app has store-after-free bugs, the OS would turn on the trapping mode, log stack traces to telemetry (while still fizzling the stores by skipping the instruction prior to mret), and then resume.
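
The sampling policy could look roughly like the following toy model. Everything here is hypothetical (the class, its fields, and the polling step); it only illustrates the control flow of counting fizzles, flipping to stores-trap mode past a threshold, and logging while still dropping the store.

```python
class Core:
    """Toy model of one core plus the OS policy sketched above (names hypothetical)."""

    def __init__(self, threshold: int = 1):
        self.trap_mode = False      # stores-fizzle is the default mode
        self.fizzled = 0            # counter of fizzled stores
        self.threshold = threshold  # fizzles tolerated before sampling starts
        self.telemetry = []         # logged stack traces

    def mismatching_store(self, stack_trace):
        """A store whose color mismatches memory; memory is never modified."""
        if self.trap_mode:
            # Trap handler: log, then skip the instruction so the store
            # still has no effect, and resume.
            self.telemetry.append(tuple(stack_trace))
            return "trapped"
        self.fizzled += 1
        return "fizzled"

    def os_poll(self):
        """OS reads the counter and enables trapping past the threshold."""
        if self.fizzled >= self.threshold:
            self.trap_mode = True


core = Core(threshold=2)
core.mismatching_store(["free_list_push", "main"])
core.mismatching_store(["free_list_push", "main"])
core.os_poll()
print(core.mismatching_store(["free_list_push", "main"]))  # trapped
```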
