stephenrkell / libcrunch

A dynamically safe implementation of C, using your existing C compiler. Tolerates idiomatic C code pretty well. Not perfect... yet.

128-bit ABI and related research ideas #9

Open stephenrkell opened 3 months ago

stephenrkell commented 3 months ago

Rather than a shadow space, it'd be better to move to a 128-bit pointer representation (or maybe 96 bits for 32-bit platforms, etc). Only with something like this can we update a pointer and its bounds together atomically, which is obviously important.

This change breaks the ABI, although in quite a mild way if we preserve the property that the low-order word is an ordinary pointer. Only pointers and pointer-containing aggregates (structs, unions and arrays) are affected by the ABI change. Remember also that we can always work with a plain word-sized pointer, at the cost of having to re-fetch the bounds for it. So if we have to drop down to a bare pointer, it's only a performance hit, not a correctness hit.
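To make the "low-order word is an ordinary pointer" property concrete, here is a rough sketch. The layout (`wide_ptr`, with a 32-bit offset-from-base and a 32-bit object size) and the helper names `register_object`, `narrow`, `widen` and `in_bounds` are made up for illustration and are not libcrunch's actual representation; the single-object "registry" merely stands in for whatever slow bounds lookup is really used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wide pointer: 128 bits on a 64-bit platform, 96 bits on a
 * 32-bit one. The first (low-order) word is an ordinary pointer. */
typedef struct {
    void    *ptr;       /* ordinary pointer, usable by narrow-ABI code */
    uint32_t base_off;  /* assumed encoding: bytes from object base to ptr */
    uint32_t size;      /* assumed encoding: object size in bytes; 0 = unknown */
} wide_ptr;

/* Stand-in for the slow bounds lookup: remembers exactly one object. */
static uintptr_t known_base;
static uint32_t  known_size;

static void register_object(void *base, uint32_t size)
{
    known_base = (uintptr_t)base;
    known_size = size;
}

/* Dropping to a bare pointer just discards the bounds. */
static void *narrow(wide_ptr w)
{
    return w.ptr;
}

/* Going back up re-fetches the bounds: slower, but no less correct. */
static wide_ptr widen(void *p)
{
    wide_ptr w = { p, 0, 0 };
    uintptr_t a = (uintptr_t)p;
    if (known_size != 0 && a >= known_base && a - known_base < known_size) {
        w.base_off = (uint32_t)(a - known_base);
        w.size = known_size;
    }
    return w;
}

/* Would an n-byte access through w stay inside its object? */
static int in_bounds(wide_ptr w, size_t n)
{
    return w.size != 0 && n <= (size_t)(w.size - w.base_off);
}
```

Because `ptr` is the first member, code that only understands word-sized pointers can read the first word of a `wide_ptr` and get something it can use directly.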

We could undertake to provide compatibility across multiple ABIs. Some functions would then need two (or, in bad cases, exponentially many) versions of their code, to deal with wide vs narrow ABI w.r.t. some data type. We can probably speculate on the wide ABI and push the narrow ABI into a slow path, although sometimes we might want to flip that (e.g. heavy consumers of C library functions, assuming the C library remains narrow-ABI).
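A minimal sketch of the two-versions idea, assuming a made-up `wide_int_ptr` layout and a made-up slow lookup `fetch_bounds` (neither is libcrunch API). The wide-ABI version is the fast path; the narrow-ABI version is a thin slow-path wrapper that re-fetches bounds and then shares the same body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wide pointer to int; the low-order word is an ordinary pointer. */
typedef struct {
    int     *ptr;
    uint32_t base_off;  /* bytes from object base to ptr */
    uint32_t size;      /* object size in bytes; 0 = unknown */
} wide_int_ptr;

/* Stand-in for the slow bounds lookup; counts how often it is used. */
static int     *obj_base;
static uint32_t obj_size;
static unsigned slow_lookups;

static wide_int_ptr fetch_bounds(int *p)
{
    wide_int_ptr w = { p, 0, 0 };
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)obj_base;
    ++slow_lookups;
    if (obj_base != NULL && a >= b && a - b < obj_size) {
        w.base_off = (uint32_t)(a - b);
        w.size = obj_size;
    }
    return w;
}

/* Wide-ABI version (fast path): bounds arrive with the pointer.
 * Returns 0 on success, -1 if the access would leave the object. */
static int sum_wide(wide_int_ptr w, size_t n, long *out)
{
    long s = 0;
    if (w.size == 0 || n > (w.size - w.base_off) / sizeof(int))
        return -1;
    for (size_t i = 0; i < n; i++)
        s += w.ptr[i];
    *out = s;
    return 0;
}

/* Narrow-ABI version (slow path): re-fetch bounds, then reuse the wide body. */
static int sum_narrow(int *p, size_t n, long *out)
{
    return sum_wide(fetch_bounds(p), n, out);
}
```

Flipping the speculation, for a caller that mostly talks to a narrow-ABI C library, would mean making the narrow entry point the primary one and attaching bounds only where a check is needed.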

Cue various research ideas into ABI polymorphism. Suppose we parameterise generated code on various ABI details, like the size of a pointer or the offset of a given struct field. We could decide these only at link time and fix them up with relocations, given sufficiently funky relocations. This might mean our "ABI monomorphisation" could be done lazily at run time rather than requiring us to generate special code. I think some kind of multi-ABI runtime abstraction is a good idea for language interop anyhow, of course. This ties in with my "address space extension" ideas, sketched elsewhere.
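As a very rough illustration of code parameterised on ABI details, the sketch below passes field offsets in a run-time table (`abi_params`, a hypothetical name) instead of baking them in at compile time. In the scheme described above, these values would instead be patched into the instruction stream by relocations; the table only simulates that. The two node layouts are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The same logical list node under two hypothetical ABIs. */
struct node_narrow { void *next; int value; };
struct node_wide   { void *next; uint32_t base_off; uint32_t size; int value; };

/* ABI details the code is parameterised on. */
typedef struct {
    size_t next_off;   /* offset of the low-order word of 'next' */
    size_t value_off;  /* offset of 'value' */
} abi_params;

static const abi_params narrow_abi = {
    offsetof(struct node_narrow, next), offsetof(struct node_narrow, value)
};
static const abi_params wide_abi = {
    offsetof(struct node_wide, next), offsetof(struct node_wide, value)
};

/* One body serving both ABIs: every field access goes through the table. */
static int sum_list(const void *head, const abi_params *abi)
{
    const unsigned char *n = head;
    int total = 0;
    while (n != NULL) {
        int v;
        void *next;
        memcpy(&v, n + abi->value_off, sizeof v);
        /* Only the low-order word is read, so wide nodes work unchanged. */
        memcpy(&next, n + abi->next_off, sizeof next);
        total += v;
        n = next;
    }
    return total;
}
```

Calling `sum_list(&node, &wide_abi)` and `sum_list(&node, &narrow_abi)` runs the same machine code against either layout; binding the table lazily is the run-time analogue of deferring monomorphisation.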

In the absence of funky relocations, wider standard relocations can be used... it might need a fixup pass on the .o file to rewrite them, or clever use of addends, etc.