stephenrkell / libcrunch

A dynamically safe implementation of C, using your existing C compiler. Tolerates idiomatic C code pretty well. Not perfect... yet.

libcrunch is a system for fast dynamic type and bounds checking in unsafe languages -- currently C, although languages are fairly pluggable in the design.

It is somewhat inaccurately named, in that it is nowadays both a runtime library and some toolchain extensions (compiler wrapper, linker plugin, auxiliary tools).

"Dynamic type checking" mostly means checking pointer casts. There is limited checking of other things like va_arg and union use; more to add in due course.

Bounds checking probably means what you think it means. The innovation of libcrunch is to do fine-grained bounds checking (sensitive to subobjects, such as arrays-in-structs), over all allocators (static, stack, heap and custom), with very few false positives. The key to doing this is run-time type information and a run-time model of allocators. Currently, bounds checking performs about the same as ASan, but does finer-grained checking. It's also comparable to SoftBound, but doesn't suffer the kind of false positives that fat-pointer systems do when they lose track of bounds.
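To make "sensitive to subobjects" concrete, here is a minimal sketch in plain C. It does not use libcrunch at all; the struct account type and both helper functions are hypothetical, made up for illustration. It contrasts a check at whole-object granularity with a check at the granularity of the array inside the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: an array immediately followed by another field. */
struct account {
    char name[8];
    int is_admin;
};

/* Object-granularity check: is byte offset `off` inside the whole struct? */
bool within_object(size_t off)
{
    return off < sizeof(struct account);
}

/* Subobject-granularity check: is byte offset `off` inside the `name` array? */
bool within_name(size_t off)
{
    size_t start = offsetof(struct account, name);
    size_t len = sizeof(((struct account *)0)->name); /* unevaluated operand */
    return off >= start && off < start + len;
}
```

An index of 8 into name passes the whole-object check, since it lands on is_admin, but fails the subobject check. A checker that only knows where allocations begin and end cannot tell that access apart from a legitimate one.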

I have some plans for temporal checking too, including a garbage collector (which masks errors) and a mostly-timely checker (which catches errors), but nothing concrete yet.

The medium-term goal is a proof-of-concept implementation of C that is dynamically safe... and runs most source code unmodified, is binary-compatible even with uninstrumented code (albeit sacrificing safety guarantees), and performs usably well (hopefully no worse than half native speed, usually better).

To get good performance, I have some plans for exploiting hardware assistance (various kinds of tagged memory that are springing up) and also speculative/dynamic optimisations. Again, nothing concrete yet (but feel free to ask).

All this is built on top of my other project, liballocs, which you should build (and probably understand) first. In a nutshell, liballocs provides the type information and other dynamic run-time services; its goal is "Smalltalk-style dynamism for Unix processes".

Building is non-trivial... but you can do it! Overall, the build looks something like this.

    $ git clone https://github.com/stephenrkell/liballocs.git
    $ cat liballocs/README
    (and follow those instructions, then...)
    $ export LIBALLOCS=`pwd`/liballocs
    $ git clone https://github.com/stephenrkell/libcrunch.git
    $ cd libcrunch
    $ make -jn                                             # for your favourite n
    $ make -C test                                         # if this succeeds, be amazed
    $ frontend/c/bin/crunchcc -o hello /path/to/hello.c    # your code here
    $ LD_PRELOAD=`pwd`/lib/libcrunch_preload.so ./hello    # marvel!

Tips for non-Debian or non-jessie users:

Liballocs models programs during execution in terms of /typed allocations/. It reifies data types, providing fast access to per-allocation metadata.

Libcrunch extends this with check functions, thereby allowing assertions such as

assert(__is_aU(p, &__uniqtype_Widget));

to assert that p points to a Widget, and so on.
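As a rough mental model only, a check of this kind can be pictured as comparing the type recorded for an allocation against a unique per-type descriptor. The sketch below is a toy: it keeps the record in a header in front of each allocation, and every name in it (toy_type, toy_alloc, toy_is_a, toy_free) is hypothetical. It is not how liballocs stores or finds its metadata, which this section does not describe. It needs a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy type descriptor: one unique object per type, compared by address. */
struct toy_type { const char *name; };

const struct toy_type toy_type_widget = { "Widget" };
const struct toy_type toy_type_gadget = { "Gadget" };

/* Header placed in front of every payload handed out by toy_alloc. */
union toy_header {
    const struct toy_type *type;
    max_align_t align;              /* keeps the payload suitably aligned */
};

/* Allocate `size` bytes and record which type they are meant to hold. */
void *toy_alloc(size_t size, const struct toy_type *type)
{
    union toy_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->type = type;
    return h + 1;                   /* payload starts right after the header */
}

/* Toy check: p must be a pointer returned by toy_alloc (not an interior one). */
bool toy_is_a(const void *p, const struct toy_type *type)
{
    const union toy_header *h = (const union toy_header *)p - 1;
    return h->type == type;
}

void toy_free(void *p)
{
    if (p != NULL)
        free((union toy_header *)p - 1);
}
```

With this toy, assert(toy_is_a(p, &toy_type_widget)) plays the role of the assertion above: it holds for storage allocated as a Widget and fails for storage allocated as anything else.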

For bounds errors, libcrunch instruments /pointer derivation/. This includes array indexing and pointer arithmetic, but not pointer dereference, which can safely proceed unchecked. Bad pointer uses are caught and reported in a segfault handler.
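One way to picture why dereferences need no check of their own: if every derivation is checked, and a failed derivation yields a pointer that cannot be used, then any pointer that is still usable is already known to be in bounds. The sketch below is a toy model under stated assumptions. The names int_bounds and derive are hypothetical, NULL stands in for whatever unusable pointer a real implementation would produce, and the caller is assumed to pass a p that already points into the array. For simplicity it also rejects one-past-the-end pointers, which C permits forming.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bounds record (hypothetical; not libcrunch's real representation). */
struct int_bounds {
    int *base;      /* first element of the array */
    size_t len;     /* number of elements */
};

/* Checked derivation: return p + delta only if the result lies in
 * [base, base + len); otherwise return NULL as a stand-in for a pointer
 * that will fault when used. Assumes p already points into the array. */
int *derive(struct int_bounds b, int *p, ptrdiff_t delta)
{
    ptrdiff_t idx = (p - b.base) + delta;
    if (idx < 0 || (size_t)idx >= b.len)
        return NULL;
    return b.base + idx;
}
```

Given int a[4] and bounds { a, 4 }, derive(b, a, 3) yields &a[3], while derive(b, a, 4) and derive(b, a + 2, -3) both yield NULL. The error surfaces when the bad pointer is used, not at a separate per-dereference check.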

A compiler wrapper inserts these checks (and some others) automatically at particular points. The effect is to provide clean error messages on bad pointer casts, bad pointer uses and other operations that would otherwise be corrupting failures (undefined behaviour, in C). Language-wise, libcrunch slightly narrows standard C, such that all live, allocated storage has a well-defined type at any moment (cf. C99 "effective type" which is more liberal). This can be a source of false positives in the quirkiest code; there are some mitigations.

Instrumentation is currently done with CIL. There is also a clang front-end which is less mature (lacks a bounds checker) and currently rather out-of-date, but will be revived at some point.

Type-checking usually only slows execution by about 5--35%. You can also run type-check-instrumented code without the library loaded; in that case the slowdown is usually minimal (a few percent at most).

Usability quirks

Limitations of metadata