New service: stable allocation identities

A useful primitive for a lot of tool and diagnostic applications would be "stable allocation identities": instead of a trace or dump that has inscrutable 0xdeadbeef-esque numbers that change each time, a stable identity could say something like site-myfunc-2-thread-3-1+0x22. This identifier would be the same each execution, for the allocation created at the same point in the dynamic instruction trace.

I'm anticipating that:

- there would be multiple schemes, roughly one per allocator or group of allocators; here site means the identifier is based on the allocation site, which is appropriate for heap allocations
- allocation sites are identified in a function-local way, e.g. here it's site number 2 within myfunc
- dynamic program points are counted in a thread-local way, e.g. thread-3-1 means the second hit by thread 3 (counting from zero)
- threads are (recursively) identified by the stable identity of their own control block
- offsets can be tacked on to the end, to identify a byte/position within an allocation
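
To make the naming scheme above concrete, here is a minimal sketch in C of how a site-based identity could be rendered as a string. Everything named here (struct stable_id, stable_id_format, the field names) is hypothetical and not part of the liballocs API. It also simplifies the thread component to a plain number, whereas the proposal is for threads to be named recursively by the identity of their control block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical components of a site-based stable identity. */
struct stable_id {
    const char *func;   /* function containing the allocation site */
    unsigned site;      /* site number, local to that function */
    unsigned thread;    /* simplified: a stable number for the thread */
    unsigned long hit;  /* zero-based, per-thread hit count at this site */
};

/* Render "site-<func>-<site>-thread-<thread>-<hit>+0x<offset>" into buf.
 * Returns what snprintf returns: the length of the full string,
 * excluding the terminating NUL, even if it was truncated. */
int stable_id_format(char *buf, size_t size,
                     const struct stable_id *id, size_t offset)
{
    assert(buf != NULL && id != NULL && id->func != NULL);
    return snprintf(buf, size, "site-%s-%u-thread-%u-%lu+0x%zx",
                    id->func, id->site, id->thread, id->hit, offset);
}
```

Under these assumptions, formatting { "myfunc", 2, 3, 1 } with offset 0x22 gives the example string from the top of this issue. A real scheme would also need a stable way to number sites and threads, which this sketch takes as given.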

A good test would be an ltrace-like tracer, implemented in-process, but where we print addresses as stable identifiers. It should give the same output for any run, even for multithreaded programs if we discard ordering (e.g. sort the output lines).
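
The order-insensitive comparison could look something like the following sketch, which sorts the output lines of two runs and checks that they match. The function name traces_equal_unordered is made up for illustration, and the tracer that would produce the lines is assumed to exist elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_lines(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Return 1 if the two traces contain the same lines, ignoring order,
 * and 0 otherwise. Both arrays are sorted in place as a side effect. */
int traces_equal_unordered(const char **a, size_t na,
                           const char **b, size_t nb)
{
    assert(a != NULL && b != NULL);
    if (na != nb) return 0;
    qsort(a, na, sizeof *a, cmp_lines);
    qsort(b, nb, sizeof *b, cmp_lines);
    for (size_t i = 0; i < na; i++)
        if (strcmp(a[i], b[i]) != 0) return 0;
    return 1;
}
```

With raw addresses in the trace lines this check would fail on almost every pair of runs. With stable identities it should pass, provided each thread's own sequence of events is deterministic.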

Note that the liballocs API already allows for allocations to have names. However, these names are not the same thing -- they are (1) optional (only for subobjects, symbols etc) and (2) non-unique (directly using the symbol name, subobject name, etc). By contrast, a stable allocation identity should be globally unique across a whole execution, and every allocation should have one (if asked for one).

This is related to the need faced by my editable-assembly-generating example client (see examples/ or #88) to generate symbolic names for any referenced position. That is interesting because what's being named there is a position in the allocation containment tree, not just an address. If we have an address and a pointed-to type, we can identify a particular tree node, not just an address-equal slice through the tree. (This is the insight behind my as-yet-unpublished bounds-checking work.)

So maybe the service could name a tree node, not just an address? Some defaulting logic could break ties if the client doesn't know which node it wants.
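
One possible shape for that defaulting logic, as a rough sketch only: given the nodes that all start at the same address, ordered outermost first, pick the one whose type matches the pointed-to type, and otherwise fall back to a default. The types and names here (struct tree_node, pick_node) are invented, types are compared as plain strings for simplicity, and defaulting to the outermost node is an arbitrary choice for illustration, not a decision made in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one node in the containment tree. */
struct tree_node {
    const char *name;  /* symbolic name of this node */
    const char *type;  /* type name of the object at this node */
};

/* nodes[0..n) all begin at the same address, outermost first.
 * If wanted_type is non-NULL and some node has that type, return the
 * outermost such node. Otherwise return the outermost node overall.
 * Returns NULL only when there are no candidates. */
const struct tree_node *pick_node(const struct tree_node *nodes, size_t n,
                                  const char *wanted_type)
{
    assert(nodes != NULL || n == 0);
    if (n == 0) return NULL;
    if (wanted_type != NULL) {
        for (size_t i = 0; i < n; i++)
            if (strcmp(nodes[i].type, wanted_type) == 0)
                return &nodes[i];
    }
    return &nodes[0];  /* assumed default: the outermost node */
}
```

A client that knows the pointed-to type passes it and gets a specific node. A client that doesn't passes NULL and gets the default.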

Incidentally, a better API for navigating these tree slices would replace find_matching_subobject and first_subobject_spanning and walk_subobjects_spanning (others?).
