Open stephenrkell opened 5 years ago
It may be possible to eliminate the wrapping of allocator wrappers, as discussed in #20.
Pithy summary of how malloc should be callee-instrumented: a global, preload-interposable malloc should not be callee-wrapped (because we will preload-interpose it to do the same thing). Others should be. Ditto for other malloc-family functions.
What about a malloc that has both a global-interposable alias and another alias? We should callee-wrap the local alias. This will mean that the two are no longer aliased. That is fine.
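As a minimal sketch of the rule above (not the project's actual logic), the decision can be written as a function over the aliases of one malloc-family definition. The `Alias` type, its two flags and the name `__my_malloc` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    """One symbol name bound to an allocator entry point (hypothetical model)."""
    name: str
    is_global: bool                # globally visible in the final link output
    is_preload_interposable: bool  # a preloaded library can interpose on it


def callee_wrap_plan(aliases):
    """Map each alias name to True if it should get a callee wrapper.

    Rule from the text: an alias that is both global and preload-interposable
    is left alone, because preload interposition already observes it.
    Every other alias is callee-wrapped.
    """
    return {
        a.name: not (a.is_global and a.is_preload_interposable)
        for a in aliases
    }


# A malloc with one global, interposable alias and one local alias.
# Only the local alias is wrapped, so the two names stop being aliases.
plan = callee_wrap_plan([
    Alias("malloc", is_global=True, is_preload_interposable=True),
    Alias("__my_malloc", is_global=False, is_preload_interposable=False),
])
```

Here `plan` maps `malloc` to `False` and `__my_malloc` to `True`, matching the two-alias case described above.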
I think all this should be driven from one big config file or a very specialised/hackable script, rather than hard-coded logic buried somewhere in our Python compile/link-wrapper scripts as at present. Then, it can be given a clean semantics, and addition of custom allocators or allocator wrappers can be explained uniformly in terms of additions to that file or script.
Another way to think about the 'config file' idea is to put the built-in allocators and LIBALLOCS_* environment variables on the same level.
Roughly, the level at which this needs to work is the link map: from a pooled description of what needs callee-wrapping, we should be able to explain the difference between the actual link map (including stubs linked in) and the vanilla no-liballocs link map.
We can perhaps turn the env vars into a config file that is merged with the 'built-in' config, then use the config file to produce linker options. The logic for all this belongs in the gold plugin.
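A rough sketch of that pipeline, wherever the logic eventually lives: read the environment into a small config, merge it with a built-in one, and derive linker options from the result. Everything here is an assumption made for illustration: the config layout, the use of `LIBALLOCS_ALLOC_FNS` as the variable read, its entry format (a name optionally followed by a parenthesised signature), and the choice of GNU ld's `--wrap` as the option emitted for each declared wrapper.

```python
# Hypothetical built-in description. The real built-in set currently lives
# in the compile/link-wrapper scripts and is not reproduced here.
BUILTIN_CONFIG = {
    "allocators": {"malloc": ["malloc", "calloc", "realloc", "free"]},
    "wrappers": [],
}


def config_from_env(environ):
    """Turn environment variables into a config fragment.

    ASSUMPTION: LIBALLOCS_ALLOC_FNS holds whitespace-separated entries,
    each a function name optionally followed by a '(...)' signature.
    """
    raw = environ.get("LIBALLOCS_ALLOC_FNS", "")
    names = [entry.split("(", 1)[0] for entry in raw.split()]
    return {"wrappers": names}


def merge_config(builtin, extra):
    """Merge a fragment into the built-in config without mutating either."""
    merged = {
        "allocators": {**builtin["allocators"], **extra.get("allocators", {})},
        "wrappers": list(builtin["wrappers"]),
    }
    for name in extra.get("wrappers", []):
        if name not in merged["wrappers"]:
            merged["wrappers"].append(name)
    return merged


def linker_options(config):
    """Emit one compiler-driver option per declared allocator wrapper.

    ASSUMPTION: wrappers are handled with the linker's --wrap option.
    Callee wrapping of allocators is deliberately left out of this sketch.
    """
    return [f"-Wl,--wrap={name}" for name in config["wrappers"]]
```

A caller would pass `os.environ` to `config_from_env`, merge the result with `BUILTIN_CONFIG`, and append the output of `linker_options` to the link command. Putting built-in and user-declared entries through the same merge is what places them "on the same level".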
What do we do when final-linking a DSO that contains a 'malloc'? It's preemptible but may or may not be the global malloc, so may or may not need the callee wrappers. Probably it's necessary to create a local alias, then ensure any locally-bound reference (maybe from protected visibility, maybe from -Bsymbolic) actually goes to the alias, which does get the callee wrapper. So the global 'malloc' (and friends) are not used internally by the object, and will only be used externally if they do indeed 'win' the contention for those global symbols.
Currently there is some confusion about what an "allocator" is, owing to the presence of `struct allocator` instances that actually cover more than one allocator, such as `__generic_malloc_allocator`. All these `struct allocator` objects are statically defined -- we never generate them. If we could generate these and related structures, we could maintain a richer run-time model of allocators, allocator wrappers and allocation call sites. This would be useful for automating some meta-level policy, such as deleting an allocation when it's no longer needed. Currently the per-allocator `free` call is too stupid to allow this, as it can't identify which is the right free function to call when there are many alternatives (in cases where freeing and finalisation are baked into the same operation, for example).
Currently, link-time code generation (using the macros in tools/stubgen.h) is used to wrap each allocator wrapper (yes, two levels of wrapping) and also to wrap any linked-in definitions of `malloc` (all in allocscompilerwrapper.py). All this is a big mess that needs rationalising.
Each allocator instance should get its own `struct allocator` instance, which should be generated and linked into the object defining its "first" entry point (f.s.v.o. "first"... what I'm envisaging is rather like C++, where the translation unit defining the first virtual function is the one that gets the vtable).
Consequently, we no longer have a `__generic_malloc_allocator` -- rather, each instance of `malloc` should have its own `struct allocator`. For example, if we link an executable that defines `malloc`, we generate a fresh `struct allocator` including implementations of its operations (which can call into the indexing code currently used by `__generic_malloc_allocator`, but acting on a separate index instance, which will have to be declared and statically constructed, also in the generated code).
At the same time as generating the `struct allocator` instance, we also link in any wrappers that are necessary to observe the allocator. For example, if we have a `malloc` definition, we link in wrappers that do the indexing. (See the mention of 'callee' wrappers in tools/stubgen.h and tools/allocscompilerwrapper.py.) This already happens, and probably needs to be maintained and generalised to other allocators besides malloc.
The libc is handled slightly specially, since it's preload-interposable and we don't expect the libc to be liballocs-compiled. So we have a `struct allocator` instance called `__libc_malloc_allocator` which is static, rather like the current `__generic_malloc_allocator`. And the libc malloc does not have 'callee wrappers' -- we use the preload mechanism instead. Since malloc and libc are special, there is probably not much to change here.
The proposed change assumes we have some way to identify when a link job contains an allocator definition. The simplest way is to match symbols by name -- e.g. we consider an allocator to be defined by a set of symbols such as {malloc, free, calloc, realloc}, and pick the first of these (malloc) as the one that triggers generation of the `struct allocator` instance. (The others may still require callee-side wrapping, though.) Optional symbols that map closely to one of the liballocs meta-operations, like `malloc_usable_size`, probably need to be recognised too. Currently, we effectively have hard-coded one such set of symbols.
As well as allocators themselves, each declared allocator wrapper should also (somehow) have a run-time identity. We could perhaps define a simple structure, and generate these at the same time as we generate the `struct allocator` instances (at link time). Currently we don't know which allocator is wrapped by a given wrapper; we might want to infer this dynamically by observation, and fill it in.
Similarly, allocator call sites should have a run-time identity. The new static metadata handling (when it's ready) will have a notion of call sites encompassing allocator calls, system calls and perhaps others. We may want to remember call chains; each node in a call chain, if it is a call from a particular site, should be able to link to the structure describing that site (e.g. the allocator wrapper call site's allocation type record) rather than just to the raw address.