stephenrkell opened this issue 3 years ago
Maybe another way to think of the 'terminal layer' problem is that some allocators are "fixed-format". There may be many layers of fixed-format, but they always reside at the bottom of the tree.
Of course one can imagine exceptions to that, e.g. using a char buffer inside a struct as a malloc arena. I find that pretty wacky, so provisionally have no problem not being able to model it.
Idle thought: is there, or should there be, a link between `mincore()` and our sparseness story on read/write validity? For `mincore()` and sparse areas, maybe we need to capture the ideas that (1) access might be 'valid' but nevertheless expensive, and (2) a cheaper yet indirect way to read the value, e.g. "it's all zeroes because we've never written to it", might be available.
The issue of `make_precise()` came up again in the ELF allocator example, because some data types (NUL-term'd strings, x86 instructions, ...) have uncertain length but are self-delimiting. Can/should we make a uniqtype describing these, and if so, how can it expose the length information?
I guess the crux of the issue is that a flexible array need not be delimited by its containing allocation. We have been assuming (e.g. in libcrunch) that that's how it works. But it could instead be self-delimiting, e.g. something as simple as a null-terminated string. If the containing allocator cannot reasonably know the boundary, then the uniqtype itself has to know. We want to support both of these patterns. That means some kind of function for self-delimit logic seems inescapable.
Can we make self-delimitingness an exclusive property of a 'packed sequence' type that is analogous to uniqtypes' `__ARR_`? Then the function could belong to the sequence type, not to the element type. Is that sane?
I am feeling the function actually needs to tell us two things: the distance to the end of this element, and the distance from there to the beginning of the next (default: zero). That's just in case there are alignment constraints or similar.
If we do this, then our subobject-traversal code will gain a new case (alongside struct, union and array).
Let's think about x86 some more. Ideally, a decode function would not only give us the length of the instruction, but also a precise type of the instruction (prefixes, opcodes, operands, SIB byte, immediates, etc). Is this sane? These are all bitfields. I guess we can expand every distinct format of an x86 instruction, with one struct for each. So we are not really getting away from the idea that each element has its own definite type. Still, putting it on the packed seq would make sense. It preserves the idea that only sequences (incl. arrays) are variable-length (er, and structs that embed one of these as their last field).
Another way to think of this: a packed seq has no element type. It only has a decode function, which enumerates the element types as it goes along.
What happens if a `get_type()` query comes in landing in the middle of a packed seq? The decode operations are potentially expensive. Maybe that's not our fault? Maybe we (configurably) cache the results? In a metavector, say? Indeed we can think of a data segment's type info as a packed seq. Elaborating it at run time is probably not a sane move in the case of data segments, but the conceptual uniformity is appealing: a starts bitmap and metavector are constructed to cache a packed sequence. We can construct them statically (eagerly) or dynamically (lazily) as we wish.
So it seems that packed sequences, together with some notions of read-validity and write-operation, are our replacement for `make_precise()`. Does this handle everything we need?
If we are to have `__uniqtype__PACKED_SEQ_xxx`, how do we name them? Really we want to identify them with their decode functions, which should be global symbols. How do we manage that namespace of global functions? It's almost like an `/etc/services`-style registry of names. But that means some kind of real-world authority would be necessary... not clear we want that.
Another wart is that packed sequences can't be written to without messing stuff up, in the case where an element's size is changed. So now I'm thinking it's a kind of allocator. Are all packed sequences big enough to be bigallocs? This also keeps things uniform w.r.t. the use of metavectors and bitmaps. And our allocator-level knowledge of whether we've issued internal pointers, whether it's possible to resize/move stuff, etc.
Radical thought: maybe we only need allocators, and this 'uniqtype' thing is one abstraction too many?
There is now a packed sequence allocator.
Thinking about x86 again, we could imagine expanding x86 instruction format into a series of struct types. But then there'd be an overarching union type. So really, each element of a packed sequence is (morally) an instance of a discriminated (non-simultaneous) union.
A flip side of the previous comment might be "a[ny] discriminated union is really an allocator [arena]".
We can distinguish self-discriminating (the bytes of the union members themselves tell you which case they are), data-discriminated (some field in an adjoining piece of memory has the answer) and context-discriminated (the answer lies elsewhere; may or may not even be explicitly manifest in program state, e.g. temporal discrimination).
Do we want our "uniqtype line" in the allocation tree to be drawn below any (non-simultaneous) union nodes? i.e. unions are really allocators [I mean arenas] and not uniqtypes? That would be wild w.r.t. how C makes us think, but not obviously incorrect as a way to model what these unions are doing.
I think this is really the same problem as I've already anticipated, in the idea of having an alternative "allocator's view" abstraction layer e.g. where the allocator's chunk metadata has type information visible and the user data is just opaque bytes. That is "up one level", whereas for functions we are able to go "down one level". "Typed" views are a layer cake, within which we have a "default layer" which is the programmer's ordinary viewpoint, but both lower- and higher-layer typed views in the tree are possible.
How do we add an awareness of this "default abstraction line" to the bigallocs table and the query API?
Maybe we can use the same mechanism we use to query subobjects?
I suspect that the synthetic stackframe struct types probably shouldn't be exposed in the default abstraction. That's potentially good because we can come up with a more optimal (maybe "compiled"?) representation for frame layouts... the current representation is not compact.
The short version of the problem is how to capture "allocation nesting below the [uppermost] uniqtype level". Probably it should not be in the bigallocs table. Agreed that subobject nesting is also modelling this. Some examples are
Maybe the answer is that from the first "never big" line down, we have to query the allocator and that's that? When we recursively search down through a struct we are arguably doing that, although without dispatching back to "could be any allocator nested under here" but rather assuming only uniqtypes lie within uniqtypes.
For functions, the fact that a function type has size zero suggests that we need more than one notion of nesting or descending... maybe somehow abstraction-breaking descent versus non-.
Perhaps the answer is that underneath any uniqtype-described range there may be inserted an allocator-defined view. So, underneath a function-typed uniqtype there may be a packed sequence; if we descend to a struct member we may find that an allocator is wrapping that member (handy for unions perhaps!? they should have been in the bulleted list above).
In other words, once we've hit a uniqtype we can still call back out to an allocator at some level in the uniqtype nesting. Where does this need to be recorded? Outside the uniqtype, so probably in the allocator that presents the uniqtype'd view, e.g. the static symbol allocator in a text segment -- it would know (say) that a symbol is a function, but also that there's a packed-sequence view of it lurking underneath the function uniqtype view. E.g. could there be an additional bitmap, mostly zeroes, to record where these lurking interpretations exist? Sounds like we need to expand the allocator API to capture this.
Perhaps: whenever we give out a uniqtype, there is conceptually the option of also giving out the "intra-allocator" for that uniqtype'd range? The default intra-allocator is just the uniqtype allocator itself, but it can be overridden.
Yes. In fact, 'uniqtype' embodies multiple kinds of intra-allocator already: array and composite. The latter has two sub-cases: simultaneous and non-simultaneous. These are all allocators! They are the default intra-allocator for things of the corresponding [kind of] uniqtype.
It sounds like we are elaborating the allocator nesting approach for small allocations, analogous with the bigalloc table for big allocations. Remember that the small/big line is approximated by the pageindex, which can be seen as a "short cut" to that line (modulo page granularity).
Some words: "infra-allocator" (beneath) vs "ultra-allocator" (above), or perhaps "hyper" and "hypo" (but these sound too similar). Also: "alloturtle" (all the way down).
Creating a 'precise' struct that ends with a flexible array member is a ballache. You have to create the array type, then create the struct type. The struct type's `make_precise` function should in theory do this. But if the flexible array member is many layers down, the number of created types and `make_precise` functions starts to add up. Currently we don't generate these functions, only the ones for the array type. (Once no array type has a definite length (#34), structs that contain arrays will still have definite length, except for flexible array members, so that does not change things here.)
Given that we are rethinking arrays, we have a chance to rethink `make_precise` in general. We want it for:

- `not_simultaneous` and `may_be_invalid`. But maybe we want read-validity as a separate concept, orthogonal to uniqtype, i.e. at the allocator level? If so it could cover struct padding, inter-allocation dead space, genuinely read-trapping regions, sparseness that should not be disturbed (?), and so on.
- `mcontext_t`. But for these maybe we need to do the write-tracking thing: trap writes and keep last-written shadow state, standing in for a `make_precise` function.

It seems reasonable that a variant record or union would have such a piece of code (again, logically/hypothetically speaking). So if we get rid of `make_precise`, where should it live? Perhaps as a special 'related' entry... composites already have N+1 relateds, where the extra one is the member names. We can decree that `not_simultaneous` composites have a function that returns read-validity... and write-validity? Or maybe a "do write to member N" memcpy-like function that respects invariants (N must be the index of a `may_be_invalid` member).

Can we handle `_DYNAMIC`- or auxv-style arrays this way? We can view them as composites where a given member may be present, absent or even present >1 time. It would be better to view these as an allocator. This may require us to relax our view of uniqtypes as being only at the 'terminal' layer of allocation, i.e. here we can access the arena as an array of `Elf64_Dyn` entries or as an allocator managing a packed pool of discrete / singleton `Elf64_Dyn` entries.

We seem to want