stephenrkell / liballocs

Meta-level run-time services for Unix processes... a.k.a. dragging Unix into the 1980s
http://humprog.org/~stephen/research/liballocs

Line number variation causes duplicate uniqtypes #71

Open stephenrkell opened 1 year ago

stephenrkell commented 1 year ago

For anonymous structures, a declaration's line number affects [what we consider as] its name. Its name feeds into the type summary code, so we can easily get two spuriously distinguished uniqtypes, e.g. if a header file is lightly refactored to move a definition around. A telltale is two summary codes differing by 1 or some other small amount (check the relevant -dwarftypes.c under /usr/lib/meta).
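To make the failure mode concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `synthetic_name` format and the CRC-based `summary_code` are stand-ins, not the naming scheme or summary-code algorithm liballocs actually uses (in particular, this hash will not show the "differ by a small amount" telltale).

```python
import zlib

def synthetic_name(decl_file, decl_line):
    # Hypothetical scheme: an anonymous struct is named after its declaration site.
    return f"__anon_{decl_file}_{decl_line}"

def summary_code(name, member_types):
    # Stand-in for a type summary code: hash the name together with the member types.
    data = "|".join([name, *member_types]).encode()
    return zlib.crc32(data)

# The same four members, declared one line apart in two builds of a header.
members = ["const char *", "void *", "const char *", "void *"]
before = summary_code(synthetic_name("dlfcn.h", 100), members)
after = summary_code(synthetic_name("dlfcn.h", 101), members)

# Structurally identical, yet the codes differ purely because the line moved.
print(before != after)
```

The structure is unchanged between the two computations; only the declaration line feeds the difference through the name into the code.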

It's not clear what to do. Probably we need to use some other arbitrary identifier for the anonymous thing.

We could just use the filename and skip the line numbers, though that is not perfect (filenames get changed too).

We can make a special case for anonymous structures and not feed in their name (though care is needed -- this may interact with the 'swapping' we do for typedefs).

We can try to identify somehow the anonymous definition through its relationship to other definitions, rather than by its physical position. Given that it's anonymous, it is referenced implicitly by some enclosing declaration, so we could use the name of that instead of anything about the file/line. It might be a typedef, but it might also be other things like a formal parameter. Not clear we need to cater to the formal parameter case (who declares anonymous structs inside a parameter list?!). Can there be nested cases, i.e. anonymous inside anonymous? Pretty clearly yes.

stephenrkell commented 1 year ago

The swapping thing is a hack, but maybe it does have a principle that works here: over a one-to-one link, names spread frictionlessly in either direction. So, any anonymous structure that has a typedef inherits the name of that typedef as its effective name. It doesn't matter if that causes a collision with another user of that name (the struct-tag namespace thing) because we also have summary codes to disambiguate.

I think this means that we push the 'swapping' a level deeper: we have a notion of 'effective name' and all nominally distinguishable types must be given a code computed using their effective name, not a synthetic name based on their file or line number.

We've considered the typedef case. What does that leave? We have anonymous structures nested inside a structure. These can't take the name of the enclosing structure (there can be more than one nested) but they can take the name of the member that references them, which is unique. Ditto for local variables of anonymous structure type.
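A rough sketch of the 'effective name' rule as described so far, in Python. The `Die` class and `effective_name` function are hypothetical and model only what the rule needs; they are not libdwarfpp or liballocs APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Die:
    tag: str                      # e.g. "typedef", "structure_type", "member", "variable"
    name: Optional[str] = None
    type: Optional["Die"] = None  # the type this DIE refers to, if any

def effective_name(target, referrers):
    """Return the type's own name, else the name of the single DIE that refers to it."""
    if target.name is not None:
        return target.name
    users = [d for d in referrers if d.type is target and d.name is not None]
    # Only a one-to-one link lets a name spread; otherwise the type stays anonymous.
    return users[0].name if len(users) == 1 else None

# typedef struct { ... } Dl_info;
anon = Die("structure_type")
dl_info = Die("typedef", "Dl_info", anon)

# struct outer { struct { ... } pos; };
inner = Die("structure_type")
member = Die("member", "pos", inner)

everything = [dl_info, member]
print(effective_name(anon, everything), effective_name(inner, everything))
```

The same lookup covers typedefs, members and local variables, because in each case the anonymous type has exactly one named DIE pointing at it.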

We have to think about the conflation. If we have two structurally identical types with the same effective name (e.g. thanks to two like-named local variables) they will be considered the same type. That's a form of conflation I can live with, even though it's a hack. We're introducing some false conflation in order to avoid the greater evil of false distinction owing to ABI-insignificant build skew. While 'types have no linkage' in C, in effect we're retrofitting a global namespace of them by correlating which ones are 'the same'. That is an inherently approximate exercise and we err on the side of the structural.

stephenrkell commented 1 year ago

Possibly the "effective name" could be fully qualified, e.g. we build a "::"-separated name all the way down from the compilation unit level. This would avoid the "coincidentally same-named local variable" issue. Worrying any more deeply about this is overkill.
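As a sketch of what such a qualified name might look like, assuming each declaration knows its enclosing declaration. The `Node` class is hypothetical, and starting the path at the top-level declaration rather than at the compilation unit's file name is my assumption, made so that renaming a file does not change the result.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # enclosing declaration; None at compilation unit level

def qualified_name(node):
    """Build a '::'-separated name from the outermost enclosing declaration down."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return "::".join(reversed(parts))

# Two functions, each with a local variable x of anonymous struct type.
f = Node("f")
g = Node("g")
x_in_f = Node("x", f)
x_in_g = Node("x", g)

print(qualified_name(x_in_f))
print(qualified_name(x_in_g))
```

The two like-named locals no longer share an effective name, so structurally identical types attached to them would stay distinct.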

stephenrkell commented 1 year ago

Getting a new version of this problem.

$ ../libdwarfpp.hg/examples/compare-types ./build/debug/liballocs_preload.so 42760 423f2
Comparison of DIE, offset 0x42760, tag DW_TAG_typedef, name "Dl_info" and DIE, offset 0x423f2, tag DW_TAG_typedef, name "Dl_info" returned UNEQUAL for reason self may_equal: sub-equality of At 0x4270e, tag DW_TAG_structure_type, no name and At 0x423b4, tag DW_TAG_structure_type, no name, reason: self may_equal: decl_file

The problem is that Dl_info, a typedef of an anonymous struct, ends up with two definitions that compare unequal, because libdwarfpp's notion of equality for anonymous struct types requires them to be declared in the same file and on the same line. Across different CUs built against slightly different elf.h files, this is not going to be the case. There can also be differences in e.g. the exact chain of typedef indirections between uint64_t and unsigned long, say.
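The effect can be sketched in a few lines of Python. This is not libdwarfpp's code: `may_equal` here is a hypothetical function over plain dicts, the `strict_location` switch is invented for the illustration, and the line numbers are made up.

```python
def may_equal(a, b, strict_location):
    """Compare two struct descriptions given as plain dicts."""
    if a["name"] != b["name"] or a["members"] != b["members"]:
        return False
    if a["name"] is None and strict_location:
        # Anonymous: fall back on declaration coordinates to tell types apart.
        return (a["decl_file"], a["decl_line"]) == (b["decl_file"], b["decl_line"])
    return True

members = [("dli_fname", "const char *"), ("dli_fbase", "void *"),
           ("dli_sname", "const char *"), ("dli_saddr", "void *")]

# The same anonymous struct as seen from two CUs built against skewed headers.
cu1 = {"name": None, "members": members, "decl_file": "header.h", "decl_line": 90}
cu2 = {"name": None, "members": members, "decl_file": "header.h", "decl_line": 93}

print(may_equal(cu1, cu2, strict_location=True))
print(may_equal(cu1, cu2, strict_location=False))
```

With the location check on, a three-line skew is enough to split the type; with it off, the two descriptions are conflated on structure alone.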

In libdwarfpp I have now implemented the 'swapping' thing as find_associated_name() (on program_element_die). I think in my comments above I was proposing that we just use associated names and if they're equal, consider the types equal (conflate them). For other DIEs, though, we might still want stability of naming... e.g. consider two local variables x in different lexical blocks of the same function. Under a refactoring, which is 'the same' x? It's ill-posed in general but some solutions will be better than others. I'm thinking of an idea of partial paths as a way to do naming that is likely stable under common code changes/refactorings.