Make dwarf assembly output more symbolic == editable

labath commented 5 years ago


Bugzilla Link	42910
Version	trunk
OS	All
CC	@dwblaikie,@JDevlieghere,@jmorse,@walkerkd,@pogo59,@rnk

Extended Description

Currently, we go out of our way to print all constants in the dwarf we emit as literal constants. This probably speeds up codegen, but it probably hurts the readability of the generated dwarf and definitely hurts its editability (e.g., when producing test cases with "weird" dwarf for lldb or other tools). Here's a non-exhaustive list of things we could do (probably guarded by -fverbose-asm) to improve this, roughly ordered by the impact I estimate each change would have on the readability/editability of the generated dwarf:

- Make all DIE cross-references in the debug_info section symbolic (e.g., from a variable DIE to its type). Currently, we just emit these as integral constants, which is a problem because pretty much any change to the debug info will upset these offsets. However, if these were emitted as something like
```
.byte   2                       # Abbrev [2] 0x1e:0x15 DW_TAG_variable
.long   .Ltype47-.Lcu_start     # DW_AT_type
    ...
.Ltype47:
.byte   3                       # Abbrev [3] 0x33:0x7 DW_TAG_base_type
    ...
```
then one could freely add or remove DIEs or attributes and the references would remain valid.
- Make references from debug_info to debug_abbrev symbolic. When adding or removing an attribute, one needs to make coordinated changes in the abbreviation as well as in all DIEs that reference that abbreviation. Symbolic abbreviation codes would make it easier to find all of them and jump between them. This could look something like:
```
.section .debug_abbrev
.set .Labbrev_DW_TAG_variable0, 2        # code 0 is reserved for null entries
.byte   .Labbrev_DW_TAG_variable0        # Abbreviation Code
    ...

.section .debug_info
.uleb128 .Labbrev_DW_TAG_variable0       # DW_TAG_variable
```

- Make all DWARF5 indexed section references (debug_str_offsets, etc.) symbolic. Ironically, these were introduced precisely to reduce the overhead of symbolic references in previous dwarf versions. However, they make it hard to figure out what a particular attribute (e.g., DW_AT_name) refers to, because it shows up as just a ".byte". This could work in a manner similar to .debug_abbrev above, but it is a lower priority because DWARF5 can be turned off. :)
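The first two items can be sketched with a small, hypothetical generator, written in Python for brevity. The `Die` class, the `emit*` helpers and the `.Ltype0`/`.Lvar0`/`.Lcu_begin0` label names are assumptions for illustration, not LLVM's AsmPrinter API. The generator gives every DIE a label and writes `DW_AT_type` as a label difference from the start of the unit. It also binds each abbreviation code to a `.set` symbol. Everything else in the unit is elided as comments.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch only: not LLVM's AsmPrinter/DwarfDebug API.


@dataclass
class Die:
    tag: str                          # e.g. "DW_TAG_variable"
    label: str                        # label emitted at the start of the DIE
    type_ref: Optional["Die"] = None  # target of DW_AT_type, if any


def emit_debug_abbrev(tags):
    """Emit .debug_abbrev with each abbreviation code bound to a symbol."""
    codes = {}
    lines = ["\t.section\t.debug_abbrev"]
    # Codes start at 1 because abbreviation code 0 is reserved (null entries).
    for code, tag in enumerate(tags, start=1):
        sym = f".Labbrev_{tag}0"
        codes[tag] = sym
        lines.append(f"\t.set\t{sym}, {code}")
        lines.append(f"\t.uleb128\t{sym}\t# Abbreviation Code")
        lines.append("\t# ... tag, children flag, attribute/form pairs ...")
    return codes, lines


def emit_debug_info(dies, codes, cu=0):
    """Emit one .debug_info unit whose cross-references are all symbolic."""
    begin = f".Lcu_begin{cu}"
    start, end = f".Ldebug_info_start{cu}", f".Ldebug_info_end{cu}"
    lines = [
        "\t.section\t.debug_info",
        f"{begin}:",
        f"\t.long\t{end}-{start}\t# Length of Unit",
        f"{start}:",
        "\t# ... rest of the unit header and the compile unit DIE ...",
    ]
    for die in dies:
        lines.append(f"{die.label}:")
        lines.append(f"\t.uleb128\t{codes[die.tag]}\t# {die.tag}")
        if die.type_ref is not None:
            # DW_FORM_ref4 is an offset from the start of the unit header,
            # so the assembler can compute it as a label difference.
            lines.append(f"\t.long\t{die.type_ref.label}-{begin}\t# DW_AT_type")
        lines.append("\t# ... remaining attributes ...")
    lines.append("\t.byte\t0\t# End Of Children Mark")
    lines.append(f"{end}:")
    return lines


def emit(dies):
    """Emit both sections; abbreviation codes follow sorted tag order."""
    codes, abbrev = emit_debug_abbrev(sorted({d.tag for d in dies}))
    return "\n".join(abbrev + emit_debug_info(dies, codes))


int_ty = Die("DW_TAG_base_type", ".Ltype0")
var = Die("DW_TAG_variable", ".Lvar0", type_ref=int_ty)
before = emit([var, int_ty])

# Insert a new DIE ahead of the type; nothing about `var` needs patching.
ptr_ty = Die("DW_TAG_pointer_type", ".Ltype1", type_ref=int_ty)
after = emit([var, ptr_ty, int_ty])
print(after)
```

Inserting the pointer type moves the base type DIE and renumbers the abbreviations. Even so, the variable's `DW_AT_type` and abbreviation-code lines are textually identical in both outputs, because the assembler recomputes the actual values.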

labath commented 5 years ago

Thanks for the feedback - it was more positive than I expected. :)

For the record, this is not something I am going to look at in the near future. The topic came up when I was talking to Reid the other day, and so he asked me to file a bug about that.

dwblaikie commented 5 years ago

We certainly do use symbolic expressions in some places (even though they're compile-time constants), such as length of unit:

```
.long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
```

which I did for the reasons you mentioned.

I'd worry about doing this for any entities that occur more than a constant number of times per CU because, as you mentioned, it could cost codegen speed. But doing it conditionally (only under verbose-asm) seems like it'd be a bit of a pain to maintain. It would be useful to experiment with, say, the relative strings and constant abbrevs, and do some compile-time measurement to see if there's a discernible savings from keeping the current codepath for the integrated assembler. If the difference isn't observable, maybe we can switch to the symbolic version without maintaining both.
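The trade-off can be made concrete with a small Python sketch of a single emitter with a `symbolic` switch. The helper names `type_ref` and `assemble_long` are hypothetical, and `assemble_long` is a toy assembler that only handles a `.long` whose operand is a literal or an `A-B` label difference. Both forms assemble to the same four bytes. The choice therefore affects only the assembly text and whatever compile-time cost the extra expressions carry, which is what a measurement would need to settle.

```python
# Hypothetical helpers illustrating the two codepaths; not LLVM API.


def type_ref(target_label, target_offset, symbolic, cu_begin=".Lcu_begin0"):
    """Return a DW_FORM_ref4 DW_AT_type directive in literal or symbolic form.

    Only the literal form needs target_offset, i.e. the DIE layout must be
    known before the text is printed; the symbolic form leaves that
    arithmetic to the assembler.
    """
    operand = f"{target_label}-{cu_begin}" if symbolic else f"{target_offset:#x}"
    return f"\t.long\t{operand}\t# DW_AT_type"


def assemble_long(directive, symbols):
    """Toy assembler step: resolve a .long operand (literal or A-B) to bytes."""
    operand = directive.split("\t")[2]
    if "-" in operand:
        lhs, rhs = operand.split("-")
        value = symbols[lhs] - symbols[rhs]
    else:
        value = int(operand, 0)
    return value.to_bytes(4, "little")  # assumes a little-endian target


# Offsets taken from the issue's example (DIE at 0x33), assuming the unit
# starts at section offset 0.
symbols = {".Lcu_begin0": 0x0, ".Ltype47": 0x33}
literal = type_ref(".Ltype47", 0x33, symbolic=False)
symbolic = type_ref(".Ltype47", 0x33, symbolic=True)
print(literal)
print(symbolic)
assert assemble_long(literal, symbols) == assemble_long(symbolic, symbols)
```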

"make all DWARF5 indexed section references (debug_str_offsets, etc.) symbolic. Ironically, these were introduced precisely to reduce the overhead of symbolic references in previous dwarf versions. "

I don't think the goal of these DWARFv5 forms is contradictory to your goals here - object files will still contain the final constants and not have a bunch of extra relocations - so the forms still save object size even if we use compile/assembly-time symbolic constants.
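The following Python sketch shows what symbolic DWARFv5 string indices might look like. Each `.debug_str_offsets` entry gets a `.set` symbol named after its string. `DW_AT_name`, assumed here to use `DW_FORM_strx` (a ULEB128 index), refers to that symbol instead of a bare number. The symbol naming scheme and the helpers are hypothetical. `.set` binds the symbol to a constant, and `.L` symbols stay assembler-local on ELF targets. So the object file would still contain a plain index with no extra relocation, which is the point above.

```python
import re

# Hypothetical sketch; the symbol naming is illustrative, not LLVM's.


def emit_strings(strings, cu=0):
    """Emit .debug_str and .debug_str_offsets with a named symbol per index.

    Strings are assumed to be plain text needing no escaping in .asciz.
    """
    index_syms = {}
    debug_str = ["\t.section\t.debug_str"]
    offsets = [
        "\t.section\t.debug_str_offsets",
        "\t# ... contribution header (length, version, padding) ...",
        f".Lstr_offsets_base{cu}:",
    ]
    for idx, s in enumerate(strings):
        clean = re.sub(r"\W", "_", s)  # make the text usable in a symbol name
        sym = f".Lstrx{idx}_{clean}"   # index kept in the name to avoid clashes
        index_syms[s] = sym
        debug_str += [f".Linfo_string{idx}:", f'\t.asciz\t"{s}"']
        offsets += [
            f"\t.set\t{sym}, {idx}",                # the index, by name
            f"\t.long\t.Linfo_string{idx}\t# {s}",  # offset into .debug_str
        ]
    return index_syms, debug_str + offsets


def name_attr(index_syms, name):
    """DW_AT_name as DW_FORM_strx: a ULEB128 index written symbolically."""
    return f"\t.uleb128\t{index_syms[name]}\t# DW_AT_name: {name}"


syms, lines = emit_strings(["clang version", "main", "int"])
print("\n".join(lines))
print(name_attr(syms, "main"))
```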

Readability can mostly be improved through more verbose-asm comments, but I fully appreciate the editability/writability benefits & made the unit length field symbolic for that reason.

pogo59 commented 5 years ago

The asm for .debug_info typically does spit out comments that say what each individual .byte directive is for, so readability isn't terrible; but this suggestion would make editing a clang-produced .s file simpler.

I remember doing a fair amount of symbolic references in the .s files for testing the DWARF 5 headers, which were written before LLVM knew how to generate them. It really made all the sizes and references a lot easier to manage.

pogo59 commented 5 years ago

Moved to DebugInfo, because people say they'd rather have debug-info related bugs there even if they aren't bugs in the DebugInfo library.

llvm/llvm-project issue #42255