Open labath opened 5 years ago
Thanks for the feedback - it was more positive than I expected. :)
For the record, this is not something I am going to look at in the near future. The topic came up when I was talking to Reid the other day, and so he asked me to file a bug about that.
We certainly do use symbolic expressions in some places (even though they're compile-time constants), such as length of unit:
.long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
which I did for the reasons you mentioned.
I'd worry about doing this for any entities that number more than a constant per CU, because of codegen speed, as you mentioned. But doing this conditionally (only under verbose-asm) seems like it'd be a bit of a pain to maintain. It would be useful to experiment with, say, the relative strings and constant abbrevs and do some compile-time measurement to see if there's a discernible savings from keeping the current codepath for the integrated assembler. If it isn't observable, maybe we can switch to the symbolic version without maintaining both.
"make all DWARF5 indexed section references (debug_str_offsets, etc.) symbolic. Ironically, these were introduced precisely to reduce the overhead of symbolic references in previous dwarf versions. "
I don't think the goal of these DWARFv5 forms is contradictory to your goals here - object files will still contain the final constants and not have a bunch of extra relocations - so the forms still save object size even if we use compile/assembly-time symbolic constants.
Readability can mostly be improved through more verbose-asm comments, but I fully appreciate the editability/writability benefits & made the unit length field symbolic for that reason.
The asm for .debug_info typically does spit out comments that say what each individual .byte directive is for, so readability isn't terrible; but this suggestion would make editing a clang-produced .s file simpler.
I remember doing a fair amount of symbolic references in the .s files for testing the DWARF 5 headers, which were written before LLVM knew how to generate them. It really made all the sizes and references a lot easier to manage.
Moved to DebugInfo, because people say they'd rather have debug-info related bugs there even if they aren't bugs in the DebugInfo library.
Extended Description
Currently, we go out of our way to print all constants in the dwarf we emit as literal constants. This probably speeds up codegen, but it probably hurts the readability, and definitely hurts editability (e.g., for producing test cases with "weird" dwarf for lldb or other tools) of the generated dwarf. Here's a non-exhaustive list of things we could do (probably guarded by -fverbose-asm) to improve this. It's roughly ordered according to the impact I estimate the changes would have on the readability/editability of the generated dwarf:
make all DIE cross-references in the debug_info section symbolic (e.g. from a variable DIE to its type). Currently, we just emit these as integral constants, which is a problem, because pretty much any change to the debug info will upset these offsets. However, if these were emitted as symbolic references instead, then one could freely add/remove DIEs or attributes and the references would remain valid.
make references from debug_info -> debug_abbrev symbolic. When adding or removing an attribute, one needs to make coordinated changes in the abbreviation as well as in all DIEs that reference that abbreviation. This would make it easier to find all of them and jump between them. This could look something like:
.section .debug_abbrev
.set .Labbrev_DW_TAG_variable0, 0
.byte .Labbrev_DW_TAG_variable0
...
.section .debug_info
.uleb128 .Labbrev_DW_TAG_variable0
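For the DIE cross-reference item above, here is a minimal sketch of what a symbolic DW_AT_type reference might look like in a hand-written DWARF 4 unit for an ELF target. The label names .Lcu_begin0 and .Ldie_int are assumptions chosen for illustration, not necessarily what LLVM emits. The abbreviation codes are left as plain numbers to keep the focus on the reference itself.

```asm
        .section .debug_abbrev,"",@progbits
        .uleb128 1                      # Abbreviation Code
        .uleb128 0x11                   # DW_TAG_compile_unit
        .byte   1                       # DW_CHILDREN_yes
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .uleb128 2                      # Abbreviation Code
        .uleb128 0x34                   # DW_TAG_variable
        .byte   0                       # DW_CHILDREN_no
        .uleb128 0x03                   # DW_AT_name
        .uleb128 0x08                   # DW_FORM_string
        .uleb128 0x49                   # DW_AT_type
        .uleb128 0x13                   # DW_FORM_ref4
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .uleb128 3                      # Abbreviation Code
        .uleb128 0x24                   # DW_TAG_base_type
        .byte   0                       # DW_CHILDREN_no
        .uleb128 0x03                   # DW_AT_name
        .uleb128 0x08                   # DW_FORM_string
        .uleb128 0x3e                   # DW_AT_encoding
        .uleb128 0x0b                   # DW_FORM_data1
        .uleb128 0x0b                   # DW_AT_byte_size
        .uleb128 0x0b                   # DW_FORM_data1
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)

        .section .debug_info,"",@progbits
.Lcu_begin0:                            # hypothetical label: start of the unit
        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
        .short  4                       # DWARF version number
        .long   .debug_abbrev           # Offset Into Abbrev. Section
        .byte   8                       # Address Size
        .uleb128 1                      # Abbrev [1] DW_TAG_compile_unit
        .uleb128 2                      # Abbrev [2] DW_TAG_variable
        .asciz  "x"                     # DW_AT_name
        .long   .Ldie_int-.Lcu_begin0   # DW_AT_type: CU-relative offset, computed by the assembler
.Ldie_int:                              # hypothetical label on the referenced DIE
        .uleb128 3                      # Abbrev [3] DW_TAG_base_type
        .asciz  "int"                   # DW_AT_name
        .byte   5                       # DW_AT_encoding (DW_ATE_signed)
        .byte   4                       # DW_AT_byte_size
        .byte   0                       # End Of Children Mark
.Ldebug_info_end0:
```

Because DW_FORM_ref4 is an offset from the start of the unit header, the label difference is resolved at assembly time and needs no relocation. Adding or removing DIEs or attributes between the two labels changes the computed value without any manual edit to the reference.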