llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Make dwarf assembly output more symbolic == editable #42255

Open labath opened 5 years ago

labath commented 5 years ago
Bugzilla Link 42910
Version trunk
OS All
CC @dwblaikie,@JDevlieghere,@jmorse,@walkerkd,@pogo59,@rnk

Extended Description

Currently, we go out of our way to print all constants in the DWARF we emit as literal constants. This probably speeds up codegen, but it probably hurts the readability, and definitely hurts the editability (e.g., for producing test cases with "weird" DWARF for lldb or other tools), of the generated DWARF. Here's a non-exhaustive list of things we could do (probably guarded by -fverbose-asm) to improve this. It's roughly ordered by the impact I estimate each change would have on the readability/editability of the generated DWARF:

.section .debug_info
.uleb128 .Labbrev_DW_TAG_variable0
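
A minimal sketch of how such a symbolic abbreviation code could be wired up, assuming the code is exposed as an absolute assembler symbol. Only the name .Labbrev_DW_TAG_variable0 comes from the snippet above; the value 2, the .set line, and the trimmed-down attribute list are hypothetical, and the unit header is omitted, so this is a fragment rather than a complete unit.

```asm
        .section .debug_str,"MS",@progbits,1
.Linfo_string0:
        .asciz  "x"                             # string for DW_AT_name

        .section .debug_abbrev,"",@progbits
        # Hypothetical: give the abbreviation code a name once...
        .set    .Labbrev_DW_TAG_variable0, 2
        .uleb128 .Labbrev_DW_TAG_variable0      # Abbreviation Code
        .byte   52                              # DW_TAG_variable
        .byte   0                               # DW_CHILDREN_no
        .byte   3                               # DW_AT_name
        .byte   14                              # DW_FORM_strp
        .byte   0                               # EOM(1)
        .byte   0                               # EOM(2)
        .byte   0                               # EOM(3)

        .section .debug_info,"",@progbits
        # ...and refer to it by that name wherever a DIE uses it.
        .uleb128 .Labbrev_DW_TAG_variable0      # Abbrev: DW_TAG_variable
        .long   .Linfo_string0                  # DW_AT_name
```

With this shape, renumbering or inserting an abbreviation means touching one .set line instead of hunting down every DIE that uses the old number.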

labath commented 5 years ago

Thanks for the feedback - it was more positive than I expected. :)

For the record, this is not something I am going to look at in the near future. The topic came up when I was talking to Reid the other day, and so he asked me to file a bug about that.

dwblaikie commented 5 years ago

We certainly do use symbolic expressions in some places (even though they're compile-time constants), such as length of unit:

.long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit

which I did for the reasons you mentioned.
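
For reference, a trimmed sketch of that header as it appears in x86-64 output for DWARF v4. The label names are the ones from the line above; the single placeholder byte standing in for the DIEs is a simplification for illustration.

```asm
        .section .debug_abbrev,"",@progbits
        .byte   0                               # empty abbreviation table

        .section .debug_info,"",@progbits
.Lcu_begin0:
        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
        .short  4                               # DWARF version number
        .long   .debug_abbrev                   # Offset Into Abbrev. Section
        .byte   8                               # Address Size (in bytes)
        .byte   0                               # placeholder for the DIEs (a null entry)
.Ldebug_info_end0:
```

Because the length is a label difference, adding or removing bytes between the two labels needs no manual fix-up of the header.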

I'd worry about doing this for any entities that occur more than a constant number of times per CU because, as you mentioned, of codegen speed. Doing it conditionally (only under verbose-asm) also seems like it'd be a bit of a pain to maintain. It would be useful to experiment with, say, the relative strings and constant abbrevs, and do some compile-time measurement to see whether there's a discernible savings from keeping the current codepath for the integrated assembler. If it isn't observable, maybe we can switch to the symbolic version without maintaining both.

"make all DWARF5 indexed section references (debug_str_offsets, etc.) symbolic. Ironically, these were introduced precisely to reduce the overhead of symbolic references in previous dwarf versions."

I don't think the goal of these DWARFv5 forms is contradictory to your goals here - object files will still contain the final constants and not have a bunch of extra relocations - so the forms still save object size even if we use compile/assembly-time symbolic constants.
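
As a rough sketch of what I mean, assuming the assembler folds a same-section label difference into a constant: the index of a DW_FORM_strx1 reference can be written as an expression, and the object file still gets a plain one-byte constant with no relocation. The .Lstrx_* labels and the set-length labels are hypothetical names, not something LLVM emits today, and the .debug_info line is a fragment without its unit header.

```asm
        .section .debug_str,"MS",@progbits,1
.Linfo_string0:
        .asciz  "clang"
.Linfo_string1:
        .asciz  "x"

        .section .debug_str_offsets,"",@progbits
        .long   .Lstr_offsets_end0-.Lstr_offsets_start0 # Length of String Offsets Set
.Lstr_offsets_start0:
        .short  5                               # DWARF version number
        .short  0                               # padding
.Lstr_offsets_base0:
.Lstrx_producer:                                # hypothetical per-entry label
        .long   .Linfo_string0
.Lstrx_name:                                    # hypothetical per-entry label
        .long   .Linfo_string1
.Lstr_offsets_end0:

        .section .debug_info,"",@progbits
        # Index = byte distance from the base divided by the 4-byte entry size.
        .byte   (.Lstrx_name-.Lstr_offsets_base0)/4 # DW_AT_name (DW_FORM_strx1)
```

Reordering or inserting entries in .debug_str_offsets then leaves the .debug_info references correct without renumbering by hand.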

Readability can mostly be improved through more verbose-asm comments, but I fully appreciate the editability/writability benefits & made the unit length field symbolic for that reason.

pogo59 commented 5 years ago

The asm for .debug_info typically does spit out comments that say what each individual .byte directive is for, so readability isn't terrible; but this suggestion would make editing a clang-produced .s file simpler.

I remember doing a fair amount of symbolic references in the .s files for testing the DWARF 5 headers, which were written before LLVM knew how to generate them. It really made all the sizes and references a lot easier to manage.

pogo59 commented 5 years ago

Moved to DebugInfo, because people say they'd rather have debug-info related bugs there even if they aren't bugs in the DebugInfo library.