llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

bolt produces dwarf data that crashes dwz #53511

Open kmod opened 2 years ago

kmod commented 2 years ago

We're currently having an issue where if you run dwz (DWARF info optimizer) on our bolt-optimized binary it fails. Running dwz on the pre-bolt binary completes successfully.

Running readelf -wi on either the pre-bolt or post-bolt binary gives a number of errors:

readelf: Error: LEB value too large
readelf: Error: LEB value too large
readelf: Error: LEB value too large
readelf: Error: LEB value too large
readelf: Error: LEB value too large
readelf: Error: LEB value too large

So it's possible the problem already exists in the pre-bolt binary, but the final dwz error I get differs depending on which version of bolt I use.
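For context on the readelf output: LEB128 is the variable-length integer encoding that DWARF uses throughout .debug_info, and readelf reports "LEB value too large" when a value it decodes does not fit in 64 bits. This often suggests the reader has lost sync with the data rather than that a huge value was really encoded, but that is an inference, not something established in this thread. Below is a minimal Python sketch of an unsigned LEB128 decoder with that kind of overflow check. `read_uleb128` is a hypothetical name, not binutils' code.

```python
def read_uleb128(data: bytes, pos: int, max_bits: int = 64) -> tuple[int, int]:
    """Decode an unsigned LEB128 value starting at data[pos].

    Returns (value, new_pos). Raises ValueError if the value does not fit
    in max_bits bits (analogous to readelf's "LEB value too large"), or if
    the data ends before a byte with the high bit clear is seen.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("LEB end of data")
        byte = data[pos]
        pos += 1
        payload = byte & 0x7F  # low 7 bits carry the value
        # Reject any payload bits that would land at or beyond max_bits.
        if shift >= max_bits:
            if payload != 0:
                raise ValueError("LEB value too large")
        elif (payload << shift) >> max_bits:
            raise ValueError("LEB value too large")
        result |= payload << shift
        shift += 7
        if byte & 0x80 == 0:  # high bit clear: last byte of the value
            return result, pos
```

For example, the bytes `E5 8E 26` decode to 624485, while a run of ten `0xFF` continuation bytes overflows 64 bits and raises the error.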

Binaries: pre-bolt binary, post-bolt binary

I created the post-bolt binary using:

llvm-bolt ./python3 -o ./python3.bolt -update-debug-sections -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot

Our project currently uses a version of bolt from two months ago, which leads to this error:

$ dwz python3.bolt
dwz: python3.bolt: Unknown DWARF DW_OP_195

I just tested bolt from current trunk (d329dfd0) and that version leads to this error:

$ dwz python3.bolt
dwz: python3.bolt: Couldn't find DIE referenced by DW_OP_GNU_parameter_ref
llvmbot commented 2 years ago

@llvm/issue-subscribers-bolt

llvmbot commented 2 years ago

@llvm/issue-subscribers-debuginfo

ayermolo commented 2 years ago

@kmod My apologies, I missed this. Can you try with the latest bolt? Some things were fixed recently.

kmod commented 2 years ago

@ayermolo Still having the same issue, unfortunately

ayermolo commented 2 years ago

@kmod OK, I'll take a look. Are there smaller binaries that exhibit this behavior? Also, how was the binary built?

ayermolo commented 2 years ago

Ah, OK. The issue is that we rewrite .debug_info, so the offsets of DIEs change. Apparently there is DW_OP_GNU_parameter_ref, which can reference another DIE from within a location expression. We don't handle this at the moment. Looking at the LLVM code, neither does LLVM: llvm-dwarfdump just returns a decoding error.
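To illustrate why rewriting .debug_info breaks this opcode: GCC's DW_OP_GNU_parameter_ref (opcode 0xfa) carries a 4-byte offset of the referenced DIE relative to the start of its compilation unit, so whenever DIEs move, that operand has to be rewritten too. The sketch below is a hypothetical helper (not BOLT's or LLVM's code) that walks a location expression and remaps those offsets through an old-to-new offset map. It assumes little-endian data and decodes only a handful of opcodes; a real implementation would need the full DW_OP table.

```python
import struct

DW_OP_plus_uconst = 0x23
DW_OP_lit0, DW_OP_reg31 = 0x30, 0x6F     # lit0..lit31 and reg0..reg31 take no operands
DW_OP_breg0, DW_OP_breg31 = 0x70, 0x8F   # breg0..breg31 take one SLEB128 operand
DW_OP_fbreg = 0x91
DW_OP_stack_value = 0x9F
DW_OP_GNU_parameter_ref = 0xFA


def _skip_leb128(expr: bytes, pos: int) -> int:
    """Return the position just past a (U|S)LEB128 value starting at pos."""
    while expr[pos] & 0x80:
        pos += 1
    return pos + 1


def patch_parameter_refs(expr: bytes, die_offset_map: dict[int, int]) -> bytes:
    """Rewrite the CU-relative DIE offsets used by DW_OP_GNU_parameter_ref.

    Hypothetical helper: only a small subset of opcodes is decoded, and any
    other opcode raises ValueError (similar in spirit to dwz's
    "Unknown DWARF DW_OP" error).
    """
    out = bytearray(expr)
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_reg31 or op == DW_OP_stack_value:
            continue  # no operands to skip
        if DW_OP_breg0 <= op <= DW_OP_breg31 or op in (DW_OP_fbreg, DW_OP_plus_uconst):
            pos = _skip_leb128(expr, pos)  # exactly one LEB128 operand
        elif op == DW_OP_GNU_parameter_ref:
            # 4-byte DIE offset relative to the CU header: must follow the DIE.
            (old,) = struct.unpack_from("<I", expr, pos)
            if old not in die_offset_map:
                raise ValueError(f"no new offset for DIE at CU offset {old:#x}")
            struct.pack_into("<I", out, pos, die_offset_map[old])
            pos += 4
        else:
            raise ValueError(f"unhandled DW_OP {op:#x}")
    return bytes(out)
```

Failing loudly on an unmapped offset, rather than copying the old bytes through, is deliberate: a stale offset is exactly what produces dwz's "Couldn't find DIE referenced by DW_OP_GNU_parameter_ref".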

ayermolo commented 2 years ago

I also found an issue with updating debug information when -icf=1 is used: it results in DW_AT_location pointing to the wrong location within the .debug_loc section. FYI, we are looking into why.
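One way to detect this kind of breakage mechanically is to check whether each DW_AT_location offset still lands at the start of some list in .debug_loc. The following is a minimal sketch that assumes the DWARF 2-4 .debug_loc layout (DWARF 5 uses .debug_loclists instead), little-endian data, a single address size, lists laid out back to back, and DW_AT_location offsets already collected from .debug_info by other means. The function names are hypothetical. It is only a heuristic: an offset redirected to the start of the wrong list will still pass.

```python
import struct


def location_list_starts(debug_loc: bytes, addr_size: int = 8) -> set[int]:
    """Return the offsets at which DWARF 2-4 .debug_loc lists begin."""
    if addr_size not in (4, 8):
        raise ValueError("only 4- and 8-byte addresses are handled")
    fmt = "<Q" if addr_size == 8 else "<I"
    max_addr = (1 << (8 * addr_size)) - 1
    starts = set()
    pos = 0
    list_start = 0
    while pos < len(debug_loc):
        (begin,) = struct.unpack_from(fmt, debug_loc, pos)
        (end,) = struct.unpack_from(fmt, debug_loc, pos + addr_size)
        pos += 2 * addr_size
        if begin == 0 and end == 0:
            # End-of-list entry: the list is complete; the next one starts here.
            starts.add(list_start)
            list_start = pos
        elif begin == max_addr:
            pass  # base address selection entry: no expression follows
        else:
            # Normal entry: 2-byte length, then the location expression.
            (length,) = struct.unpack_from("<H", debug_loc, pos)
            pos += 2 + length
    return starts


def bad_location_offsets(offsets, debug_loc: bytes, addr_size: int = 8) -> list[int]:
    """Return DW_AT_location offsets that do not land on a list boundary."""
    starts = location_list_starts(debug_loc, addr_size)
    return sorted(off for off in offsets if off not in starts)
```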