llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.78k stars 11.9k forks source link

[Symbolizer] Support for Missing Line Numbers. #82240

Closed ampandey-1995 closed 2 months ago

ampandey-1995 commented 8 months ago

LLVM Symbolizer attempt to symbolize addresses of optimized binaries reports missing line numbers for some cases. It maybe due to compiler which sometimes cannot map an instruction to line number due to optimizations. Symbolizer should handle those cases gracefully.

Adding an option '--skip-line-zero' to symbolizer so as to report the nearest non-zero line number.

llvmbot commented 8 months ago

@llvm/pr-subscribers-lld-wasm @llvm/pr-subscribers-lld-macho @llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-lld-coff @llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-lld

Author: None (ampandey-1995)

Changes LLVM Symbolizer attempt to symbolize addresses of optimized binaries reports missing line numbers for some cases. It maybe due to compiler which sometimes cannot map an instruction to line number due to optimizations. Symbolizer should handle those cases gracefully. Adding an option '-approximate-line-info=<before/after>' to symbolizer so as to report the nearest non-zero line number. --- Full diff: https://github.com/llvm/llvm-project/pull/82240.diff 12 Files Affected: - (modified) bolt/lib/Core/BinaryFunction.cpp (+2-1) - (modified) lld/Common/DWARF.cpp (+2-1) - (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+4) - (modified) llvm/include/llvm/DebugInfo/DIContext.h (+5-3) - (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h (+10-6) - (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+2) - (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+5-5) - (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+28-7) - (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+6-2) - (added) llvm/test/tools/llvm-symbolizer/approximate-line-info.s (+142) - (modified) llvm/tools/llvm-symbolizer/Opts.td (+1) - (modified) llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp (+8) ``````````diff diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp index 54f2f9d972a461..0b37eab302cdc9 100644 --- a/bolt/lib/Core/BinaryFunction.cpp +++ b/bolt/lib/Core/BinaryFunction.cpp @@ -192,7 +192,8 @@ static SMLoc findDebugLineInformationForInstructionAt( SMLoc NullResult = DebugLineTableRowRef::NULL_ROW.toSMLoc(); uint32_t RowIndex = LineTable->lookupAddress( - {Address, object::SectionedAddress::UndefSection}); + {Address, object::SectionedAddress::UndefSection}, + DILineInfoSpecifier::ApproximateLineKind::None); if (RowIndex == LineTable->UnknownRowIndex) return NullResult; diff --git a/lld/Common/DWARF.cpp b/lld/Common/DWARF.cpp index 2cd8ca4575dee5..8e1e9c6e530157 100644 --- a/lld/Common/DWARF.cpp +++ b/lld/Common/DWARF.cpp @@ -93,7 +93,8 @@ std::optional DWARFCache::getDILineInfo(uint64_t offset, DILineInfo info; for (const llvm::DWARFDebugLine::LineTable *lt : lineTables) { if (lt->getFileLineInfoForAddress( - {offset, sectionIndex}, nullptr, + {offset, sectionIndex}, + DILineInfoSpecifier::ApproximateLineKind::None, nullptr, DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath, info)) return info; } diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst index 59c0ab6d196ace..ee60d97babcbc7 100644 --- a/llvm/docs/CommandGuide/llvm-symbolizer.rst +++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst @@ -216,6 +216,10 @@ OPTIONS This can be used to perform lookups as if the object were relocated by the offset. +.. option:: --approximate-line-info= + + Search the object to find the approximate non-zero line numbers nearest to for a given address. + .. option:: --basenames, -s Print just the file's name without any directories, instead of the diff --git a/llvm/include/llvm/DebugInfo/DIContext.h b/llvm/include/llvm/DebugInfo/DIContext.h index 288ddf77bdfda7..d3a625b31d2060 100644 --- a/llvm/include/llvm/DebugInfo/DIContext.h +++ b/llvm/include/llvm/DebugInfo/DIContext.h @@ -152,14 +152,16 @@ struct DILineInfoSpecifier { RelativeFilePath, AbsoluteFilePath }; + enum ApproximateLineKind { None, Before, After }; using FunctionNameKind = DINameKind; - FileLineInfoKind FLIKind; FunctionNameKind FNKind; + ApproximateLineKind ALKind; DILineInfoSpecifier(FileLineInfoKind FLIKind = FileLineInfoKind::RawValue, - FunctionNameKind FNKind = FunctionNameKind::None) - : FLIKind(FLIKind), FNKind(FNKind) {} + FunctionNameKind FNKind = FunctionNameKind::None, + ApproximateLineKind ALKind = ApproximateLineKind::None) + : FLIKind(FLIKind), FNKind(FNKind), ALKind(ALKind) {} inline bool operator==(const DILineInfoSpecifier &RHS) const { return FLIKind == RHS.FLIKind && FNKind == RHS.FNKind; diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h index ce3bae6a1760c2..cb3531b75730f1 100644 --- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h +++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h @@ -240,7 +240,9 @@ class DWARFDebugLine { /// Returns the index of the row with file/line info for a given address, /// or UnknownRowIndex if there is no such row. - uint32_t lookupAddress(object::SectionedAddress Address) const; + uint32_t + lookupAddress(object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const; bool lookupAddressRange(object::SectionedAddress Address, uint64_t Size, std::vector &Result) const; @@ -266,10 +268,10 @@ class DWARFDebugLine { /// Fills the Result argument with the file and line information /// corresponding to Address. Returns true on success. - bool getFileLineInfoForAddress(object::SectionedAddress Address, - const char *CompDir, - DILineInfoSpecifier::FileLineInfoKind Kind, - DILineInfo &Result) const; + bool getFileLineInfoForAddress( + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir, + DILineInfoSpecifier::FileLineInfoKind Kind, DILineInfo &Result) const; /// Extracts directory name by its Entry in include directories table /// in prologue. Returns true on success. @@ -301,7 +303,9 @@ class DWARFDebugLine { getSourceByIndex(uint64_t FileIndex, DILineInfoSpecifier::FileLineInfoKind Kind) const; - uint32_t lookupAddressImpl(object::SectionedAddress Address) const; + uint32_t + lookupAddressImpl(object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const; bool lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size, std::vector &Result) const; diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h index 11a169cfc20a69..7b560f4b7dbb2f 100644 --- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h +++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h @@ -44,6 +44,7 @@ using namespace object; using FunctionNameKind = DILineInfoSpecifier::FunctionNameKind; using FileLineInfoKind = DILineInfoSpecifier::FileLineInfoKind; +using ApproximateLineKind = DILineInfoSpecifier::ApproximateLineKind; class CachedBinary; @@ -52,6 +53,7 @@ class LLVMSymbolizer { struct Options { FunctionNameKind PrintFunctions = FunctionNameKind::LinkageName; FileLineInfoKind PathStyle = FileLineInfoKind::AbsoluteFilePath; + ApproximateLineKind ApproximateLineNumbers = ApproximateLineKind::None; bool UseSymbolTable = true; bool Demangle = true; bool RelativeAddresses = false; diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp index b7297c18da7c99..9bf7dbd0acc109 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp @@ -1742,8 +1742,8 @@ DILineInfo DWARFContext::getLineInfoForAddress(object::SectionedAddress Address, if (Spec.FLIKind != FileLineInfoKind::None) { if (const DWARFLineTable *LineTable = getLineTableForUnit(CU)) { LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, CU->getCompilationDir(), - Spec.FLIKind, Result); + {Address.Address, Address.SectionIndex}, Spec.ALKind, + CU->getCompilationDir(), Spec.FLIKind, Result); } } @@ -1838,7 +1838,7 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address, DILineInfo Frame; LineTable = getLineTableForUnit(CU); if (LineTable && LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, + {Address.Address, Address.SectionIndex}, Spec.ALKind, CU->getCompilationDir(), Spec.FLIKind, Frame)) InliningInfo.addFrame(Frame); } @@ -1865,8 +1865,8 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address, // For the topmost routine, get file/line info from line table. if (LineTable) LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, CU->getCompilationDir(), - Spec.FLIKind, Frame); + {Address.Address, Address.SectionIndex}, Spec.ALKind, + CU->getCompilationDir(), Spec.FLIKind, Frame); } else { // Otherwise, use call file, call line and call column from // previous DIE in inlined chain. diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp index 28f05644a3aa11..c6baad8ee9b5ea 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp @@ -1297,10 +1297,11 @@ uint32_t DWARFDebugLine::LineTable::findRowInSeq( } uint32_t DWARFDebugLine::LineTable::lookupAddress( - object::SectionedAddress Address) const { + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const { // Search for relocatable addresses - uint32_t Result = lookupAddressImpl(Address); + uint32_t Result = lookupAddressImpl(Address, LineKind); if (Result != UnknownRowIndex || Address.SectionIndex == object::SectionedAddress::UndefSection) @@ -1308,11 +1309,12 @@ uint32_t DWARFDebugLine::LineTable::lookupAddress( // Search for absolute addresses Address.SectionIndex = object::SectionedAddress::UndefSection; - return lookupAddressImpl(Address); + return lookupAddressImpl(Address, LineKind); } uint32_t DWARFDebugLine::LineTable::lookupAddressImpl( - object::SectionedAddress Address) const { + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const { // First, find an instruction sequence containing the given address. DWARFDebugLine::Sequence Sequence; Sequence.SectionIndex = Address.SectionIndex; @@ -1321,7 +1323,24 @@ uint32_t DWARFDebugLine::LineTable::lookupAddressImpl( DWARFDebugLine::Sequence::orderByHighPC); if (It == Sequences.end() || It->SectionIndex != Address.SectionIndex) return UnknownRowIndex; - return findRowInSeq(*It, Address); + + uint32_t RowIndex = findRowInSeq(*It, Address); + if (LineKind == DILineInfoSpecifier::ApproximateLineKind::Before) { + for (auto SeqInst = Sequence.HighPC; SeqInst >= It->LowPC; --SeqInst) { + if (Rows[RowIndex].Line) + break; + Address.Address--; + RowIndex = findRowInSeq(*It, Address); + } + } else if (LineKind == DILineInfoSpecifier::ApproximateLineKind::After) { + for (auto SeqInst = Sequence.HighPC; SeqInst < It->HighPC; ++SeqInst) { + if (Rows[RowIndex].Line) + break; + Address.Address++; + RowIndex = findRowInSeq(*It, Address); + } + } + return RowIndex; } bool DWARFDebugLine::LineTable::lookupAddressRange( @@ -1461,10 +1480,12 @@ bool DWARFDebugLine::Prologue::getFileNameByIndex( } bool DWARFDebugLine::LineTable::getFileLineInfoForAddress( - object::SectionedAddress Address, const char *CompDir, + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir, FileLineInfoKind Kind, DILineInfo &Result) const { // Get the index of row we're looking for in the line table. - uint32_t RowIndex = lookupAddress(Address); + uint32_t RowIndex = lookupAddress(Address, LineKind); + if (RowIndex == -1U) return false; // Take file number and line/column from the row. diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp index 5f29226c14b705..18be137d91d694 100644 --- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp +++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp @@ -71,7 +71,9 @@ LLVMSymbolizer::symbolizeCodeCommon(const T &ModuleSpecifier, ModuleOffset.Address += Info->getModulePreferredBase(); DILineInfo LineInfo = Info->symbolizeCode( - ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions), + ModuleOffset, + DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions, + Opts.ApproximateLineNumbers), Opts.UseSymbolTable); if (Opts.Demangle) LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info); @@ -116,7 +118,9 @@ Expected LLVMSymbolizer::symbolizeInlinedCodeCommon( ModuleOffset.Address += Info->getModulePreferredBase(); DIInliningInfo InlinedContext = Info->symbolizeInlinedCode( - ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions), + ModuleOffset, + DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions, + Opts.ApproximateLineNumbers), Opts.UseSymbolTable); if (Opts.Demangle) { for (int i = 0, n = InlinedContext.getNumberOfFrames(); i < n; i++) { diff --git a/llvm/test/tools/llvm-symbolizer/approximate-line-info.s b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s new file mode 100644 index 00000000000000..b7d56b0e64534c --- /dev/null +++ b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s @@ -0,0 +1,142 @@ +# REQUIRES: x86-registered-target + +# RUN: llvm-mc -g -filetype=obj -triple=x86_64-pc-linux %s -o %t.o +# RUN: llvm-symbolizer --obj=%t.o 0xa | FileCheck --check-prefix=APPROX-NONE %s +# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=before 0xa | FileCheck --check-prefix=APPROX-BEFORE %s +# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=after 0xa | FileCheck --check-prefix=APPROX-AFTER %s + +# APPROX-NONE: main +# APPROX-NONE-NEXT: /home/ampandey/test-hip/main.c:0:6 +# APPROX-BEFORE: main +# APPROX-BEFORE-NEXT: /home/ampandey/test-hip/main.c:4:6 +# APPROX-AFTER: main +# APPROX-AFTER-NEXT: /home/ampandey/test-hip/main.c:8:2 + +## Generated from C Code +## +## int foo = 0; +## int x=89; +## int main() { +## if(x) +## return foo; +## else +## return x; +## } +## +## clang -S -O3 -gline-tables-only --target=x86_64-pc-linux + + .text + .file "main.c" + .globl main # -- Begin function main + .p2align 4, 0x90 + .type main,@function +main: # @main +.Lfunc_begin0: + .file 0 "/home/ampandey/test-hip" "main.c" md5 0x26c3fbaea8e6febaf09ef44d37ec5ecc + .cfi_startproc +# %bb.0: # %entry + .loc 0 4 6 prologue_end # main.c:4:6 + movl x(%rip), %eax + testl %eax, %eax + je .LBB0_2 +# %bb.1: # %entry + .loc 0 0 6 is_stmt 0 # main.c:0:6 + movl foo(%rip), %eax +.LBB0_2: # %entry + .loc 0 8 2 is_stmt 1 # main.c:8:2 + retq +.Ltmp0: +.Lfunc_end0: + .size main, .Lfunc_end0-main + .cfi_endproc + # -- End function + .type foo,@object # @foo + .bss + .globl foo + .p2align 2, 0x0 +foo: + .long 0 # 0x0 + .size foo, 4 + + .type x,@object # @x + .data + .globl x + .p2align 2, 0x0 +x: + .long 89 # 0x59 + .size x, 4 + + .section .debug_abbrev,"",@progbits + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 0 # DW_CHILDREN_no + .byte 37 # DW_AT_producer + .byte 37 # DW_FORM_strx1 + .byte 19 # DW_AT_language + .byte 5 # DW_FORM_data2 + .byte 3 # DW_AT_name + .byte 37 # DW_FORM_strx1 + .byte 114 # DW_AT_str_offsets_base + .byte 23 # DW_FORM_sec_offset + .byte 16 # DW_AT_stmt_list + .byte 23 # DW_FORM_sec_offset + .byte 27 # DW_AT_comp_dir + .byte 37 # DW_FORM_strx1 + .byte 17 # DW_AT_low_pc + .byte 27 # DW_FORM_addrx + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 115 # DW_AT_addr_base + .byte 23 # DW_FORM_sec_offset + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + .section .debug_info,"",@progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit +.Ldebug_info_start0: + .short 5 # DWARF version number + .byte 1 # DWARF Unit Type + .byte 8 # Address Size (in bytes) + .long .debug_abbrev # Offset Into Abbrev. Section + .byte 1 # Abbrev [1] 0xc:0x17 DW_TAG_compile_unit + .byte 0 # DW_AT_producer + .short 29 # DW_AT_language + .byte 1 # DW_AT_name + .long .Lstr_offsets_base0 # DW_AT_str_offsets_base + .long .Lline_table_start0 # DW_AT_stmt_list + .byte 2 # DW_AT_comp_dir + .byte 0 # DW_AT_low_pc + .long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc + .long .Laddr_table_base0 # DW_AT_addr_base +.Ldebug_info_end0: + .section .debug_str_offsets,"",@progbits + .long 16 # Length of String Offsets Set + .short 5 + .short 0 +.Lstr_offsets_base0: + .section .debug_str,"MS",@progbits,1 +.Linfo_string0: + .asciz "clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" # string offset=0 +.Linfo_string1: + .asciz "main.c" # string offset=113 +.Linfo_string2: + .asciz "/home/ampandey/test-hip" # string offset=120 + .section .debug_str_offsets,"",@progbits + .long .Linfo_string0 + .long .Linfo_string1 + .long .Linfo_string2 + .section .debug_addr,"",@progbits + .long .Ldebug_addr_end0-.Ldebug_addr_start0 # Length of contribution +.Ldebug_addr_start0: + .short 5 # DWARF version number + .byte 8 # Address size + .byte 0 # Segment selector size +.Laddr_table_base0: + .quad .Lfunc_begin0 +.Ldebug_addr_end0: + .ident "clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" + .section ".note.GNU-stack","",@progbits + .addrsig + .section .debug_line,"",@progbits +.Lline_table_start0: diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td index edc80bfe59673b..80ec4721c45e00 100644 --- a/llvm/tools/llvm-symbolizer/Opts.td +++ b/llvm/tools/llvm-symbolizer/Opts.td @@ -17,6 +17,7 @@ def grp_mach_o : OptionGroup<"kind">, HelpText<"llvm-symbolizer Mach-O Specific Options">; def addresses : F<"addresses", "Show address before line information">; +defm approximate_line_info : Eq<"approximate-line-info","Find approximate non-zero line number information nearest to given address.">,Values<"">; defm adjust_vma : Eq<"adjust-vma", "Add specified offset to object file addresses">, MetaVarName<"">; diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp index b98bdbc388faf2..530dbdfd5c8b5e 100644 --- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp +++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp @@ -482,6 +482,14 @@ int llvm_symbolizer_main(int argc, char **argv, const llvm::ToolContext &) { } else { Opts.PathStyle = DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath; } + StringRef ApproximateLineKindVal = + Args.getLastArgValue(OPT_approximate_line_info_EQ); + Opts.ApproximateLineNumbers = + ApproximateLineKindVal == "before" + ? DILineInfoSpecifier::ApproximateLineKind::Before + : ApproximateLineKindVal == "after" + ? DILineInfoSpecifier::ApproximateLineKind::After + : DILineInfoSpecifier::ApproximateLineKind::None; Opts.DebugFileDirectory = Args.getAllArgValues(OPT_debug_file_directory_EQ); Opts.DefaultArch = Args.getLastArgValue(OPT_default_arch_EQ).str(); Opts.Demangle = Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line); ``````````
llvmbot commented 8 months ago

@llvm/pr-subscribers-lld-elf

Author: None (ampandey-1995)

Changes LLVM Symbolizer attempt to symbolize addresses of optimized binaries reports missing line numbers for some cases. It maybe due to compiler which sometimes cannot map an instruction to line number due to optimizations. Symbolizer should handle those cases gracefully. Adding an option '-approximate-line-info=<before/after>' to symbolizer so as to report the nearest non-zero line number. --- Full diff: https://github.com/llvm/llvm-project/pull/82240.diff 12 Files Affected: - (modified) bolt/lib/Core/BinaryFunction.cpp (+2-1) - (modified) lld/Common/DWARF.cpp (+2-1) - (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+4) - (modified) llvm/include/llvm/DebugInfo/DIContext.h (+5-3) - (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h (+10-6) - (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+2) - (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+5-5) - (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+28-7) - (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+6-2) - (added) llvm/test/tools/llvm-symbolizer/approximate-line-info.s (+142) - (modified) llvm/tools/llvm-symbolizer/Opts.td (+1) - (modified) llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp (+8) ``````````diff diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp index 54f2f9d972a461..0b37eab302cdc9 100644 --- a/bolt/lib/Core/BinaryFunction.cpp +++ b/bolt/lib/Core/BinaryFunction.cpp @@ -192,7 +192,8 @@ static SMLoc findDebugLineInformationForInstructionAt( SMLoc NullResult = DebugLineTableRowRef::NULL_ROW.toSMLoc(); uint32_t RowIndex = LineTable->lookupAddress( - {Address, object::SectionedAddress::UndefSection}); + {Address, object::SectionedAddress::UndefSection}, + DILineInfoSpecifier::ApproximateLineKind::None); if (RowIndex == LineTable->UnknownRowIndex) return NullResult; diff --git a/lld/Common/DWARF.cpp b/lld/Common/DWARF.cpp index 2cd8ca4575dee5..8e1e9c6e530157 100644 --- a/lld/Common/DWARF.cpp +++ b/lld/Common/DWARF.cpp @@ -93,7 +93,8 @@ std::optional DWARFCache::getDILineInfo(uint64_t offset, DILineInfo info; for (const llvm::DWARFDebugLine::LineTable *lt : lineTables) { if (lt->getFileLineInfoForAddress( - {offset, sectionIndex}, nullptr, + {offset, sectionIndex}, + DILineInfoSpecifier::ApproximateLineKind::None, nullptr, DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath, info)) return info; } diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst index 59c0ab6d196ace..ee60d97babcbc7 100644 --- a/llvm/docs/CommandGuide/llvm-symbolizer.rst +++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst @@ -216,6 +216,10 @@ OPTIONS This can be used to perform lookups as if the object were relocated by the offset. +.. option:: --approximate-line-info= + + Search the object to find the approximate non-zero line numbers nearest to for a given address. + .. option:: --basenames, -s Print just the file's name without any directories, instead of the diff --git a/llvm/include/llvm/DebugInfo/DIContext.h b/llvm/include/llvm/DebugInfo/DIContext.h index 288ddf77bdfda7..d3a625b31d2060 100644 --- a/llvm/include/llvm/DebugInfo/DIContext.h +++ b/llvm/include/llvm/DebugInfo/DIContext.h @@ -152,14 +152,16 @@ struct DILineInfoSpecifier { RelativeFilePath, AbsoluteFilePath }; + enum ApproximateLineKind { None, Before, After }; using FunctionNameKind = DINameKind; - FileLineInfoKind FLIKind; FunctionNameKind FNKind; + ApproximateLineKind ALKind; DILineInfoSpecifier(FileLineInfoKind FLIKind = FileLineInfoKind::RawValue, - FunctionNameKind FNKind = FunctionNameKind::None) - : FLIKind(FLIKind), FNKind(FNKind) {} + FunctionNameKind FNKind = FunctionNameKind::None, + ApproximateLineKind ALKind = ApproximateLineKind::None) + : FLIKind(FLIKind), FNKind(FNKind), ALKind(ALKind) {} inline bool operator==(const DILineInfoSpecifier &RHS) const { return FLIKind == RHS.FLIKind && FNKind == RHS.FNKind; diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h index ce3bae6a1760c2..cb3531b75730f1 100644 --- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h +++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h @@ -240,7 +240,9 @@ class DWARFDebugLine { /// Returns the index of the row with file/line info for a given address, /// or UnknownRowIndex if there is no such row. - uint32_t lookupAddress(object::SectionedAddress Address) const; + uint32_t + lookupAddress(object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const; bool lookupAddressRange(object::SectionedAddress Address, uint64_t Size, std::vector &Result) const; @@ -266,10 +268,10 @@ class DWARFDebugLine { /// Fills the Result argument with the file and line information /// corresponding to Address. Returns true on success. - bool getFileLineInfoForAddress(object::SectionedAddress Address, - const char *CompDir, - DILineInfoSpecifier::FileLineInfoKind Kind, - DILineInfo &Result) const; + bool getFileLineInfoForAddress( + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir, + DILineInfoSpecifier::FileLineInfoKind Kind, DILineInfo &Result) const; /// Extracts directory name by its Entry in include directories table /// in prologue. Returns true on success. @@ -301,7 +303,9 @@ class DWARFDebugLine { getSourceByIndex(uint64_t FileIndex, DILineInfoSpecifier::FileLineInfoKind Kind) const; - uint32_t lookupAddressImpl(object::SectionedAddress Address) const; + uint32_t + lookupAddressImpl(object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const; bool lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size, std::vector &Result) const; diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h index 11a169cfc20a69..7b560f4b7dbb2f 100644 --- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h +++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h @@ -44,6 +44,7 @@ using namespace object; using FunctionNameKind = DILineInfoSpecifier::FunctionNameKind; using FileLineInfoKind = DILineInfoSpecifier::FileLineInfoKind; +using ApproximateLineKind = DILineInfoSpecifier::ApproximateLineKind; class CachedBinary; @@ -52,6 +53,7 @@ class LLVMSymbolizer { struct Options { FunctionNameKind PrintFunctions = FunctionNameKind::LinkageName; FileLineInfoKind PathStyle = FileLineInfoKind::AbsoluteFilePath; + ApproximateLineKind ApproximateLineNumbers = ApproximateLineKind::None; bool UseSymbolTable = true; bool Demangle = true; bool RelativeAddresses = false; diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp index b7297c18da7c99..9bf7dbd0acc109 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp @@ -1742,8 +1742,8 @@ DILineInfo DWARFContext::getLineInfoForAddress(object::SectionedAddress Address, if (Spec.FLIKind != FileLineInfoKind::None) { if (const DWARFLineTable *LineTable = getLineTableForUnit(CU)) { LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, CU->getCompilationDir(), - Spec.FLIKind, Result); + {Address.Address, Address.SectionIndex}, Spec.ALKind, + CU->getCompilationDir(), Spec.FLIKind, Result); } } @@ -1838,7 +1838,7 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address, DILineInfo Frame; LineTable = getLineTableForUnit(CU); if (LineTable && LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, + {Address.Address, Address.SectionIndex}, Spec.ALKind, CU->getCompilationDir(), Spec.FLIKind, Frame)) InliningInfo.addFrame(Frame); } @@ -1865,8 +1865,8 @@ DWARFContext::getInliningInfoForAddress(object::SectionedAddress Address, // For the topmost routine, get file/line info from line table. if (LineTable) LineTable->getFileLineInfoForAddress( - {Address.Address, Address.SectionIndex}, CU->getCompilationDir(), - Spec.FLIKind, Frame); + {Address.Address, Address.SectionIndex}, Spec.ALKind, + CU->getCompilationDir(), Spec.FLIKind, Frame); } else { // Otherwise, use call file, call line and call column from // previous DIE in inlined chain. diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp index 28f05644a3aa11..c6baad8ee9b5ea 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp @@ -1297,10 +1297,11 @@ uint32_t DWARFDebugLine::LineTable::findRowInSeq( } uint32_t DWARFDebugLine::LineTable::lookupAddress( - object::SectionedAddress Address) const { + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const { // Search for relocatable addresses - uint32_t Result = lookupAddressImpl(Address); + uint32_t Result = lookupAddressImpl(Address, LineKind); if (Result != UnknownRowIndex || Address.SectionIndex == object::SectionedAddress::UndefSection) @@ -1308,11 +1309,12 @@ uint32_t DWARFDebugLine::LineTable::lookupAddress( // Search for absolute addresses Address.SectionIndex = object::SectionedAddress::UndefSection; - return lookupAddressImpl(Address); + return lookupAddressImpl(Address, LineKind); } uint32_t DWARFDebugLine::LineTable::lookupAddressImpl( - object::SectionedAddress Address) const { + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind) const { // First, find an instruction sequence containing the given address. DWARFDebugLine::Sequence Sequence; Sequence.SectionIndex = Address.SectionIndex; @@ -1321,7 +1323,24 @@ uint32_t DWARFDebugLine::LineTable::lookupAddressImpl( DWARFDebugLine::Sequence::orderByHighPC); if (It == Sequences.end() || It->SectionIndex != Address.SectionIndex) return UnknownRowIndex; - return findRowInSeq(*It, Address); + + uint32_t RowIndex = findRowInSeq(*It, Address); + if (LineKind == DILineInfoSpecifier::ApproximateLineKind::Before) { + for (auto SeqInst = Sequence.HighPC; SeqInst >= It->LowPC; --SeqInst) { + if (Rows[RowIndex].Line) + break; + Address.Address--; + RowIndex = findRowInSeq(*It, Address); + } + } else if (LineKind == DILineInfoSpecifier::ApproximateLineKind::After) { + for (auto SeqInst = Sequence.HighPC; SeqInst < It->HighPC; ++SeqInst) { + if (Rows[RowIndex].Line) + break; + Address.Address++; + RowIndex = findRowInSeq(*It, Address); + } + } + return RowIndex; } bool DWARFDebugLine::LineTable::lookupAddressRange( @@ -1461,10 +1480,12 @@ bool DWARFDebugLine::Prologue::getFileNameByIndex( } bool DWARFDebugLine::LineTable::getFileLineInfoForAddress( - object::SectionedAddress Address, const char *CompDir, + object::SectionedAddress Address, + DILineInfoSpecifier::ApproximateLineKind LineKind, const char *CompDir, FileLineInfoKind Kind, DILineInfo &Result) const { // Get the index of row we're looking for in the line table. - uint32_t RowIndex = lookupAddress(Address); + uint32_t RowIndex = lookupAddress(Address, LineKind); + if (RowIndex == -1U) return false; // Take file number and line/column from the row. diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp index 5f29226c14b705..18be137d91d694 100644 --- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp +++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp @@ -71,7 +71,9 @@ LLVMSymbolizer::symbolizeCodeCommon(const T &ModuleSpecifier, ModuleOffset.Address += Info->getModulePreferredBase(); DILineInfo LineInfo = Info->symbolizeCode( - ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions), + ModuleOffset, + DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions, + Opts.ApproximateLineNumbers), Opts.UseSymbolTable); if (Opts.Demangle) LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info); @@ -116,7 +118,9 @@ Expected LLVMSymbolizer::symbolizeInlinedCodeCommon( ModuleOffset.Address += Info->getModulePreferredBase(); DIInliningInfo InlinedContext = Info->symbolizeInlinedCode( - ModuleOffset, DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions), + ModuleOffset, + DILineInfoSpecifier(Opts.PathStyle, Opts.PrintFunctions, + Opts.ApproximateLineNumbers), Opts.UseSymbolTable); if (Opts.Demangle) { for (int i = 0, n = InlinedContext.getNumberOfFrames(); i < n; i++) { diff --git a/llvm/test/tools/llvm-symbolizer/approximate-line-info.s b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s new file mode 100644 index 00000000000000..b7d56b0e64534c --- /dev/null +++ b/llvm/test/tools/llvm-symbolizer/approximate-line-info.s @@ -0,0 +1,142 @@ +# REQUIRES: x86-registered-target + +# RUN: llvm-mc -g -filetype=obj -triple=x86_64-pc-linux %s -o %t.o +# RUN: llvm-symbolizer --obj=%t.o 0xa | FileCheck --check-prefix=APPROX-NONE %s +# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=before 0xa | FileCheck --check-prefix=APPROX-BEFORE %s +# RUN: llvm-symbolizer --obj=%t.o --approximate-line-info=after 0xa | FileCheck --check-prefix=APPROX-AFTER %s + +# APPROX-NONE: main +# APPROX-NONE-NEXT: /home/ampandey/test-hip/main.c:0:6 +# APPROX-BEFORE: main +# APPROX-BEFORE-NEXT: /home/ampandey/test-hip/main.c:4:6 +# APPROX-AFTER: main +# APPROX-AFTER-NEXT: /home/ampandey/test-hip/main.c:8:2 + +## Generated from C Code +## +## int foo = 0; +## int x=89; +## int main() { +## if(x) +## return foo; +## else +## return x; +## } +## +## clang -S -O3 -gline-tables-only --target=x86_64-pc-linux + + .text + .file "main.c" + .globl main # -- Begin function main + .p2align 4, 0x90 + .type main,@function +main: # @main +.Lfunc_begin0: + .file 0 "/home/ampandey/test-hip" "main.c" md5 0x26c3fbaea8e6febaf09ef44d37ec5ecc + .cfi_startproc +# %bb.0: # %entry + .loc 0 4 6 prologue_end # main.c:4:6 + movl x(%rip), %eax + testl %eax, %eax + je .LBB0_2 +# %bb.1: # %entry + .loc 0 0 6 is_stmt 0 # main.c:0:6 + movl foo(%rip), %eax +.LBB0_2: # %entry + .loc 0 8 2 is_stmt 1 # main.c:8:2 + retq +.Ltmp0: +.Lfunc_end0: + .size main, .Lfunc_end0-main + .cfi_endproc + # -- End function + .type foo,@object # @foo + .bss + .globl foo + .p2align 2, 0x0 +foo: + .long 0 # 0x0 + .size foo, 4 + + .type x,@object # @x + .data + .globl x + .p2align 2, 0x0 +x: + .long 89 # 0x59 + .size x, 4 + + .section .debug_abbrev,"",@progbits + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 0 # DW_CHILDREN_no + .byte 37 # DW_AT_producer + .byte 37 # DW_FORM_strx1 + .byte 19 # DW_AT_language + .byte 5 # DW_FORM_data2 + .byte 3 # DW_AT_name + .byte 37 # DW_FORM_strx1 + .byte 114 # DW_AT_str_offsets_base + .byte 23 # DW_FORM_sec_offset + .byte 16 # DW_AT_stmt_list + .byte 23 # DW_FORM_sec_offset + .byte 27 # DW_AT_comp_dir + .byte 37 # DW_FORM_strx1 + .byte 17 # DW_AT_low_pc + .byte 27 # DW_FORM_addrx + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 115 # DW_AT_addr_base + .byte 23 # DW_FORM_sec_offset + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + .section .debug_info,"",@progbits +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit +.Ldebug_info_start0: + .short 5 # DWARF version number + .byte 1 # DWARF Unit Type + .byte 8 # Address Size (in bytes) + .long .debug_abbrev # Offset Into Abbrev. Section + .byte 1 # Abbrev [1] 0xc:0x17 DW_TAG_compile_unit + .byte 0 # DW_AT_producer + .short 29 # DW_AT_language + .byte 1 # DW_AT_name + .long .Lstr_offsets_base0 # DW_AT_str_offsets_base + .long .Lline_table_start0 # DW_AT_stmt_list + .byte 2 # DW_AT_comp_dir + .byte 0 # DW_AT_low_pc + .long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc + .long .Laddr_table_base0 # DW_AT_addr_base +.Ldebug_info_end0: + .section .debug_str_offsets,"",@progbits + .long 16 # Length of String Offsets Set + .short 5 + .short 0 +.Lstr_offsets_base0: + .section .debug_str,"MS",@progbits,1 +.Linfo_string0: + .asciz "clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" # string offset=0 +.Linfo_string1: + .asciz "main.c" # string offset=113 +.Linfo_string2: + .asciz "/home/ampandey/test-hip" # string offset=120 + .section .debug_str_offsets,"",@progbits + .long .Linfo_string0 + .long .Linfo_string1 + .long .Linfo_string2 + .section .debug_addr,"",@progbits + .long .Ldebug_addr_end0-.Ldebug_addr_start0 # Length of contribution +.Ldebug_addr_start0: + .short 5 # DWARF version number + .byte 8 # Address size + .byte 0 # Segment selector size +.Laddr_table_base0: + .quad .Lfunc_begin0 +.Ldebug_addr_end0: + .ident "clang version 19.0.0git (git@github.com:ampandey-1995/llvm-project.git 6751baed8d1ee8c5fd12fe5a06aa67275fc1ebf6)" + .section ".note.GNU-stack","",@progbits + .addrsig + .section .debug_line,"",@progbits +.Lline_table_start0: diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td index edc80bfe59673b..80ec4721c45e00 100644 --- a/llvm/tools/llvm-symbolizer/Opts.td +++ b/llvm/tools/llvm-symbolizer/Opts.td @@ -17,6 +17,7 @@ def grp_mach_o : OptionGroup<"kind">, HelpText<"llvm-symbolizer Mach-O Specific Options">; def addresses : F<"addresses", "Show address before line information">; +defm approximate_line_info : Eq<"approximate-line-info","Find approximate non-zero line number information nearest to given address.">,Values<"">; defm adjust_vma : Eq<"adjust-vma", "Add specified offset to object file addresses">, MetaVarName<"">; diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp index b98bdbc388faf2..530dbdfd5c8b5e 100644 --- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp +++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp @@ -482,6 +482,14 @@ int llvm_symbolizer_main(int argc, char **argv, const llvm::ToolContext &) { } else { Opts.PathStyle = DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath; } + StringRef ApproximateLineKindVal = + Args.getLastArgValue(OPT_approximate_line_info_EQ); + Opts.ApproximateLineNumbers = + ApproximateLineKindVal == "before" + ? DILineInfoSpecifier::ApproximateLineKind::Before + : ApproximateLineKindVal == "after" + ? DILineInfoSpecifier::ApproximateLineKind::After + : DILineInfoSpecifier::ApproximateLineKind::None; Opts.DebugFileDirectory = Args.getAllArgValues(OPT_debug_file_directory_EQ); Opts.DefaultArch = Args.getLastArgValue(OPT_default_arch_EQ).str(); Opts.Demangle = Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line); ``````````
jh7370 commented 8 months ago

Hi @ampandey-1995, how does this relate to #71032?

ampandey-1995 commented 8 months ago

Hi @ampandey-1995, how does this relate to #71032?

This patch is similar to the functionality achieved in 71032 and somewhat refined based on comments of @jh7370 & @dwblaikie .

jh7370 commented 8 months ago

Hi @ampandey-1995, how does this relate to #71032?

This patch is similar to the functionality achieved in 71032 and somewhat refined based on comments of @jh7370 & @dwblaikie .

Okay, is there a reason you haven't just updated that PR rather than create an entirely new one?

ampandey-1995 commented 8 months ago

Hi @ampandey-1995, how does this relate to #71032?

This patch is similar to the functionality achieved in 71032 and somewhat refined based on comments of @jh7370 & @dwblaikie .

Okay, is there a reason you haven't just updated that PR rather than create an entirely new one?

Apologies, I will close the old PR.

gbreynoo commented 8 months ago

Hi @ampandey-1995, I'm glad to see this come up again.

There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here: @jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See https://github.com/llvm/llvm-project/pull/71032#issuecomment-1801303252.

I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.

There is also the question of would it be useful to have both methods available to the user, as outlined by @jh7370 in https://github.com/llvm/llvm-project/pull/71032#issuecomment-1798014589. I'm torn as although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be good enough for an estimate and save making this too complicated.

I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.

ampandey-1995 commented 8 months ago

Hi @ampandey-1995, I'm glad to see this come up again.

Thanks @gbreynoo for reviewing the patch.

There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here: @jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See #71032 (comment).

Thanks again for pointing the comments of @jh7370. I think this patch is somewhat related to querying of line table entries(Second method) for extracting significant line information.

Previously , the patch 71032 was based on the approach of <incrementing/decrementing> address from the address having no line information but that approach dosen't fit well since querying llvm-symbolizer for every address used a lot of symbolizer API's calls invoking DWARF API's to extract line information. Also, if no debug information is present in the object then llvm-symbolizer will keep calling into DWARF API's as we don't get any non-zero line information which sometimes hangs the llvm-symbolizer tool itself.

The current patch tries to query the address(having no line information) by introspecting the line table having bounds [lowPC,highPC]. The search happens usually from [lowPC,SearchPC] if "before" is mentioned or from [SearchPC,highPC] if "after" is mentioned as value of option --approximate-line-info.

I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.

Ok I will update the command guide & help output.

There is also the question of would it be useful to have both methods available to the user, as outlined by @jh7370 in #71032 (comment). I'm torn as although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be good enough for an estimate and save making this too complicated.

I agree with you about the approach of querying line table as a good estimation method and also in terms of performance.

I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.

Yeah, Thanks will do that. Is it ok to attach a tag such as (approximate) similar to (inlined by) at the end of Line:Column information?

gbreynoo commented 8 months ago

I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.

Yeah, Thanks will do that. Is it ok to attach a tag such as (approximate) similar to (inlined by) at the end of Line:Column information?

That sounds good to me. With there being no equivalent functionality in addr2line to follow, I think you are right to follow the inline output behavior.

jh7370 commented 8 months ago

Before really looking at this PR, I'd like to hear from @dwblaikie, as I know he had some objections before.

ampandey-1995 commented 8 months ago

Before really looking at this PR, I'd like to hear from @dwblaikie, as I know he had some objections before.

Hi @dwblaikie any concerns regarding this patch?

ampandey-1995 commented 8 months ago

ping @dwblaikie, @jh7370, @gbreynoo.

dwblaikie commented 7 months ago

'tis on my list, sorry I haven't got to it yet.

github-actions[bot] commented 7 months ago

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

ampandey-1995 commented 7 months ago

Is it suitable to give the user the choice between forwards and backwards propagation? How would someone decide which is right for them?

From my perspective, I believe it is good to provide iterative forward/backwards search for approximate line information within function boundaries. Most users would rely on backwards estimate but it is also not wrong/doable to provide forward estimate.

That this can propagate past basic block, or even function boundaries seems pretty problematic - since the answer could then change depending on code layout, or could give someone a totally wrong function (ask me about what GCC did with LLVM's use of is_stmt many years ago - that was fun, breaking on a function would cause GCC to break on both the function you asked about, and the end of the previous function in the code... due to a bug like this one - it started at the instruction where the function started, then went searching to figure out which statement covered that instruction - which was back in the function that preceeded the one you were trying to break on... good times)

Thanks, the current revision considers the function boundaries also to be taken into account for the forward/backward line information estimation.

ampandey-1995 commented 7 months ago

ping

dwblaikie commented 7 months ago

Is it suitable to give the user the choice between forwards and backwards propagation? How would someone decide which is right for them?

From my perspective, I believe it is good to provide iterative forward/backwards search for approximate line information within function boundaries. Most users would rely on backwards estimate but it is also not wrong/doable to provide forward estimate.

I don't think this is something we should expose unless we've got a pretty good reason to motivate how a user would make this choice.

A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" it maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)

This does cause the propagation to cross basic block boundaries, though - LLVM does have a mechanism to put line 0s just at the start of basic blocks to ensure that back propagation doesn't produce arbitrary results based on basic block ordering...

That this can propagate past basic block, or even function boundaries seems pretty problematic - since the answer could then change depending on code layout, or could give someone a totally wrong function (ask me about what GCC did with LLVM's use of is_stmt many years ago - that was fun, breaking on a function would cause GCC to break on both the function you asked about, and the end of the previous function in the code... due to a bug like this one - it started at the instruction where the function started, then went searching to figure out which statement covered that instruction - which was back in the function that preceeded the one you were trying to break on... good times)

Thanks, the current revision considers the function boundaries also to be taken into account for the forward/backward line information estimation.

Right - but basic block boundaries are a problem too. It's unfortunate/unpredictable/pretty problematic to cross a basic block boundary to retrieve a line table - because the answer's arbitrary - based on the compiler's basic block ordering.

jh7370 commented 7 months ago

A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" it maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)

From llvm-symbolizer's point of view, it could make sense to skip line 0 entries, since those entries don't provide any symbolization information. However, in more general terms, we can't change the line table parser to ignore them: consider the simple case where somebody wants to dump the raw line table, for example.

ampandey-1995 commented 6 months ago

ping

dwblaikie commented 6 months ago

A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" it maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)

From llvm-symbolizer's point of view, it could make sense to skip line 0 entries, since those entries don't provide any symbolization information.

I disagree here - I thin kit does provide value to not claim a given instruction comes from a line we don't know it came from, by default at least/without some user opt-in and/or without telling the user that's what we're doing in some particular output.

However, in more general terms, we can't change the line table parser to ignore them: consider the simple case where somebody wants to dump the raw line table, for example.

nod I didn't mean to change it always, but to change it opt-in via a flag (like the one being discussed in this patch).

And I think that'd provide the simplest implementation, and easier to explain - though I still think it's pretty error prone/likely to produce confusion for users (which is why I have misgivings about implementing any support for this, to be honest).

jh7370 commented 6 months ago

A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" it maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)

From llvm-symbolizer's point of view, it could make sense to skip line 0 entries, since those entries don't provide any symbolization information.

I disagree here - I thin kit does provide value to not claim a given instruction comes from a line we don't know it came from, by default at least/without some user opt-in and/or without telling the user that's what we're doing in some particular output.

However, in more general terms, we can't change the line table parser to ignore them: consider the simple case where somebody wants to dump the raw line table, for example.

nod I didn't mean to change it always, but to change it opt-in via a flag (like the one being discussed in this patch).

And I think that'd provide the simplest implementation, and easier to explain - though I still think it's pretty error prone/likely to produce confusion for users (which is why I have misgivings about implementing any support for this, to be honest).

Yeah, I see your point, and that does sound a bit more natural. I've also reached out to our internal binutils team to see if they have any particular preferences as to any approach taken.

bd1976bris commented 6 months ago

Hi All!

From Sony's POV the primary issue that we would like to solve is reporting effective line numbers when symbolizing stack traces. Currently, we have customers manually decrementing addresses when 0 is reported as the line number to try to get a useful line number reported. Customers report that they are generally happy with understanding their stack traces when they do this manually. Therefore, we would like to see a feature to improve on the customer doing this manually or scripting it up themselves.

This PR meets our criteria and we would be happy to see it (or an equivalent implementation) merged:

  1. Utility - this PR reports satisfactory line numbers for problem stack walks we have had reported.

  2. Potential for confusion - this PR is opt-in (via '-approximate-line-info=<before/after>') and clearly labels the line numbers produced as guesses by appending the string "(approximate)" to the output, this seems adequate to address the problem of potentially confusing users.

  3. Provide an advantage over a simple script - this PR tries to limit the address range tried using the high/low PC and the presence of basic_block/prologue_end instructions. This is better than just trying decremented addresses until a non-zero line is found. This would also be reasonably difficult to implement in a script which parses the output from the binutils we provide, so including the functionality in symbolizer makes sense.

As to the best implementation I don't have a firm opinion yet. I will try to poke at this and provide some feedback. One open question I would like to understand is: How misleading/confusing could the approximate line numbers be? An answer to that would be useful in evaluating different implementations. In Sony's case we are really interested in the answer to the previous question when we restrict it to the case of symbolizing stack traces. I also wonder if function inlining can pose/exacerbate any correctness issues?

dwblaikie commented 6 months ago

OK then - I guess if we're doing this, I think it should be previous only ("as if line zero hadn't been emitted at all") - and the implementation should be tidied up/simplified a bit.

We shouldn't keep probing by the previous single address - the row returned by findRowInSeq should be decremented - looking at the previous row, until we either find a non-zero row, or reach Seq.FirstRowIndex - in the latter case, we should produce zero anyway, because there's no other address to produce reasonably (walking past the beginning of a sequence would give an arbitrary location in another .text section)?

dwblaikie commented 6 months ago

Does that work well enough for everyone?

pogo59 commented 6 months ago

What @dwblaikie said.

bd1976bris commented 6 months ago

Does that work well enough for everyone?

@dwblaikie - Yes. Thanks for taking our use-case into consideration!

bd1976bris commented 5 months ago

I had a look at how this interacts with other options and I everything looks reasonable as far as I can see. For --print-source-context-lines I tried printing a line zero address with --skip-line-zero and the output looks good:

>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --skip-line-zero --print-source-context-lines=3
f3
main.c:20:0 (approximate)
19  : {
20 >:   gvar3++;
21  :   if (gvar3 > 1 && b < a)

I noticed that for the --print-source-context-lines for line zero addresses without --skip-line-zero you currently get very confusing output where the lines around line zero are displayed:

>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --print-source-context-lines=3
f3
main.c:0:0
1  : static int gvar1 = 0;
2  : static int gvar2 = 0;
3  : static int gvar3 = 0;

I think that ideally it would be better not to display the context lines for line zero addresses. It might also be nice to emit a warning recommending trying a different address or using --skip-line-zero.

ampandey-1995 commented 5 months ago

I noticed that for the --print-source-context-lines for line zero addresses without --skip-line-zero you currently get very confusing output where the lines around line zero are displayed:

I think that is the default behaviour --print-source-context-lines will display the source lines around the queried address even when that address has no line number.

>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --print-source-context-lines=3
f3
main.c:0:0
1  : static int gvar1 = 0;
2  : static int gvar2 = 0;
3  : static int gvar3 = 0;

I think that ideally it would be better not to display the context lines for line zero addresses. It might also be nice to emit a warning recommending trying a different address or using --skip-line-zero.

This would require making a change in the default functionality of --print-source-context-lines. Ideally, with/without --skip-line-zero --print-source-context-lines should output source context lines by default for any queried address(even it has line zero.).

bd1976bris commented 5 months ago

I noticed that for the --print-source-context-lines for line zero addresses without --skip-line-zero you currently get very confusing output where the lines around line zero are displayed:

I think that is the default behaviour --print-source-context-lines will display the source lines around the queried address even when that address has no line number.

>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --print-source-context-lines=3
f3
main.c:0:0
1  : static int gvar1 = 0;
2  : static int gvar2 = 0;
3  : static int gvar3 = 0;

I think that ideally it would be better not to display the context lines for line zero addresses. It might also be nice to emit a warning recommending trying a different address or using --skip-line-zero.

This would require making a change in the default functionality of --print-source-context-lines. Ideally, with/without --skip-line-zero --print-source-context-lines should output source context lines by default for any queried address(even it has line zero.).

Right I think that this is an enhancement that is separate to this PR. I have filed: https://github.com/llvm/llvm-project/issues/92403.

pogo59 commented 5 months ago

My reading of DWARF suggests that this is legal as there's nothing to disallow two contiguous sequences AFAICS.

The DWARF standard doesn't say anything about how two sequences relate to each other. While the intent of a sequence is to cover all contiguous instructions, nothing in the spec mandates this.

I suspect that llvm-mc is unlikely to generate multiple sequences for the same section, and for different sections it will of course use zero as the starting offset. An explicit line table in the assembly source, such as @bd1976bris proposed, will let you have multiple sequences in the same section, and therefore a non-zero starting point, in a single relocatable object. This means the linker is not required.

ampandey-1995 commented 5 months ago

I suspect that llvm-mc is unlikely to generate multiple sequences for the same section, and for different sections it will of course use zero as the starting offset. An explicit line table in the assembly source, such as @bd1976bris proposed, will let you have multiple sequences in the same section, and therefore a non-zero starting point, in a single relocatable object. This means the linker is not required.

Yes , the current revision now dosen't invoke the linker.

ampandey-AMD commented 5 months ago

ping

ampandey-1995 commented 4 months ago

ping

bd1976bris commented 4 months ago

@ampandey-1995 this might well be my failing but I'm afraid that I'm struggling to understand the tests. Before going on though I would like to say that I appreciate the effort you have put into them :) However, I don't understand why we would want to invoke the compiler and the linker (which is a complicated process to understand and results in input full of unnecessary features e.g. lots of crt symbols etc) rather than just hand coding the input? Furthermore, whilst I appreciate your use of the yaml2obj tool was suggested here, I don't think that the yaml input is very readable as it has large blocks of opaque hex. I also don't understand why we need two tests - can't we use one test for all the cases we need? I think that the names of the tests need to clearly describe to what they are testing (not how they are implemented e.g. "handcrafted") and the tests need comments describing what the test features and cases are.

I will try to post up an improved test if I can.

bd1976bris commented 4 months ago

I will try to post up an improved test if I can.

@ampandey-1995 below is my attempt. I think that it contains the testcases from both the of current tests and IMO it is more readable/understandable.

$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address            Line   Column File   ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000     10      0      1   0             0
## 0x0000000000000001      0      0      1   0             0
## 0x0000000000000002      0      0      1   0             0
## 0x0000000000000006      8      0      1   0             0  end_sequence
## 0x0000000000000006      0      0      1   0             0
## 0x0000000000000007      0      0      1   0             0
## 0x0000000000000016      5      0      1   0             0  end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.

# REQUIRES: x86-registered-target

# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o

## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-DISABLED %s
# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0

## Check that with --skip-line-zero the last line non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN:   FileCheck --check-prefix=SKIP-ENABLED %s
# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:10:0 (approximate)

## Check that that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN:   FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s
# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:10:0

## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-VERBOSE %s
# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT:  Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT:  Function start address: 0x0
# SKIP-VERBOSE-NEXT:  Line: 10
# SKIP-VERBOSE-NEXT:  Column: 0
# SKIP-VERBOSE-NEXT:  Approximate: true

## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-JSON %s
# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":10,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]

## Check that that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN:   FileCheck --check-prefixes=SKIP-BOUNDARY %s
# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0

    .globl  foo
foo:
    .space 6
.Lfoo_end:

    .globl  bar
bar:
    .space 16
.Lbar_end:

    .section    .debug_abbrev,"",@progbits
    .byte   1                               # Abbreviation Code
    .byte   17                              # DW_TAG_compile_unit
    .byte   0                               # DW_CHILDREN_no
    .byte   16                              # DW_AT_stmt_list
    .byte   23                              # DW_FORM_sec_offset
    .byte   17                              # DW_AT_low_pc
    .byte   1                               # DW_FORM_addr
    .byte   18                              # DW_AT_high_pc
    .byte   6                               # DW_FORM_data4
    .byte   0                               # EOM(1)
    .byte   0                               # EOM(2)
    .byte   0                               # EOM(3)
    .section    .debug_info,"",@progbits
.Lcu_begin0:
    .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
    .short  4                               # DWARF version number
    .long   .debug_abbrev                   # Offset Into Abbrev. Section
    .byte   8                               # Address Size (in bytes)
    .byte   1                               # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
    .long   .Lline_table_start0             # DW_AT_stmt_list
    .quad   0                               # DW_AT_low_pc
        .long   .Lbar_end-foo                   # DW_AT_high_pc
.Ldebug_info_end0:
    .section    .debug_line,"",@progbits
.Lline_table_start0:
        .long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
        .short 4   # version
        .long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
    .byte 1                                # minimum_instruction_length
    .byte 1                                # maximum_operations_per_instruction
    .byte 0                                # default_is_stmt
    .byte -5                               # line_base
    .byte 14                               # line_range
    .byte 13                               # opcode_base
    .byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
    .asciz "dir0"                          # include directory
    .byte 0                                # end of include directories
    .asciz "two.c"                         # filename
    .byte 0                                # reference to dir0
    .byte 0                                # modification time
    .byte 0                                # length of file (unavailable)
    .byte 0                                # end of filenames
.Lprologue_end:
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad 0x0            #  foo (to 0)
        .byte 3              # DW_LNS_advance_line
        .sleb128 9           #  by 9 (to 10)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 -10         #  by -10 (to 0)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 1)
        .byte 1              # DW_LNS_copy
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 2)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 8           #  by 8 (to 8)
        .byte 2              # DW_LNS_advance_pc
        .byte 4              #  += (4 * min instruction length) (to 6)
        .byte 0, 1, 1        # DW_LNE_end_sequence
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad bar            #  bar (to 6)
        .byte 3              # DW_LNS_advance_line
        .sleb128 -1          #  by -1 (to 0)
        .byte 1              # DW_LNS_copy
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 7)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 5           #  by 5 (to 5)
        .byte 2              # DW_LNS_advance_pc
        .byte 15             #  += (15 * min instruction length) (to 22)
        .byte 0, 1, 1        # DW_LNE_end_sequence
.Lunit_end:
ampandey-1995 commented 4 months ago

I will try to post up an improved test if I can.

@ampandey-1995 below is my attempt. I think that it contains the testcases from both the of current tests and IMO it is more readable/understandable.

$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address            Line   Column File   ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000     10      0      1   0             0
## 0x0000000000000001      9      0      1   0             0
## 0x0000000000000002      0      0      1   0             0
## 0x0000000000000003      8      0      1   0             0  end_sequence
## 0x0000000000000006      0      0      1   0             0
## 0x0000000000000007      0      0      1   0             0
## 0x0000000000000016      5      0      1   0             0  end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.

# REQUIRES: x86-registered-target

# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o

## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-DISABLED %s
# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0

## Check that with --skip-line-zero the last line non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN:   FileCheck --check-prefix=SKIP-ENABLED %s
# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:9:0 (approximate)

## Check that that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN:   FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s
# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:9:0

## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-VERBOSE %s
# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT:  Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT:  Function start address: 0x0
# SKIP-VERBOSE-NEXT:  Line: 9
# SKIP-VERBOSE-NEXT:  Column: 0
# SKIP-VERBOSE-NEXT:  Approximate: true

## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-JSON %s
# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":9,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]

## Check that that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN:   FileCheck --check-prefixes=SKIP-BOUNDARY %s
# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0

  .section    .text.foo,"ax",@progbits
  .globl  foo
foo:
.Lfunc_begin0:
  movl    $10, %eax
  retq
.Lfunc_end0:
  .size   foo, .Lfunc_end0-foo

  .globl  bar
bar:
.Lfunc_begin1:
# %bb.0:
  pushq   %rbp
  movq    %rsp, %rbp
  callq   foo
  movl    $20, %eax
  popq    %rbp
  retq
.Lfunc_end1:
  .size   bar, .Lfunc_end1-bar

  .section    .debug_abbrev,"",@progbits
  .byte   1                               # Abbreviation Code
  .byte   17                              # DW_TAG_compile_unit
  .byte   0                               # DW_CHILDREN_no
  .byte   37                              # DW_AT_producer
  .byte   14                              # DW_FORM_strp
  .byte   19                              # DW_AT_language
  .byte   5                               # DW_FORM_data2
  .byte   3                               # DW_AT_name
  .byte   14                              # DW_FORM_strp
  .byte   16                              # DW_AT_stmt_list
  .byte   23                              # DW_FORM_sec_offset
  .byte   27                              # DW_AT_comp_dir
  .byte   14                              # DW_FORM_strp
  .byte   83                              # DW_AT_use_UTF8
  .byte   25                              # DW_FORM_flag_present
  .byte   17                              # DW_AT_low_pc
  .byte   1                               # DW_FORM_addr
  .byte   85                              # DW_AT_ranges
  .byte   23                              # DW_FORM_sec_offset
  .byte   0                               # EOM(1)
  .byte   0                               # EOM(2)
  .byte   0                               # EOM(3)
  .section    .debug_info,"",@progbits
.Lcu_begin0:
  .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
  .short  4                               # DWARF version number
  .long   .debug_abbrev                   # Offset Into Abbrev. Section
  .byte   8                               # Address Size (in bytes)
  .byte   1                               # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
  .long   .Linfo_string0                  # DW_AT_producer
  .short  29                              # DW_AT_language
  .long   .Linfo_string1                  # DW_AT_name
  .long   .Lline_table_start0             # DW_AT_stmt_list
  .long   .Linfo_string2                  # DW_AT_comp_dir
                                        # DW_AT_use_UTF8
  .quad   0                               # DW_AT_low_pc
  .long   .Ldebug_ranges0                 # DW_AT_ranges
.Ldebug_info_end0:
  .section    .debug_aranges,"",@progbits
  .section    .debug_ranges,"",@progbits
.Ldebug_ranges0:
  .quad   .Lfunc_begin0
  .quad   .Lfunc_end0
  .quad   .Lfunc_begin1
  .quad   .Lfunc_end1
  .quad   0
  .quad   0
  .section    .debug_str,"MS",@progbits,1
.Linfo_string0:
  .asciz  "clang version 16.0.5 ---------------------------------------" # string offset=0
.Linfo_string1:
  .asciz  "two.c"                         # string offset=61
.Linfo_string2:
  .asciz  "c:\\Temp\\dwarfline"           # string offset=67
  .ident  "clang version 16.0.5 ---------------------------------------"
  .section    ".note.GNU-stack","",@progbits
.function_and_data_sections
  .section    .debug_line,"",@progbits
.Lline_table_start0:
        .long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
        .short 4   # version
        .long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
  .byte 1                                # minimum_instruction_length
  .byte 1                                # maximum_operations_per_instruction
  .byte 0                                # default_is_stmt
  .byte -5                               # line_base
  .byte 14                               # line_range
  .byte 13                               # opcode_base
  .byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
  .asciz "dir0"                          # include directory
  .asciz "dir1"                          # include directory
  .byte 0                                # end of include directories
  .asciz "two.c"                         # filename
  .byte 0                                # reference to dir0
  .byte 0                                # modification time
  .byte 0                                # length of file (unavailable)
  .byte 0                                # end of filenames
.Lprologue_end:
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad 0x0            #  foo (to 0)
        .byte 3              # DW_LNS_advance_line
        .sleb128 9           #  by 9 (to 10)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 -1          #  by -1 (to 9)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 1)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 -9          #  by -9 (to 0)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 2)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 8           #  by 8 (to 8)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 3)
        .byte 0, 1, 1        # DW_LNE_end_sequence
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad .Lfunc_begin1 - .Lfunc_begin0 #  bar (to 6)
        .byte 3              # DW_LNS_advance_line
        .sleb128 -1          #  by -1 (to 0)
        .byte 1              # DW_LNS_copy
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 7)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 5           #  by 5 (to 5)
        .byte 2              # DW_LNS_advance_pc
        .byte 15             #  += (15 * min instruction length) (to 22)
        .byte 0, 1, 1        # DW_LNE_end_sequence
.Lunit_end:

This is failing at my side

I will try to post up an improved test if I can.

@ampandey-1995 below is my attempt. I think that it contains the testcases from both the of current tests and IMO it is more readable/understandable.

$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address            Line   Column File   ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000     10      0      1   0             0
## 0x0000000000000001      9      0      1   0             0
## 0x0000000000000002      0      0      1   0             0
## 0x0000000000000003      8      0      1   0             0  end_sequence
## 0x0000000000000006      0      0      1   0             0
## 0x0000000000000007      0      0      1   0             0
## 0x0000000000000016      5      0      1   0             0  end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.

# REQUIRES: x86-registered-target

# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o

## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-DISABLED %s
# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0

## Check that with --skip-line-zero the last line non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN:   FileCheck --check-prefix=SKIP-ENABLED %s
# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:9:0 (approximate)

## Check that that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN:   FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s
# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:9:0

## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-VERBOSE %s
# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT:  Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT:  Function start address: 0x0
# SKIP-VERBOSE-NEXT:  Line: 9
# SKIP-VERBOSE-NEXT:  Column: 0
# SKIP-VERBOSE-NEXT:  Approximate: true

## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-JSON %s
# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":9,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]

## Check that that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN:   FileCheck --check-prefixes=SKIP-BOUNDARY %s
# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0

  .section    .text.foo,"ax",@progbits
  .globl  foo
foo:
.Lfunc_begin0:
  movl    $10, %eax
  retq
.Lfunc_end0:
  .size   foo, .Lfunc_end0-foo

  .globl  bar
bar:
.Lfunc_begin1:
# %bb.0:
  pushq   %rbp
  movq    %rsp, %rbp
  callq   foo
  movl    $20, %eax
  popq    %rbp
  retq
.Lfunc_end1:
  .size   bar, .Lfunc_end1-bar

  .section    .debug_abbrev,"",@progbits
  .byte   1                               # Abbreviation Code
  .byte   17                              # DW_TAG_compile_unit
  .byte   0                               # DW_CHILDREN_no
  .byte   37                              # DW_AT_producer
  .byte   14                              # DW_FORM_strp
  .byte   19                              # DW_AT_language
  .byte   5                               # DW_FORM_data2
  .byte   3                               # DW_AT_name
  .byte   14                              # DW_FORM_strp
  .byte   16                              # DW_AT_stmt_list
  .byte   23                              # DW_FORM_sec_offset
  .byte   27                              # DW_AT_comp_dir
  .byte   14                              # DW_FORM_strp
  .byte   83                              # DW_AT_use_UTF8
  .byte   25                              # DW_FORM_flag_present
  .byte   17                              # DW_AT_low_pc
  .byte   1                               # DW_FORM_addr
  .byte   85                              # DW_AT_ranges
  .byte   23                              # DW_FORM_sec_offset
  .byte   0                               # EOM(1)
  .byte   0                               # EOM(2)
  .byte   0                               # EOM(3)
  .section    .debug_info,"",@progbits
.Lcu_begin0:
  .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
  .short  4                               # DWARF version number
  .long   .debug_abbrev                   # Offset Into Abbrev. Section
  .byte   8                               # Address Size (in bytes)
  .byte   1                               # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
  .long   .Linfo_string0                  # DW_AT_producer
  .short  29                              # DW_AT_language
  .long   .Linfo_string1                  # DW_AT_name
  .long   .Lline_table_start0             # DW_AT_stmt_list
  .long   .Linfo_string2                  # DW_AT_comp_dir
                                        # DW_AT_use_UTF8
  .quad   0                               # DW_AT_low_pc
  .long   .Ldebug_ranges0                 # DW_AT_ranges
.Ldebug_info_end0:
  .section    .debug_aranges,"",@progbits
  .section    .debug_ranges,"",@progbits
.Ldebug_ranges0:
  .quad   .Lfunc_begin0
  .quad   .Lfunc_end0
  .quad   .Lfunc_begin1
  .quad   .Lfunc_end1
  .quad   0
  .quad   0
  .section    .debug_str,"MS",@progbits,1
.Linfo_string0:
  .asciz  "clang version 16.0.5 ---------------------------------------" # string offset=0
.Linfo_string1:
  .asciz  "two.c"                         # string offset=61
.Linfo_string2:
  .asciz  "c:\\Temp\\dwarfline"           # string offset=67
  .ident  "clang version 16.0.5 ---------------------------------------"
  .section    ".note.GNU-stack","",@progbits
.function_and_data_sections
  .section    .debug_line,"",@progbits
.Lline_table_start0:
        .long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
        .short 4   # version
        .long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
  .byte 1                                # minimum_instruction_length
  .byte 1                                # maximum_operations_per_instruction
  .byte 0                                # default_is_stmt
  .byte -5                               # line_base
  .byte 14                               # line_range
  .byte 13                               # opcode_base
  .byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
  .asciz "dir0"                          # include directory
  .asciz "dir1"                          # include directory
  .byte 0                                # end of include directories
  .asciz "two.c"                         # filename
  .byte 0                                # reference to dir0
  .byte 0                                # modification time
  .byte 0                                # length of file (unavailable)
  .byte 0                                # end of filenames
.Lprologue_end:
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad 0x0            #  foo (to 0)
        .byte 3              # DW_LNS_advance_line
        .sleb128 9           #  by 9 (to 10)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 -1          #  by -1 (to 9)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 1)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 -9          #  by -9 (to 0)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 2)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 8           #  by 8 (to 8)
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 3)
        .byte 0, 1, 1        # DW_LNE_end_sequence
        .byte 0, 9, 2        # DW_LNE_set_address
        .quad .Lfunc_begin1 - .Lfunc_begin0 #  bar (to 6)
        .byte 3              # DW_LNS_advance_line
        .sleb128 -1          #  by -1 (to 0)
        .byte 1              # DW_LNS_copy
        .byte 2              # DW_LNS_advance_pc
        .byte 1              #  += (1 * min instruction length) (to 7)
        .byte 1              # DW_LNS_copy
        .byte 3              # DW_LNS_advance_line
        .sleb128 5           #  by 5 (to 5)
        .byte 2              # DW_LNS_advance_pc
        .byte 15             #  += (15 * min instruction length) (to 22)
        .byte 0, 1, 1        # DW_LNE_end_sequence
.Lunit_end:

Copied your test-case on my machine. It fails.

~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
skip-line-zero.s:139:1: error: unknown directive
.function_and_data_sections
^
ampandey-1995 commented 4 months ago

@ampandey-1995 this might well be my failing but I'm afraid that I'm struggling to understand the tests. Before going on though I would like to say that I appreciate the effort you have put into them :)

Thanks for the input.

However, I don't understand why we would want to invoke the compiler and the linker (which is a complicated process to understand and results in input full of unnecessary features e.g. lots of crt symbols etc) rather than just hand coding the input?

Linker is not getting invoked in approximate-line-handcrafted.yaml. If you see @MaskRay comment https://github.com/llvm/llvm-project/pull/82240#discussion_r1602749928, Linker is not getting invoked in the lit test. However, it is required to generate or reproduce the yaml file only a once. Linker is not getting invoked in any RUN lines.

I also don't understand why we need two tests - can't we use one test for all the cases we need?

It's not mandatory, but from my perspective two separate tests are fine. approximate-line-generated.s(Single Sequence .text) is the generated assembly where no manual modification is happening. The test is basically to reflect on the understanding/showcase that clang produces line-zero for addresses. approximate-line-handcrafted.yaml(Two Sequences .text && .def_section) is the generated yaml from manually modified assembly where individual .loc directives of debug info are modified to zero so as to stress test the sequence boundaries and also to show approximated output for address which are in different functions(in different files). Check @dwblaikie comment https://github.com/llvm/llvm-project/pull/82240#discussion_r1595660179.

I think that the names of the tests need to clearly describe to what they are testing (not how they are implemented e.g. "handcrafted") and the tests need comments describing what the test features and cases are.

Ok, will change the name the name of the tests. Can you suggest me appropriate name? Ok, I will write more descriptive comments for each lit test case run.

jh7370 commented 4 months ago

I think part of the problem with the test complexity is because you are trying to use a fully linked object file as the input to the test. Converting it into YAML doesn't really do much other than change its representation from an ELF object into a YAML description of that ELF object. Usually, when we talk about using yaml2obj to generate test inputs, we are using a dramatically pared down version of an object, something that no linker would ever produce. For example, most of the sections in your YAML file are probably unnecessary for the test, so you should remove them. Remember the thing you're testing doesn't need to be some completely runnable program, it just needs to have sufficient symbols/debug info etc to exercise the behaviour in llvm-symbolizer that you want to test.

Additionally, obj2yaml doesn't really understand how to generate proper DWARF section descriptions in YAML, so falls back to using hex descriptions, which are, as @bd1976bris has pointed out, opaque and unreadable. yaml2obj DOES understand special descriptions that allow you to describe by hand the line table, for example (see the yaml2obj tests I pointed out before at llvm/test/tools/yaml2obj/ELF/DWARF), but you'll need to write these yourself, taking inspiration from an existing object for how the line table might be structured, rather than just trying to use obj2yaml to make them.

I'll leave it to the main reviewers to guide you in more detail, but if all you really care about testing is a line table with linked addresses in it (some of which are 0), you could start out by building a line table that you want in YAML (or asm) then add the necessary other bare minimum scaffolding to make it work. IIRC, you don't actually even need the .text to contain the addresses you have listed, as long as you have appropriate debug data and symbols, so your addresses could be fairly arbitrary.

bd1976bris commented 4 months ago
~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
skip-line-zero.s:139:1: error: unknown directive
.function_and_data_sections
^

Apologies @ampandey-1995 I was in a hurry and left in a directive that we only support in our downstream toolchain. If you remove that line the test should pass. There's probably more in the assembly that can be stripped out as well. I have edited my original post so that the test should pass now. I didn't want to reply and cause another copy of the massive test text to be posted here. I didn't think about replies including the text - I should have used a gist or something!

ampandey-1995 commented 4 months ago

I think part of the problem with the test complexity is because you are trying to use a fully linked object file as the input to the test. Converting it into YAML doesn't really do much other than change its representation from an ELF object into a YAML description of that ELF object. Usually, when we talk about using yaml2obj to generate test inputs, we are using a dramatically pared down version of an object, something that no linker would ever produce. For example, most of the sections in your YAML file are probably unnecessary for the test, so you should remove them. Remember the thing you're testing doesn't need to be some completely runnable program, it just needs to have sufficient symbols/debug info etc to exercise the behaviour in llvm-symbolizer that you want to test. Additionally, obj2yaml doesn't really understand how to generate proper DWARF section descriptions in YAML, so falls back to using hex descriptions, which are, as @bd1976bris has pointed out, opaque and unreadable.

Actually, I took the inference from the test llvm-project/llvm/test/tools/llvm-symbolizer/data-location.yaml for the creation of approximation-line-handcrafted.yaml. I can remove some of the unnecessary symbols from the yaml if it's ok.

yaml2obj DOES understand special descriptions that allow you to describe by hand the line table, for example (see the yaml2obj tests I pointed out before at llvm/test/tools/yaml2obj/ELF/DWARF), but you'll need to write these yourself, taking inspiration from an existing object for how the line table might be structured, rather than just trying to use obj2yaml to make them.

Ok , So I think then If it's ok for everyone. I will create the line-table taking inference from llvm/test/tools/yaml2obj/ELF/DWARF

I'll leave it to the main reviewers to guide you in more detail, but if all you really care about testing is a line table with linked addresses in it (some of which are 0), you could start out by building a line table that you want in YAML (or asm) then add the necessary other bare minimum scaffolding to make it work. IIRC, you don't actually even need the .text to contain the addresses you have listed, as long as you have appropriate debug data and symbols, so your addresses could be fairly arbitrary.

Ok , Will try to amend the current test approximate-line-handcrafted.yaml which is similar in format to llvm/test/tools/yaml2obj/ELF/DWARF/debug-line.yaml.

ampandey-1995 commented 4 months ago
~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
skip-line-zero.s:139:1: error: unknown directive
.function_and_data_sections
^

Apologies @ampandey-1995 I was in a hurry and left in a directive that we only support in our downstream toolchain. If you remove that line the test should pass. There's probably more in the assembly that can be stripped out as well. I have edited my original post so that the test should pass now. I didn't want to reply and cause another copy of the massive test text to be posted here. I didn't think about replies including the text - I should have used a gist or something!

Hi @bd1976bris , there is no need to apologize for the same. Although, I tested your latest test case.

~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
<unknown>:0: error: symbol '.Lline_table_start0' is already defined

Github also allows attachment if it's ok can you please share the full test-case (without stripping) through attachments.

bd1976bris commented 4 months ago

Hi @bd1976bris , there is no need to apologize for the same. Although, I tested your latest test case.

~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
<unknown>:0: error: symbol '.Lline_table_start0' is already defined

This is happening because you are passing -g. This causes the assembler to generate a line table which conflicts with the one that I have handwritten. The test should pass as the llvm-mc line in the test does not use -g.

ampandey-1995 commented 4 months ago

Hi @bd1976bris , there is no need to apologize for the same. Although, I tested your latest test case.

~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
<unknown>:0: error: symbol '.Lline_table_start0' is already defined

This is happening because you are passing -g. This causes the assembler to generate a line table which conflicts with the one that I have handwritten. The test should pass as the llvm-mc line in the test does not use -g.

Yeah , I will restrict using -g in llvm-mc invocation as it is overwriting the debug-info already created in assembly. Also -gline-line-tables dosen't produce the elaborate debug-info like that which you shared in your test case. That is why my debuginfo generated in not complete enough to modify the line-table section parts. I'll try to resolve this also as amending line-table requires complete information of line-table section.

bd1976bris commented 4 months ago

@ampandey-1995 I fixed a problem with my test by correcting the end sequence address for symbol foo. I have also stripped out as much as I think I can get away with. I have updated the example in the comment but also attached the test file here: skip-line-zero.zip

bd1976bris commented 4 months ago

@ampandey-1995 I hope you have everything you need for the testing now? Please ask if there's anything else.

ampandey-1995 commented 4 months ago

@ampandey-1995 I hope you have everything you need for the testing now? Please ask if there's anything else.

Sorry, I was busy with some internal work and also it took time to understand and encode the handwritten DWARF Line Table assembly . I was basically trying to make it work with dwarf-version 5 but couldn't able to get to the point to sucessfully encode the header for dwarf V5. I have successfully handcoded encoded assembly using dwarf v4.

bd1976bris commented 4 months ago

@ampandey-1995 I hope you have everything you need for the testing now? Please ask if there's anything else.

I have successfully handcoded encoded assembly using dwarf v4.

I think dwarf 4 is fine. There are no changes to the line table between dwarf v4 and v5 that would affect --skip-line-zero.

bd1976bris commented 4 months ago

The tests are much improved IMO.

My high-level comment is that we don't need llvm/test/tools/llvm-symbolizer/approximate-line-generated.s. The comment near the top of that test states..

## This test illustrates the usage of generated assembly by clang to produce the following line table:

..but the point of the testing being added is to test --skip-line-zero. We need to produce test input with a valid line table to test --skip-line-zero and the tests should do so in the most readable manner, however, as long as the test input is correct it doesn't matter "how" it was produced. I see no benefit to Clang generating the line table assembly vs hand writing it.

I would move the test cases in approximate-line-generated.s into approximate-line-handcrafted.s (although some appear to be duplicates which can be dropped). I would also rename approximate-line-handcrafted.s to reflect that it is testing --skip-line-zero (e.g. skip-line-zero.s).

ampandey-1995 commented 3 months ago

The tests are much improved IMO.

Thanks.

My high-level comment is that we don't need llvm/test/tools/llvm-symbolizer/approximate-line-generated.s. The comment near the top of that test states..

## This test illustrates the usage of generated assembly by clang to produce the following line table:

..but the point of the testing being added is to test --skip-line-zero. We need to produce test input with a valid line table to test --skip-line-zero and the tests should do so in the most readable manner, however, as long as the test input is correct it doesn't matter "how" it was produced. I see no benefit to Clang generating the line table assembly vs hand writing it.

I would move the test cases in approximate-line-generated.s into approximate-line-handcrafted.s (although some appear to be duplicates which can be dropped). I would also rename approximate-line-handcrafted.s to reflect that it is testing --skip-line-zero (e.g. skip-line-zero.s).

Ok, I'll remove the approximate-line-generated.s and move non-duplicated tests to skip-line-zero.s

ampandey-1995 commented 3 months ago

ping