Closed ampandey-1995 closed 2 months ago
Author: None (ampandey-1995)
Hi @ampandey-1995, how does this relate to #71032?
This patch is similar in functionality to #71032, refined based on comments from @jh7370 and @dwblaikie.
Okay, is there a reason you haven't just updated that PR rather than create an entirely new one?
Apologies, I will close the old PR.
Hi @ampandey-1995, I'm glad to see this come up again.
There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here: @jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See https://github.com/llvm/llvm-project/pull/71032#issuecomment-1801303252.
I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.
There is also the question of would it be useful to have both methods available to the user, as outlined by @jh7370 in https://github.com/llvm/llvm-project/pull/71032#issuecomment-1798014589. I'm torn as although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be good enough for an estimate and save making this too complicated.
I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.
Hi @ampandey-1995, I'm glad to see this come up again.
Thanks @gbreynoo for reviewing the patch.
There was some discussion in the previous PR that we didn't get to the bottom of, so I'll state it here: @jh7370 mentioned the potential different methods of "Use of the last address before the current one with a non-zero line value" vs "The last line table entry before the one for the specified address, with a non-zero line value". See #71032 (comment).
Thanks again for pointing out @jh7370's comments. I think this patch corresponds to the second method: it queries line table entries to extract meaningful line information.
Previously, patch #71032 incremented or decremented the address, starting from the address that had no line information. That approach doesn't fit well, because querying llvm-symbolizer for every address made many symbolizer API calls, each invoking DWARF APIs to extract line information. Also, if no debug information is present in the object, llvm-symbolizer keeps calling into the DWARF APIs because it never gets non-zero line information, which sometimes hangs the llvm-symbolizer tool itself.
The current patch handles an address that has no line information by inspecting the line table within the bounds [lowPC, highPC]. The search covers [lowPC, SearchPC] if "before" is given, or [SearchPC, highPC] if "after" is given, as the value of the --approximate-line-info option.
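To make the bounded search concrete, here is a minimal Python sketch of the idea as described in this comment. It is not the patch's code: `Row`, `approximate_line`, and the parameter names are hypothetical, and the line table is reduced to (address, line) pairs sorted by address.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical, simplified line table row."""
    address: int
    line: int


def approximate_line(rows: List[Row], low_pc: int, high_pc: int,
                     search_pc: int, direction: str) -> Optional[Row]:
    """Return the nearest row with a non-zero line inside the bounds, or None.

    "before" scans [low_pc, search_pc] downwards from search_pc;
    "after" scans [search_pc, high_pc] upwards from search_pc.
    """
    if direction == "before":
        candidates = [r for r in rows if low_pc <= r.address <= search_pc]
        candidates.reverse()  # closest preceding row first
    elif direction == "after":
        candidates = [r for r in rows if search_pc <= r.address <= high_pc]
    else:
        raise ValueError("direction must be 'before' or 'after'")
    for row in candidates:
        if row.line != 0:
            return row
    return None  # nothing usable inside [low_pc, high_pc]


if __name__ == "__main__":
    table = [Row(0x10, 7), Row(0x14, 0), Row(0x18, 0), Row(0x1C, 9)]
    print(approximate_line(table, 0x10, 0x20, 0x18, "before"))
    print(approximate_line(table, 0x10, 0x20, 0x18, "after"))
```

Because the scan never leaves [low_pc, high_pc], an object with no usable line information terminates with `None` instead of probing indefinitely.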
I think the llvm-symbolizer command guide should clearly specify the method used to derive the line number estimation. Maybe the help output as well.
Ok I will update the command guide & help output.
There is also the question of would it be useful to have both methods available to the user, as outlined by @jh7370 in #71032 (comment). I'm torn as although I see the value in giving the user the option, in most cases just outputting the previous entry in the line table would probably be good enough for an estimate and save making this too complicated.
I agree with you about the approach of querying line table as a good estimation method and also in terms of performance.
I also think it should be clearer in the output of llvm-symbolizer when it is an approximate output value. You can input multiple addresses into llvm-symbolizer in one invocation so it would be useful to see which outputs required an estimate vs which are accurate.
Yes, thanks, will do. Is it OK to attach a tag such as (approximate), similar to (inlined by), at the end of the Line:Column information?
That sounds good to me. With there being no equivalent functionality in addr2line to follow, I think you are right to follow the inline output behavior.
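As a small illustration of the proposed tag, the sketch below appends "(approximate)" after the Line:Column field only for estimated results. The function name and parameters are hypothetical; the output shape follows the default two-line llvm-symbolizer output quoted in this thread.

```python
def format_location(function: str, filename: str, line: int, column: int,
                    approximate: bool = False) -> str:
    """Format one symbolized address; tag estimated locations."""
    location = f"{filename}:{line}:{column}"
    if approximate:
        # Only estimated results carry the tag, so exact and estimated
        # entries can be told apart when several addresses are queried.
        location += " (approximate)"
    return f"{function}\n{location}"


if __name__ == "__main__":
    print(format_location("f3", "main.c", 20, 0, approximate=True))
    print(format_location("f3", "main.c", 21, 0))
```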
Before really looking at this PR, I'd like to hear from @dwblaikie, as I know he had some objections before.
Hi @dwblaikie any concerns regarding this patch?
ping @dwblaikie, @jh7370, @gbreynoo.
'tis on my list, sorry I haven't got to it yet.
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
Is it suitable to give the user the choice between forwards and backwards propagation? How would someone decide which is right for them?
From my perspective, I believe it is good to provide an iterative forward/backward search for approximate line information within function boundaries. Most users would rely on the backward estimate, but providing a forward estimate is also feasible and not wrong.
That this can propagate past basic block, or even function boundaries seems pretty problematic - since the answer could then change depending on code layout, or could give someone a totally wrong function (ask me about what GCC did with LLVM's use of is_stmt many years ago - that was fun, breaking on a function would cause GCC to break on both the function you asked about, and the end of the previous function in the code... due to a bug like this one - it started at the instruction where the function started, then went searching to figure out which statement covered that instruction - which was back in the function that preceded the one you were trying to break on... good times)
Thanks, the current revision considers the function boundaries also to be taken into account for the forward/backward line information estimation.
ping
Is it suitable to give the user the choice between forwards and backwards propagation? How would someone decide which is right for them?
From my perspective, I believe it is good to provide an iterative forward/backward search for approximate line information within function boundaries. Most users would rely on the backward estimate, but providing a forward estimate is also feasible and not wrong.
I don't think this is something we should expose unless we've got a pretty good reason to motivate how a user would make this choice.
A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" in maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)
This does cause the propagation to cross basic block boundaries, though - LLVM does have a mechanism to put line 0s just at the start of basic blocks to ensure that back propagation doesn't produce arbitrary results based on basic block ordering...
That this can propagate past basic block, or even function boundaries seems pretty problematic - since the answer could then change depending on code layout, or could give someone a totally wrong function (ask me about what GCC did with LLVM's use of is_stmt many years ago - that was fun, breaking on a function would cause GCC to break on both the function you asked about, and the end of the previous function in the code... due to a bug like this one - it started at the instruction where the function started, then went searching to figure out which statement covered that instruction - which was back in the function that preceded the one you were trying to break on... good times)
Thanks, the current revision considers the function boundaries also to be taken into account for the forward/backward line information estimation.
Right - but basic block boundaries are a problem too. It's unfortunate/unpredictable/pretty problematic to cross a basic block boundary to retrieve a line table - because the answer's arbitrary - based on the compiler's basic block ordering.
A further thought: Would it be reasonable for this option to change how the line table is parsed from its input? It could ignore the line zero entries in the line table entirely? That would support "backwards" in maybe a simpler way? (I could be open to arguments that that's philosophically problematic - modifying the line table itself, rather than only how it's queried - @jh7370 ?)
From llvm-symbolizer's point of view, it could make sense to skip line 0 entries, since those entries don't provide any symbolization information. However, in more general terms, we can't change the line table parser to ignore them: consider the simple case where somebody wants to dump the raw line table, for example.
ping
From llvm-symbolizer's point of view, it could make sense to skip line 0 entries, since those entries don't provide any symbolization information.
I disagree here - I think it does provide value to not claim a given instruction comes from a line we don't know it came from, by default at least/without some user opt-in and/or without telling the user that's what we're doing in some particular output.
However, in more general terms, we can't change the line table parser to ignore them: consider the simple case where somebody wants to dump the raw line table, for example.
nod I didn't mean to change it always, but to change it opt-in via a flag (like the one being discussed in this patch).
And I think that'd provide the simplest implementation, and easier to explain - though I still think it's pretty error prone/likely to produce confusion for users (which is why I have misgivings about implementing any support for this, to be honest).
Yeah, I see your point, and that does sound a bit more natural. I've also reached out to our internal binutils team to see if they have any particular preferences as to any approach taken.
Hi All!
From Sony's POV the primary issue that we would like to solve is reporting effective line numbers when symbolizing stack traces. Currently, we have customers manually decrementing addresses when 0 is reported as the line number to try to get a useful line number reported. Customers report that they are generally happy with understanding their stack traces when they do this manually. Therefore, we would like to see a feature to improve on the customer doing this manually or scripting it up themselves.
This PR meets our criteria and we would be happy to see it (or an equivalent implementation) merged:
Utility - this PR reports satisfactory line numbers for problem stack walks we have had reported.
Potential for confusion - this PR is opt-in (via '-approximate-line-info=<before/after>') and clearly labels the line numbers produced as guesses by appending the string "(approximate)" to the output. This seems adequate to address the problem of potentially confusing users.
Provide an advantage over a simple script - this PR tries to limit the address range tried using the high/low PC and the presence of basic_block/prologue_end instructions. This is better than just trying decremented addresses until a non-zero line is found. This would also be reasonably difficult to implement in a script which parses the output from the binutils we provide, so including the functionality in symbolizer makes sense.
As to the best implementation I don't have a firm opinion yet. I will try to poke at this and provide some feedback. One open question I would like to understand is: How misleading/confusing could the approximate line numbers be? An answer to that would be useful in evaluating different implementations. In Sony's case we are really interested in the answer to the previous question when we restrict it to the case of symbolizing stack traces. I also wonder if function inlining can pose/exacerbate any correctness issues?
OK then - I guess if we're doing this, I think it should be previous only ("as if line zero hadn't been emitted at all") - and the implementation should be tidied up/simplified a bit.
We shouldn't keep probing by the previous single address - the row returned by findRowInSeq should be decremented - looking at the previous row, until we either find a non-zero row, or reach Seq.FirstRowIndex - in the latter case, we should produce zero anyway, because there's no other address to produce reasonably (walking past the beginning of a sequence would give an arbitrary location in another .text section)?
Does that work well enough for everyone?
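The proposal above can be sketched in Python as follows. This is an illustration only, not LLVM's implementation: `Row`, `Sequence`, `find_row_in_seq`, and `lookup` are hypothetical stand-ins for the C++ structures mentioned (findRowInSeq, Seq.FirstRowIndex), and the sample table is the one from the hand-written test discussed later in this thread.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    address: int
    line: int


class Sequence(NamedTuple):
    first_row_index: int  # first row of the sequence
    end_row_index: int    # row carrying end_sequence (marks the end address)


def find_row_in_seq(rows: List[Row], seq: Sequence, address: int) -> Optional[int]:
    """Index of the last row in seq whose address is <= address, or None."""
    first, end = seq.first_row_index, seq.end_row_index
    if not (rows[first].address <= address < rows[end].address):
        return None
    addresses = [r.address for r in rows[first:end]]
    return first + bisect_right(addresses, address) - 1


def lookup(rows: List[Row], sequences: List[Sequence], address: int,
           skip_line_zero: bool = False) -> Optional[Tuple[int, bool]]:
    """Return (line, approximate) for address, or None if it is unmapped."""
    for seq in sequences:
        idx = find_row_in_seq(rows, seq, address)
        if idx is None:
            continue
        approximate = False
        if skip_line_zero and rows[idx].line == 0:
            probe = idx
            # Walk back over rows, never past the start of this sequence.
            while probe > seq.first_row_index and rows[probe].line == 0:
                probe -= 1
            if rows[probe].line != 0:
                idx, approximate = probe, True
            # Otherwise keep line 0: there is nothing reasonable to report.
        return rows[idx].line, approximate
    return None


ROWS = [Row(0x0, 10), Row(0x1, 0), Row(0x2, 0), Row(0x6, 8),
        Row(0x6, 0), Row(0x7, 0), Row(0x16, 5)]
SEQUENCES = [Sequence(0, 3), Sequence(4, 6)]

if __name__ == "__main__":
    print(lookup(ROWS, SEQUENCES, 0x2))
    print(lookup(ROWS, SEQUENCES, 0x2, skip_line_zero=True))
    print(lookup(ROWS, SEQUENCES, 0x7, skip_line_zero=True))
```

Walking rows rather than addresses keeps the work proportional to the number of rows in one sequence, and stopping at the first row index is what prevents the result from leaking in from an unrelated sequence.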
What @dwblaikie said.
Does that work well enough for everyone?
@dwblaikie - Yes. Thanks for taking our use-case into consideration!
I had a look at how this interacts with other options and everything looks reasonable as far as I can see. For --print-source-context-lines I tried printing a line zero address with --skip-line-zero and the output looks good:
>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --skip-line-zero --print-source-context-lines=3
f3
main.c:20:0 (approximate)
19 : {
20 >: gvar3++;
21 : if (gvar3 > 1 && b < a)
I noticed that for --print-source-context-lines on line zero addresses without --skip-line-zero you currently get very confusing output where the lines around line zero are displayed:
>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --print-source-context-lines=3
f3
main.c:0:0
1 : static int gvar1 = 0;
2 : static int gvar2 = 0;
3 : static int gvar3 = 0;
I think that ideally it would be better not to display the context lines for line zero addresses. It might also be nice to emit a warning recommending trying a different address or using --skip-line-zero.
I noticed that for --print-source-context-lines on line zero addresses without --skip-line-zero you currently get very confusing output where the lines around line zero are displayed:
I think that is the default behaviour: --print-source-context-lines will display the source lines around the queried address even when that address has no line number.
>llvm-symbolizer.exe 0x0000000000000156 --obj=main.elf --print-source-context-lines=3
f3
main.c:0:0
1 : static int gvar1 = 0;
2 : static int gvar2 = 0;
3 : static int gvar3 = 0;
I think that ideally it would be better not to display the context lines for line zero addresses. It might also be nice to emit a warning recommending trying a different address or using --skip-line-zero.
This would require changing the default functionality of --print-source-context-lines. Ideally, with or without --skip-line-zero, --print-source-context-lines should output source context lines by default for any queried address (even if it has line zero).
Right I think that this is an enhancement that is separate to this PR. I have filed: https://github.com/llvm/llvm-project/issues/92403.
My reading of DWARF suggests that this is legal as there's nothing to disallow two contiguous sequences AFAICS.
The DWARF standard doesn't say anything about how two sequences relate to each other. While the intent of a sequence is to cover all contiguous instructions, nothing in the spec mandates this.
I suspect that llvm-mc is unlikely to generate multiple sequences for the same section, and for different sections it will of course use zero as the starting offset. An explicit line table in the assembly source, such as @bd1976bris proposed, will let you have multiple sequences in the same section, and therefore a non-zero starting point, in a single relocatable object. This means the linker is not required.
Yes, the current revision no longer invokes the linker.
ping
ping
@ampandey-1995 this might well be my failing but I'm afraid that I'm struggling to understand the tests. Before going on though I would like to say that I appreciate the effort you have put into them :) However, I don't understand why we would want to invoke the compiler and the linker (which is a complicated process to understand and results in input full of unnecessary features, e.g. lots of crt symbols) rather than just hand coding the input. Furthermore, whilst I appreciate that your use of the yaml2obj tool was suggested here, I don't think that the yaml input is very readable as it has large blocks of opaque hex. I also don't understand why we need two tests - can't we use one test for all the cases we need? I think that the names of the tests need to clearly describe what they are testing (not how they are implemented, e.g. "handcrafted") and the tests need comments describing what the test features and cases are.
I will try to post up an improved test if I can.
@ampandey-1995 below is my attempt. I think that it contains the testcases from both of the current tests and IMO it is more readable/understandable.
$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address Line Column File ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000 10 0 1 0 0
## 0x0000000000000001 0 0 1 0 0
## 0x0000000000000002 0 0 1 0 0
## 0x0000000000000006 8 0 1 0 0 end_sequence
## 0x0000000000000006 0 0 1 0 0
## 0x0000000000000007 0 0 1 0 0
## 0x0000000000000016 5 0 1 0 0 end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.
# REQUIRES: x86-registered-target
# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-DISABLED %s
# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0
## Check that with --skip-line-zero the last non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN: FileCheck --check-prefix=SKIP-ENABLED %s
# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:10:0 (approximate)
## Check that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN: FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s
# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:10:0
## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-VERBOSE %s
# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT: Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT: Function start address: 0x0
# SKIP-VERBOSE-NEXT: Line: 10
# SKIP-VERBOSE-NEXT: Column: 0
# SKIP-VERBOSE-NEXT: Approximate: true
## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-JSON %s
# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":10,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]
## Check that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN: FileCheck --check-prefixes=SKIP-BOUNDARY %s
# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0
.globl foo
foo:
.space 6
.Lfoo_end:
.globl bar
bar:
.space 16
.Lbar_end:
.section .debug_abbrev,"",@progbits
.byte 1 # Abbreviation Code
.byte 17 # DW_TAG_compile_unit
.byte 0 # DW_CHILDREN_no
.byte 16 # DW_AT_stmt_list
.byte 23 # DW_FORM_sec_offset
.byte 17 # DW_AT_low_pc
.byte 1 # DW_FORM_addr
.byte 18 # DW_AT_high_pc
.byte 6 # DW_FORM_data4
.byte 0 # EOM(1)
.byte 0 # EOM(2)
.byte 0 # EOM(3)
.section .debug_info,"",@progbits
.Lcu_begin0:
.long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
.short 4 # DWARF version number
.long .debug_abbrev # Offset Into Abbrev. Section
.byte 8 # Address Size (in bytes)
.byte 1 # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
.long .Lline_table_start0 # DW_AT_stmt_list
.quad 0 # DW_AT_low_pc
.long .Lbar_end-foo # DW_AT_high_pc
.Ldebug_info_end0:
.section .debug_line,"",@progbits
.Lline_table_start0:
.long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
.short 4 # version
.long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
.byte 1 # minimum_instruction_length
.byte 1 # maximum_operations_per_instruction
.byte 0 # default_is_stmt
.byte -5 # line_base
.byte 14 # line_range
.byte 13 # opcode_base
.byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
.asciz "dir0" # include directory
.byte 0 # end of include directories
.asciz "two.c" # filename
.byte 0 # reference to dir0
.byte 0 # modification time
.byte 0 # length of file (unavailable)
.byte 0 # end of filenames
.Lprologue_end:
.byte 0, 9, 2 # DW_LNE_set_address
.quad 0x0 # foo (to 0)
.byte 3 # DW_LNS_advance_line
.sleb128 9 # by 9 (to 10)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 -10 # by -10 (to 0)
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 1)
.byte 1 # DW_LNS_copy
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 2)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 8 # by 8 (to 8)
.byte 2 # DW_LNS_advance_pc
.byte 4 # += (4 * min instruction length) (to 6)
.byte 0, 1, 1 # DW_LNE_end_sequence
.byte 0, 9, 2 # DW_LNE_set_address
.quad bar # bar (to 6)
.byte 3 # DW_LNS_advance_line
.sleb128 -1 # by -1 (to 0)
.byte 1 # DW_LNS_copy
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 7)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 5 # by 5 (to 5)
.byte 2 # DW_LNS_advance_pc
.byte 15 # += (15 * min instruction length) (to 22)
.byte 0, 1, 1 # DW_LNE_end_sequence
.Lunit_end:
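For readers who want to check that the hand-written line number program above really produces the table in the test's header comment, here is a minimal Python sketch of a DWARF line program interpreter. It is an assumption-laden illustration: it handles only the opcodes this test uses (DW_LNE_set_address, DW_LNE_end_sequence, DW_LNS_copy, DW_LNS_advance_pc, DW_LNS_advance_line), it hard-codes the program bytes, and it assumes the `.quad bar` relocation resolves to address 6.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_sleb128(data, pos):
    """Decode a signed LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final byte
                result -= 1 << shift
            return result, pos


def run_line_program(data, min_inst_length=1):
    """Return rows as (address, line, end_sequence) tuples."""
    rows, pos = [], 0
    address, line = 0, 1  # initial state machine registers
    while pos < len(data):
        opcode = data[pos]
        pos += 1
        if opcode == 0:  # extended opcode: length, sub-opcode, operand
            length, pos = read_uleb128(data, pos)
            sub, operand = data[pos], data[pos + 1:pos + length]
            pos += length
            if sub == 2:    # DW_LNE_set_address
                address = int.from_bytes(operand, "little")
            elif sub == 1:  # DW_LNE_end_sequence
                rows.append((address, line, True))
                address, line = 0, 1  # registers reset after a sequence
            else:
                raise NotImplementedError(f"extended opcode {sub}")
        elif opcode == 1:   # DW_LNS_copy
            rows.append((address, line, False))
        elif opcode == 2:   # DW_LNS_advance_pc
            delta, pos = read_uleb128(data, pos)
            address += delta * min_inst_length
        elif opcode == 3:   # DW_LNS_advance_line
            delta, pos = read_sleb128(data, pos)
            line += delta
        else:
            raise NotImplementedError(f"opcode {opcode}")
    return rows


PROGRAM = bytes([
    0, 9, 2, 0, 0, 0, 0, 0, 0, 0, 0,  # set_address 0 (foo)
    3, 0x09, 1,                       # advance_line +9, copy
    3, 0x76, 2, 1, 1,                 # advance_line -10, advance_pc 1, copy
    2, 1, 1,                          # advance_pc 1, copy
    3, 0x08, 2, 4, 0, 1, 1,           # advance_line +8, advance_pc 4, end_sequence
    0, 9, 2, 6, 0, 0, 0, 0, 0, 0, 0,  # set_address 6 (bar, assumed resolved)
    3, 0x7F, 1,                       # advance_line -1, copy
    2, 1, 1,                          # advance_pc 1, copy
    3, 0x05, 2, 15, 0, 1, 1,          # advance_line +5, advance_pc 15, end_sequence
])

if __name__ == "__main__":
    for address, line, end in run_line_program(PROGRAM):
        print(f"{address:#018x} {line:6d}{' end_sequence' if end else ''}")
```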
I will try to post up an improved test if I can.
@ampandey-1995 below is my attempt. I think that it contains the testcases from both of the current tests and IMO it is more readable/understandable.
$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address Line Column File ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000 10 0 1 0 0
## 0x0000000000000001 9 0 1 0 0
## 0x0000000000000002 0 0 1 0 0
## 0x0000000000000003 8 0 1 0 0 end_sequence
## 0x0000000000000006 0 0 1 0 0
## 0x0000000000000007 0 0 1 0 0
## 0x0000000000000016 5 0 1 0 0 end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.
# REQUIRES: x86-registered-target
# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-DISABLED %s
# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0
## Check that with --skip-line-zero the last non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN: FileCheck --check-prefix=SKIP-ENABLED %s
# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:9:0 (approximate)
## Check that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN: FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s
# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:9:0
## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-VERBOSE %s
# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT: Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT: Function start address: 0x0
# SKIP-VERBOSE-NEXT: Line: 9
# SKIP-VERBOSE-NEXT: Column: 0
# SKIP-VERBOSE-NEXT: Approximate: true
## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN: FileCheck --check-prefix=SKIP-JSON %s
# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":9,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]
## Check that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN: FileCheck --check-prefixes=SKIP-BOUNDARY %s
# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0
.section .text.foo,"ax",@progbits
.globl foo
foo:
.Lfunc_begin0:
movl $10, %eax
retq
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.globl bar
bar:
.Lfunc_begin1:
# %bb.0:
pushq %rbp
movq %rsp, %rbp
callq foo
movl $20, %eax
popq %rbp
retq
.Lfunc_end1:
.size bar, .Lfunc_end1-bar
.section .debug_abbrev,"",@progbits
.byte 1 # Abbreviation Code
.byte 17 # DW_TAG_compile_unit
.byte 0 # DW_CHILDREN_no
.byte 37 # DW_AT_producer
.byte 14 # DW_FORM_strp
.byte 19 # DW_AT_language
.byte 5 # DW_FORM_data2
.byte 3 # DW_AT_name
.byte 14 # DW_FORM_strp
.byte 16 # DW_AT_stmt_list
.byte 23 # DW_FORM_sec_offset
.byte 27 # DW_AT_comp_dir
.byte 14 # DW_FORM_strp
.byte 83 # DW_AT_use_UTF8
.byte 25 # DW_FORM_flag_present
.byte 17 # DW_AT_low_pc
.byte 1 # DW_FORM_addr
.byte 85 # DW_AT_ranges
.byte 23 # DW_FORM_sec_offset
.byte 0 # EOM(1)
.byte 0 # EOM(2)
.byte 0 # EOM(3)
.section .debug_info,"",@progbits
.Lcu_begin0:
.long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
.short 4 # DWARF version number
.long .debug_abbrev # Offset Into Abbrev. Section
.byte 8 # Address Size (in bytes)
.byte 1 # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
.long .Linfo_string0 # DW_AT_producer
.short 29 # DW_AT_language
.long .Linfo_string1 # DW_AT_name
.long .Lline_table_start0 # DW_AT_stmt_list
.long .Linfo_string2 # DW_AT_comp_dir
# DW_AT_use_UTF8
.quad 0 # DW_AT_low_pc
.long .Ldebug_ranges0 # DW_AT_ranges
.Ldebug_info_end0:
.section .debug_aranges,"",@progbits
.section .debug_ranges,"",@progbits
.Ldebug_ranges0:
.quad .Lfunc_begin0
.quad .Lfunc_end0
.quad .Lfunc_begin1
.quad .Lfunc_end1
.quad 0
.quad 0
.section .debug_str,"MS",@progbits,1
.Linfo_string0:
.asciz "clang version 16.0.5 ---------------------------------------" # string offset=0
.Linfo_string1:
.asciz "two.c" # string offset=61
.Linfo_string2:
.asciz "c:\\Temp\\dwarfline" # string offset=67
.ident "clang version 16.0.5 ---------------------------------------"
.section ".note.GNU-stack","",@progbits
.function_and_data_sections
.section .debug_line,"",@progbits
.Lline_table_start0:
.long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
.short 4 # version
.long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
.byte 1 # minimum_instruction_length
.byte 1 # maximum_operations_per_instruction
.byte 0 # default_is_stmt
.byte -5 # line_base
.byte 14 # line_range
.byte 13 # opcode_base
.byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
.asciz "dir0" # include directory
.asciz "dir1" # include directory
.byte 0 # end of include directories
.asciz "two.c" # filename
.byte 0 # reference to dir0
.byte 0 # modification time
.byte 0 # length of file (unavailable)
.byte 0 # end of filenames
.Lprologue_end:
.byte 0, 9, 2 # DW_LNE_set_address
.quad 0x0 # foo (to 0)
.byte 3 # DW_LNS_advance_line
.sleb128 9 # by 9 (to 10)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 -1 # by -1 (to 9)
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 1)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 -9 # by -9 (to 0)
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 2)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 8 # by 8 (to 8)
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 3)
.byte 0, 1, 1 # DW_LNE_end_sequence
.byte 0, 9, 2 # DW_LNE_set_address
.quad .Lfunc_begin1 - .Lfunc_begin0 # bar (to 6)
.byte 3 # DW_LNS_advance_line
.sleb128 -1 # by -1 (to 0)
.byte 1 # DW_LNS_copy
.byte 2 # DW_LNS_advance_pc
.byte 1 # += (1 * min instruction length) (to 7)
.byte 1 # DW_LNS_copy
.byte 3 # DW_LNS_advance_line
.sleb128 5 # by 5 (to 5)
.byte 2 # DW_LNS_advance_pc
.byte 15 # += (15 * min instruction length) (to 22)
.byte 0, 1, 1 # DW_LNE_end_sequence
.Lunit_end:
This is failing at my side
I will try to post up an improved test if I can.
@ampandey-1995 below is my attempt. I think that it contains the test cases from both of the current tests, and IMO it is more readable/understandable.
$ cat skip-line-zero.s
## Test the --skip-line-zero option.
##
## This test uses hand written assembly to produce the following line table:
## Address            Line   Column File   ISA Discriminator Flags
## ------------------ ------ ------ ------ --- ------------- -------------
## 0x0000000000000000     10      0      1   0             0
## 0x0000000000000001      9      0      1   0             0
## 0x0000000000000002      0      0      1   0             0
## 0x0000000000000003      8      0      1   0             0  end_sequence
## 0x0000000000000006      0      0      1   0             0
## 0x0000000000000007      0      0      1   0             0
## 0x0000000000000016      5      0      1   0             0  end_sequence
##
## The first sequence is for symbol foo and the second is for symbol bar.

# REQUIRES: x86-registered-target

# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o

## Check that without --skip-line-zero line zero is displayed for a line with no source correspondence.
# RUN: llvm-symbolizer --obj=%t.o 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-DISABLED %s

# SKIP-DISABLED:foo
# SKIP-DISABLED-NEXT:two.c:0:0

## Check that with --skip-line-zero the last non-zero line in the current sequence is displayed.
# RUN: llvm-symbolizer --obj=%t.o 0x2 --skip-line-zero | \
# RUN:   FileCheck --check-prefix=SKIP-ENABLED %s

# SKIP-ENABLED:foo
# SKIP-ENABLED-NEXT:two.c:9:0 (approximate)

## Check that --skip-line-zero only affects line zero addresses when more than one address is specified.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x2 0x1 | \
# RUN:   FileCheck --check-prefixes=SKIP-ENABLED,NO-SKIP %s

# NO-SKIP:foo
# NO-SKIP-NEXT:two.c:9:0

## Check --verbose output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --verbose 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-VERBOSE %s

# SKIP-VERBOSE:foo
# SKIP-VERBOSE-NEXT: Filename: {{.*}}two.c
# SKIP-VERBOSE-NEXT: Function start address: 0x0
# SKIP-VERBOSE-NEXT: Line: 9
# SKIP-VERBOSE-NEXT: Column: 0
# SKIP-VERBOSE-NEXT: Approximate: true

## Check --output-style=JSON output is correct with --skip-line-zero.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero --output-style=JSON 0x2 | \
# RUN:   FileCheck --check-prefix=SKIP-JSON %s

# SKIP-JSON:[{"Address":"0x2","ModuleName":"{{.*}}skip-line-zero.s.tmp.o","Symbol":[{"Approximate":true,"Column":0,"Discriminator":0,"FileName":"{{.*}}two.c","FunctionName":"foo","Line":9,"StartAddress":"0x0","StartFileName":"","StartLine":0}]}]

## Check that --skip-line-zero does not cross sequence boundaries.
# RUN: llvm-symbolizer --obj=%t.o --skip-line-zero 0x7 | \
# RUN:   FileCheck --check-prefixes=SKIP-BOUNDARY %s

# SKIP-BOUNDARY:bar
# SKIP-BOUNDARY:two.c:0:0

    .section .text.foo,"ax",@progbits
    .globl foo
foo:
.Lfunc_begin0:
    movl $10, %eax
    retq
.Lfunc_end0:
    .size foo, .Lfunc_end0-foo

    .globl bar
bar:
.Lfunc_begin1:
# %bb.0:
    pushq %rbp
    movq %rsp, %rbp
    callq foo
    movl $20, %eax
    popq %rbp
    retq
.Lfunc_end1:
    .size bar, .Lfunc_end1-bar

    .section .debug_abbrev,"",@progbits
    .byte 1   # Abbreviation Code
    .byte 17  # DW_TAG_compile_unit
    .byte 0   # DW_CHILDREN_no
    .byte 37  # DW_AT_producer
    .byte 14  # DW_FORM_strp
    .byte 19  # DW_AT_language
    .byte 5   # DW_FORM_data2
    .byte 3   # DW_AT_name
    .byte 14  # DW_FORM_strp
    .byte 16  # DW_AT_stmt_list
    .byte 23  # DW_FORM_sec_offset
    .byte 27  # DW_AT_comp_dir
    .byte 14  # DW_FORM_strp
    .byte 83  # DW_AT_use_UTF8
    .byte 25  # DW_FORM_flag_present
    .byte 17  # DW_AT_low_pc
    .byte 1   # DW_FORM_addr
    .byte 85  # DW_AT_ranges
    .byte 23  # DW_FORM_sec_offset
    .byte 0   # EOM(1)
    .byte 0   # EOM(2)
    .byte 0   # EOM(3)

    .section .debug_info,"",@progbits
.Lcu_begin0:
    .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
    .short 4                   # DWARF version number
    .long .debug_abbrev        # Offset Into Abbrev. Section
    .byte 8                    # Address Size (in bytes)
    .byte 1                    # Abbrev [1] 0xb:0x1f DW_TAG_compile_unit
    .long .Linfo_string0       # DW_AT_producer
    .short 29                  # DW_AT_language
    .long .Linfo_string1       # DW_AT_name
    .long .Lline_table_start0  # DW_AT_stmt_list
    .long .Linfo_string2       # DW_AT_comp_dir
                               # DW_AT_use_UTF8
    .quad 0                    # DW_AT_low_pc
    .long .Ldebug_ranges0      # DW_AT_ranges
.Ldebug_info_end0:

    .section .debug_aranges,"",@progbits
    .section .debug_ranges,"",@progbits
.Ldebug_ranges0:
    .quad .Lfunc_begin0
    .quad .Lfunc_end0
    .quad .Lfunc_begin1
    .quad .Lfunc_end1
    .quad 0
    .quad 0

    .section .debug_str,"MS",@progbits,1
.Linfo_string0:
    .asciz "clang version 16.0.5 ---------------------------------------" # string offset=0
.Linfo_string1:
    .asciz "two.c" # string offset=61
.Linfo_string2:
    .asciz "c:\\Temp\\dwarfline" # string offset=67
    .ident "clang version 16.0.5 ---------------------------------------"
    .section ".note.GNU-stack","",@progbits
    .function_and_data_sections

    .section .debug_line,"",@progbits
.Lline_table_start0:
    .long .Lunit_end - .Lunit_start # unit length
.Lunit_start:
    .short 4 # version
    .long .Lprologue_end - .Lprologue_start # header length
.Lprologue_start:
    .byte 1  # minimum_instruction_length
    .byte 1  # maximum_operations_per_instruction
    .byte 0  # default_is_stmt
    .byte -5 # line_base
    .byte 14 # line_range
    .byte 13 # opcode_base
    .byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # arguments in standard opcodes
    .asciz "dir0" # include directory
    .asciz "dir1" # include directory
    .byte 0 # end of include directories
    .asciz "two.c" # filename
    .byte 0 # reference to dir0
    .byte 0 # modification time
    .byte 0 # length of file (unavailable)
    .byte 0 # end of filenames
.Lprologue_end:
    .byte 0, 9, 2 # DW_LNE_set_address
    .quad 0x0     # foo (to 0)
    .byte 3       # DW_LNS_advance_line
    .sleb128 9    # by 9 (to 10)
    .byte 1       # DW_LNS_copy
    .byte 3       # DW_LNS_advance_line
    .sleb128 -1   # by -1 (to 9)
    .byte 2       # DW_LNS_advance_pc
    .byte 1       # += (1 * min instruction length) (to 1)
    .byte 1       # DW_LNS_copy
    .byte 3       # DW_LNS_advance_line
    .sleb128 -9   # by -9 (to 0)
    .byte 2       # DW_LNS_advance_pc
    .byte 1       # += (1 * min instruction length) (to 2)
    .byte 1       # DW_LNS_copy
    .byte 3       # DW_LNS_advance_line
    .sleb128 8    # by 8 (to 8)
    .byte 2       # DW_LNS_advance_pc
    .byte 1       # += (1 * min instruction length) (to 3)
    .byte 0, 1, 1 # DW_LNE_end_sequence
    .byte 0, 9, 2 # DW_LNE_set_address
    .quad .Lfunc_begin1 - .Lfunc_begin0 # bar (to 6)
    .byte 3       # DW_LNS_advance_line
    .sleb128 -1   # by -1 (to 0)
    .byte 1       # DW_LNS_copy
    .byte 2       # DW_LNS_advance_pc
    .byte 1       # += (1 * min instruction length) (to 7)
    .byte 1       # DW_LNS_copy
    .byte 3       # DW_LNS_advance_line
    .sleb128 5    # by 5 (to 5)
    .byte 2       # DW_LNS_advance_pc
    .byte 15      # += (15 * min instruction length) (to 22)
    .byte 0, 1, 1 # DW_LNE_end_sequence
.Lunit_end:
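To make it easier to check that the hand-written line-number program really produces the table in the test's header comment, here is a minimal Python sketch that interprets the same opcode bytes. It is only an illustration, not how llvm-dwarfdump or llvm-symbolizer decode line tables: it handles just the five opcodes used in this test, ignores the header, and the names `run_line_program`, `set_address` and `PROGRAM` are made up for this example.

```python
# Sketch of a DWARF v4 line-number program interpreter covering only the
# opcodes used in the hand-written test above (no special opcodes).

DW_LNS_copy, DW_LNS_advance_pc, DW_LNS_advance_line = 1, 2, 3
DW_LNE_end_sequence, DW_LNE_set_address = 1, 2


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_sleb128(data, pos):
    """Decode one signed LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final byte
                result -= 1 << shift
            return result, pos


def run_line_program(program, min_inst_length=1):
    """Return the emitted rows as (address, line, end_sequence) tuples."""
    rows = []
    pos = 0
    address, line = 0, 1  # initial state machine registers
    while pos < len(program):
        opcode = program[pos]
        pos += 1
        if opcode == 0:  # extended opcode: length, sub-opcode, operands
            length, pos = read_uleb128(program, pos)
            sub = program[pos]
            operand = program[pos + 1:pos + length]
            pos += length
            if sub == DW_LNE_set_address:
                address = int.from_bytes(operand, "little")
            elif sub == DW_LNE_end_sequence:
                rows.append((address, line, True))
                address, line = 0, 1  # registers reset after a sequence
            else:
                raise ValueError(f"unsupported extended opcode {sub}")
        elif opcode == DW_LNS_copy:
            rows.append((address, line, False))
        elif opcode == DW_LNS_advance_pc:
            delta, pos = read_uleb128(program, pos)
            address += delta * min_inst_length
        elif opcode == DW_LNS_advance_line:
            delta, pos = read_sleb128(program, pos)
            line += delta
        else:
            raise ValueError(f"unsupported opcode {opcode}")
    return rows


def set_address(addr):
    """Bytes for DW_LNE_set_address with an 8-byte address."""
    return bytes([0, 9, 2]) + addr.to_bytes(8, "little")


END_SEQUENCE = bytes([0, 1, 1])

# The same program as in the .debug_line section of the test.
PROGRAM = (
    set_address(0x0)                 # foo
    + bytes([3, 0x09, 1])            # line +9 (to 10), copy
    + bytes([3, 0x7F, 2, 1, 1])      # line -1 (to 9), pc +1, copy
    + bytes([3, 0x77, 2, 1, 1])      # line -9 (to 0), pc +1, copy
    + bytes([3, 0x08, 2, 1])         # line +8 (to 8), pc +1
    + END_SEQUENCE
    + set_address(0x6)               # bar
    + bytes([3, 0x7F, 1])            # line -1 (to 0), copy
    + bytes([2, 1, 1])               # pc +1, copy
    + bytes([3, 0x05, 2, 15])        # line +5 (to 5), pc +15
    + END_SEQUENCE
)

for address, line, end in run_line_program(PROGRAM):
    print(f"{address:#018x} {line:6d}{'  end_sequence' if end else ''}")
```

The line register resets to 1 after each end_sequence, which is why the second sequence starts by advancing the line by -1 to reach line 0.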
Copied your test-case on my machine. It fails.
~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
skip-line-zero.s:139:1: error: unknown directive
.function_and_data_sections
^
@ampandey-1995 this might well be my failing but I'm afraid that I'm struggling to understand the tests. Before going on though I would like to say that I appreciate the effort you have put into them :)
Thanks for the input.
However, I don't understand why we would want to invoke the compiler and the linker (which is a complicated process to understand and results in input full of unnecessary features e.g. lots of crt symbols etc) rather than just hand coding the input?
The linker is not invoked in approximate-line-handcrafted.yaml. See @MaskRay's comment https://github.com/llvm/llvm-project/pull/82240#discussion_r1602749928. The linker is needed only once, to generate or reproduce the YAML file; it is not invoked in any RUN line.
I also don't understand why we need two tests - can't we use one test for all the cases we need?
It's not mandatory, but from my perspective two separate tests are fine.
approximate-line-generated.s (single sequence, .text) is the generated assembly with no manual modification. The test shows that clang produces line zero for some addresses.
approximate-line-handcrafted.yaml (two sequences, .text and .def_section) is the YAML generated from manually modified assembly, in which individual .loc directives in the debug info are set to line zero in order to stress-test the sequence boundaries and to show approximated output for addresses that are in different functions (in different files). See @dwblaikie's comment https://github.com/llvm/llvm-project/pull/82240#discussion_r1595660179.
I think that the names of the tests need to clearly describe what they are testing (not how they are implemented e.g. "handcrafted") and the tests need comments describing what the test features and cases are.
OK, I will change the names of the tests. Can you suggest appropriate names? I will also write more descriptive comments for each RUN case in the lit tests.
I think part of the problem with the test complexity is because you are trying to use a fully linked object file as the input to the test. Converting it into YAML doesn't really do much other than change its representation from an ELF object into a YAML description of that ELF object. Usually, when we talk about using yaml2obj to generate test inputs, we are using a dramatically pared down version of an object, something that no linker would ever produce. For example, most of the sections in your YAML file are probably unnecessary for the test, so you should remove them. Remember the thing you're testing doesn't need to be some completely runnable program, it just needs to have sufficient symbols/debug info etc to exercise the behaviour in llvm-symbolizer that you want to test.
Additionally, obj2yaml doesn't really understand how to generate proper DWARF section descriptions in YAML, so falls back to using hex descriptions, which are, as @bd1976bris has pointed out, opaque and unreadable. yaml2obj DOES understand special descriptions that allow you to describe by hand the line table, for example (see the yaml2obj tests I pointed out before at llvm/test/tools/yaml2obj/ELF/DWARF), but you'll need to write these yourself, taking inspiration from an existing object for how the line table might be structured, rather than just trying to use obj2yaml to make them.
I'll leave it to the main reviewers to guide you in more detail, but if all you really care about testing is a line table with linked addresses in it (some of which are 0), you could start out by building a line table that you want in YAML (or asm) then add the necessary other bare minimum scaffolding to make it work. IIRC, you don't actually even need the .text to contain the addresses you have listed, as long as you have appropriate debug data and symbols, so your addresses could be fairly arbitrary.
Apologies @ampandey-1995 I was in a hurry and left in a directive that we only support in our downstream toolchain. If you remove that line the test should pass. There's probably more in the assembly that can be stripped out as well. I have edited my original post so that the test should pass now. I didn't want to reply and cause another copy of the massive test text to be posted here. I didn't think about replies including the text - I should have used a gist or something!
Actually, I took inspiration from the test llvm-project/llvm/test/tools/llvm-symbolizer/data-location.yaml when creating approximate-line-handcrafted.yaml. I can remove some of the unnecessary symbols from the YAML if that's OK.
OK, if that works for everyone, I will create the line table based on the examples in llvm/test/tools/yaml2obj/ELF/DWARF.
OK, I will try to amend the current test approximate-line-handcrafted.yaml so that it is similar in format to llvm/test/tools/yaml2obj/ELF/DWARF/debug-line.yaml.
Hi @bd1976bris, there is no need to apologize. I tested your latest test case, but it still fails:
~$ llvm-mc -g -filetype=obj -triple=x86_64-unknown-linux skip-line-zero.s -o skip-line-zero.o
<unknown>:0: error: symbol '.Lline_table_start0' is already defined
GitHub also allows attachments. If it's OK, could you please share the full test case (without stripping) as an attachment?
This is happening because you are passing -g. This causes the assembler to generate a line table which conflicts with the one that I have handwritten. The test should pass as the llvm-mc line in the test does not use -g.
Yes, I will stop passing -g to llvm-mc, as it overwrites the debug info already written in the assembly. Also, -gline-tables-only doesn't produce debug info as elaborate as that in the test case you shared. That is why the debug info I generated was not complete enough for me to modify parts of the line table section. I'll try to resolve this too, as amending the line table requires complete information about the line table section.
@ampandey-1995 I fixed a problem with my test by correcting the end sequence address for symbol foo. I have also stripped out as much as I think I can get away with. I have updated the example in the comment but also attached the test file here: skip-line-zero.zip
@ampandey-1995 I hope you have everything you need for the testing now? Please ask if there's anything else.
Sorry, I was busy with some internal work, and it also took time to understand and encode the handwritten DWARF line table assembly. I was trying to make it work with DWARF version 5 but was not able to successfully encode the header for DWARF v5. I have successfully hand-coded the assembly using DWARF v4.
I think dwarf 4 is fine. There are no changes to the line table between dwarf v4 and v5 that would affect --skip-line-zero.
The tests are much improved IMO.
My high-level comment is that we don't need llvm/test/tools/llvm-symbolizer/approximate-line-generated.s. The comment near the top of that test states:
## This test illustrates the usage of generated assembly by clang to produce the following line table:
but the point of the testing being added is to test --skip-line-zero. We need to produce test input with a valid line table to test --skip-line-zero, and the tests should do so in the most readable manner; however, as long as the test input is correct it doesn't matter "how" it was produced. I see no benefit to Clang generating the line table assembly vs hand writing it.
I would move the test cases in approximate-line-generated.s into approximate-line-handcrafted.s (although some appear to be duplicates which can be dropped). I would also rename approximate-line-handcrafted.s to reflect that it is testing --skip-line-zero (e.g. skip-line-zero.s).
Thanks.
Ok, I'll remove the approximate-line-generated.s and move non-duplicated tests to skip-line-zero.s
ping
LLVM Symbolizer reports missing line numbers in some cases when symbolizing addresses in optimized binaries. This may be because the compiler sometimes cannot map an instruction to a line number due to optimizations. The symbolizer should handle those cases gracefully.
This patch adds a '--skip-line-zero' option to the symbolizer so that it reports the nearest non-zero line number.
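As a rough illustration of the behaviour discussed in this thread, the following Python sketch models one possible lookup: find the line table row covering the address and, if its line is 0, fall back to the last earlier row in the same sequence that has a non-zero line, flagging the result as approximate. This is an assumption-laden model for discussion, not the code in the patch; `Row`, `lookup` and the `FOO`/`BAR` tables are hypothetical names, with the rows taken from the line table in the skip-line-zero.s test posted above.

```python
# Toy model of a --skip-line-zero style lookup over line table sequences.
from bisect import bisect_right
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    end_sequence: bool = False


def lookup(sequences, address, skip_line_zero=False):
    """Return (line, approximate) for address, or None if it is not covered."""
    for rows in sequences:
        # A sequence covers [first row address, end_sequence address).
        if not rows[0].address <= address < rows[-1].address:
            continue
        addresses = [row.address for row in rows]
        index = bisect_right(addresses, address) - 1  # row covering address
        line = rows[index].line
        if line != 0 or not skip_line_zero:
            return line, False
        # Walk backwards, but never past the start of this sequence.
        for previous in reversed(rows[:index]):
            if previous.line != 0:
                return previous.line, True
        return 0, False  # nothing usable in this sequence
    return None


FOO = [Row(0x0, 10), Row(0x1, 9), Row(0x2, 0), Row(0x3, 8, True)]
BAR = [Row(0x6, 0), Row(0x7, 0), Row(0x16, 5, True)]
TABLE = [FOO, BAR]

print(lookup(TABLE, 0x2))                       # line 0 without the option
print(lookup(TABLE, 0x2, skip_line_zero=True))  # falls back within foo
print(lookup(TABLE, 0x7, skip_line_zero=True))  # no fallback across sequences
```

With these rows, address 0x2 falls back to line 9 and is marked approximate, while 0x7 stays at line 0 because the bar sequence has no earlier non-zero row, which mirrors the SKIP-ENABLED and SKIP-BOUNDARY checks in that test.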