llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

[lldb] Setting breakpoint by basename on template instantiations with template types fails #58362

Open Michael137 opened 1 year ago

Michael137 commented 1 year ago

Tried with top-of-tree as of 4ba360d499f6169d62a9535438b3cf34337ffc67. But works with lldb-14.

$ cat nested.cpp 
template<typename T>
struct C {};

template<typename T>
struct S {
    template<typename U>
    void method() {}
};

template<typename T>
void free_function() {}

int main() {
    free_function<C<int>>();
    free_function<double>();

    S<float> s;
    s.method<C<int>>();
    s.method<double>();
}

$ ./bin/lldb  a.out -o "br se -n method<C<int>>" -o "br se -n \"method<C<int> >\"" -o "br se -n \"free_function<C<int> >\"" -o "q"
(lldb) target create "a.out"
Current executable set to 'a.out' (arm64).
(lldb) br se -n method<C<int>>
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) br se -n "method<C<int> >"
Breakpoint 2: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) br se -n "free_function<C<int> >"
Breakpoint 3: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) q

With lldb-14:

$ lldb a.out -o "br se -n method<C<int>>" -o "br se -n \"method<C<int> >\"" -o "br se -n \"free_function<C<int> >\"" -o "q"
(lldb) target create "a.out"
Current executable set to 'a.out' (arm64).
(lldb) br se -n method<C<int>>
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) br se -n "method<C<int> >"
Breakpoint 2: where = a.out`void S<float>::method<C<int> >() + 8 at nested.cpp:7:20, address = 0x0000000100003f60
(lldb) br se -n "free_function<C<int> >"
Breakpoint 3: where = a.out`void free_function<C<int> >() at nested.cpp:11:23, address = 0x0000000100003f50
(lldb) q

Details

The symbol name that we read out of DWARF contains a space between the angle brackets (i.e., func<Foo<int> >), which makes sense because that’s how it’s spelled out in DW_AT_name. So when a user sets breakpoint on func<Foo<int>>, we don’t find it in the DWARFMappedHash::MemoryTable anywhere because it doesn’t match the DW_AT_name.

However, when the user adds the space and we fish it out of the DWARF index, we fail to match (in ProcessFunctionDIE) against the demangled string because the demangled name doesn’t contain the space.

Will need to bisect but potentially related to https://github.com/llvm/llvm-project/issues/58360

llvmbot commented 1 year ago

@llvm/issue-subscribers-lldb

Michael137 commented 1 year ago

Following commit caused the regression:

bc150a07f1a14a7923a29499b568d799f7214912 is the first bad commit
commit bc150a07f1a14a7923a29499b568d799f7214912
Author: Nathan Sidwell <nathan@acm.org>
Date:   Mon Mar 28 12:38:24 2022 -0700

    [demangler] No need to space adjacent template closings

    With the demangler parenthesizing 'a >> b' inside template parameters,
    because C++11 parsing of >> there, we don't really need to add spaces
    between adjacent template arg closing '>' chars.  In 2022, that just
    looks odd.

    Reviewed By: MaskRay

    Differential Revision: https://reviews.llvm.org/D123134

 libcxxabi/src/demangle/ItaniumDemangle.h           |   12 +-
 libcxxabi/test/test_demangle.pass.cpp              | 7172 ++++++++++----------
 llvm/include/llvm/Demangle/ItaniumDemangle.h       |   12 +-
 .../llvm-objdump/MachO/symbolized-disassembly.test |    2 +-
 llvm/unittests/Demangle/PartialDemangleTest.cpp    |    2 +-
 5 files changed, 3608 insertions(+), 3592 deletions(-)
Michael137 commented 1 year ago

Not sure what exactly writes out the DW_AT_name with the disambiguating space then. Would've expected this to come from the demangler. But somewhere there's a discrepancy. Either way, we should align DWARF and the demangler so lldb doesn't need to care about this

Michael137 commented 1 year ago

Ok so looks like we create the function name (which gets attached to a DISubProgram) in CGDebugInfo::CreateCXXMemberFunction via CGDebugInfo::GetName. This creates the name by using the AST type printer.

CGDebugInfo::getPrintingPolicy sets PrintingPolicy::SplitTemplateClosers unconditionally true (see https://reviews.llvm.org/D80554). Confirmed that setting this to false fixes the breakpoint naming issue above.

@dwblaikie @adrian-prantl Given the Itanium demangler stopped adding the disambiguating space, is there any appetite in revisiting the debug-info printing policy for templates (perhaps behind some GDB tuning check)? Adding the space makes setting breakpoints on template instantiations quite tedious (afaik, now only possible with basename+disabling the uninteresting ones).

Though I suspect that might break debugging gcc binaries? Alternatively this could be a demangler setting? Not sure if there's precedent for that

adrian-prantl commented 1 year ago

This sounds like a good idea to me. You can use compiler explorer to see what GCC generates. If we can avoid making it an LLDB tuning that would be even better.

Michael137 commented 1 year ago

Yup, GCC emits the template name with a space. I think that's probably partly why we decided to make it the default printing policy to be SplitTemplateClosers == true. Changing the printing policy default will break either this use-case for either GCC or Clang binaries unless we take one of the two other options which are:

  1. Control the demangler format for templates behind some PrintingPolicy-style flag (but that would only
  2. Teach LLDB to parse both new and old template formats

The latter seems like a worst-case scenario (but maybe not that big of a deal, haven't really looked at that part of the parser yet). The former seems plausible, will check if something like that already exists for the Itanium demangler

adrian-prantl commented 1 year ago

Clang has a debugger tuning flag, and on platforms like Darwin, tuning for LLDB is the default. We could certainly change the demangler setting based on the debugger tuning. We just need to make sure that it only affects the debug info output and not any of the generated code.

adrian-prantl commented 1 year ago

LLDB certainly makes a best effort to support code produced by compilers other than Clang, and maybe this is where we could draw the line.

dwblaikie commented 1 year ago

my 2c at a quick glance - this bug points to what appears to me to be a pretty problematically tight coupling in lldb between DW_AT_name and the demangler. These names are not derived the same way, and can vary by quite a bit. I don't think it's practical to depend on those printing identically. What happens next time we want to change the way names are printed in either the demangler or in DW_AT_name - do we end up with another breaking change where old debuggers can't debug new binaries and new debuggers (with fixed demangler) won't be able to debug old binaries? That doesn't seem like a good place to be.

I'd say the right fix here is to address/remove this tight coupling.

Sounds like the right place for the fix would be in "However, when the user adds the space and we fish it out of the DWARF index, we fail to match (in ProcessFunctionDIE) against the demangled string because the demangled name doesn’t contain the space."

The simplified template names work I've been doing has certainly shown up a bunch of places where template names are... complicated.

Take for instance lambdas in template names - those have basically /no/ bearing on the demangled name (there's a bunch of problems with that - they're not canonical or unique, they include source file/line numbers, etc - making it impossible to reliably match these names up between CUs, perform overload resolution, etc).

But also just differences in how GCC and Clang render non-type-template-parameters. Try char literals and I think GCC uses t1<(char)3> whereas clang uses t1<'\03'>, for instance? And lldb's demangler probably does the latter? But I'm not actually sure.

Ah, GCC and Clang were closer than I thought, but don't match either GCC's demangler or LLVM's:

$ g++-12 -g char.cpp -c && llvm-dwarfdump-tot char.o | grep _Z | c++filt 
                DW_AT_linkage_name      ("void f1<(char)3>()")
                DW_AT_linkage_name      ("void f1<(char)120>()")
$ g++-12 -g char.cpp -c && llvm-dwarfdump-tot char.o | grep _Z | llvm-cxxfilt
                DW_AT_linkage_name      ("void f1<(char)3>()")
                DW_AT_linkage_name      ("void f1<(char)120>()")
$ g++-12 -g char.cpp -c && llvm-dwarfdump-tot char.o | grep "\"f1"
                DW_AT_name      ("f1<'\\003'>")
                DW_AT_name      ("f1<'x'>")
$ clang++-tot -g char.cpp -c && llvm-dwarfdump-tot char.o | grep "\"f1"
                DW_AT_name      ("f1<'x'>")
                DW_AT_name      ("f1<'\\x03'>")

So presumably there's already some fuzzy matching? Or even some fairly basic things are broken as well?

(also consider other more complex types like function pointers - likely those get printed out with all manner of different whitespace, parentheses, etc)

labath commented 1 year ago

I'd actually say that simplified template names are the best way to fix this problem. It completely sidesteps the demangled name consistency problem -- by not including the demangled name in the debug info. It also reduces its size, and makes it easier to answer compiler queries during expression evaluation (clang asks "What's foo?, not "What's foo<int>?").

Another way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now). That way, lldb could look up the templated function by its base name, and then perform a more elaborate matching on the result.

So presumably there's already some fuzzy matching? Or even some fairly basic things are broken as well?

I'd expect this is just broken. People don't set breakpoints by the demangled name very much, precisely because it's so hard to get them right. Still, one could be forgiven for thinking that pasting a string from e.g. an lldb-produced backtrace into lldb's "breakpoint set" command will work. And it currently doesn't, because one uses the demangled (linkage) name and the other uses DW_AT_name.

jimingham commented 1 year ago

On Oct 17, 2022, at 11:20 PM, Pavel Labath @.***> wrote:

I'd actually say that simplified template names are the best way to fix this problem. It completely sidesteps the demangled name consistency problem -- by not including the demangled name in the debug info. It also reduces its size, and makes it easier to answer compiler queries during expression evaluation (clang asks "What's 'foo'?, not "What's 'foo'?").

Another way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now). That way, lldb could look up the templated function by its base name, and then perform a more elaborate matching on the result.

I actually thought we were doing that and was surprised to find out that we weren't.

So presumably there's already some fuzzy matching? Or even some fairly basic things are broken as well?

I'd expect this is just broken. People don't set breakpoints by the demangled name very much, precisely because it's so hard to get them right. Still, one could be forgiven for thinking that pasting a string from e.g. an lldb-produced backtrace into lldb's "breakpoint set" command will work. And it currently doesn't, because one uses the demangled (linkage) name and the other uses DW_AT_name.

We've talked on and off for a while about doing some better kinds of matching against templated, and even just overloaded functions. We should know to treat "arg1,arg2" as equivalent to "arg1, arg2" and so forth, as well as allow things like foo(int, *) for "give me the first argument int variant, don't make me type the rest" and also allowing you not to provide defaulted arguments in the specification and such-like. But so far no one has had time to take up that project. Short of that, we're pretty much just doing exact matching.

Jim

— Reply to this email directly, view it on GitHub https://github.com/llvm/llvm-project/issues/58362#issuecomment-1281866331, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUPVW7T5O3QP4VQDLYPCRLWDY6RTANCNFSM6AAAAAARFLET4Q. You are receiving this because you are on a team that was mentioned.

dwblaikie commented 1 year ago

I'd actually say that simplified template names are the best way to fix this problem. It completely sidesteps the demangled name consistency problem -- by not including the demangled name in the debug info. It also reduces its size, and makes it easier to answer compiler queries during expression evaluation (clang asks "What's 'foo'?, not "What's 'foo'?").

(minor aside: Looks like something in the mail/github issues pipeline might'vee lost some stuff - I guess this was meant to read:

clang asks "What's foo?", not "What's foo<int>?" ?)

But yeah, perhaps simplified template names is just the right solution to a bunch of problems here.

Another way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now). That way, lldb could look up the templated function by its base name, and then perform a more elaborate matching on the result.

That's probably mildly awkward (either having to plumb an extra string down through LLVM IR, or I guess more likely doing string manipulation on the index entries and stripping anything after < while special casing op<* operator overloads and correctly detecting which ones are templates - well, that only applies to functions, I guess, so maybe we can filter/not worry about those cases) but might be do-able. If we can just switch to simplified template names, it might be easier.

Note that not every name can be/is currently simplified, though (names that lack sufficient DWARF descriptions to reconstitute the original string are not simplified) - so it's not a universal fix. (so maybe that advocates for doing the partial fix, since it'll cover the cases that simplified template names can't)

I actually thought we were doing that and was surprised to find out that we weren't.

So presumably there's already some fuzzy matching? Or even some fairly basic things are broken as well? I'd expect this is just broken. People don't set breakpoints by the demangled name very much, precisely because it's so hard to get them right. Still, one could be forgiven for thinking that pasting a string from e.g. an lldb-produced backtrace into lldb's "breakpoint set" command will work. And it currently doesn't, because one uses the demangled (linkage) name and the other uses DW_AT_name. We've talked on and off for a while about doing some better kinds of matching against templated, and even just overloaded functions. We should know to treat "arg1,arg2" as equivalent to "arg1, arg2" and so forth, as well as allow things like foo(int, *) for "give me the first argument int variant, don't make me type the rest" and also allowing you not to provide defaulted arguments in the specification and such-like. But so far no one has had time to take up that project. Short of that, we're pretty much just doing exact matching. Jim

nod seems like something some folks would benefit from

GCC, FWIW, seems to use the mangled name only (it doesn't care about how GCC or Clang spelled the DW_AT_name) but does allow arbitrary whitespace anywhere in the name (t1<(char)3> works, as does t1 < ( char ) 3 >). And it does this quickly and uses an index - so I guess there's some way to shape the index (such as by just indexing the t1 part, as we're discussing re: simplified template names) that makes it possible to do some more fuzzy matching at least.

Michael137 commented 1 year ago

Another way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now). That way, lldb could look up the templated function by its base name, and then perform a more elaborate matching on the result.

To clarify, you're suggesting we find all candidate functions by basename from the dwarf index (so we find the function regardless of name formatting) and then when matching this against the user input we do some form of fuzzy matching (at the very least on the space between angle brackets)?

I guess more likely doing string manipulation on the index entries and stripping anything after < while special casing op<* operator overloads and correctly detecting which ones are templates

Note that the CPlusPlusNameParser::MethodName knows how to do this already (albeit with some unhandled cornercases), if we were to go down this route.

labath commented 1 year ago

(minor aside: Looks like something in the mail/github issues pipeline might'vee lost some stuff - I guess this was meant to read:

It seems the same thing happened to your post as well. :) The second name was meant to be foo<int>. I have fixed the original post as well.

Note that not every name can be/is currently simplified, though (names that lack sufficient DWARF descriptions to reconstitute the original string are not simplified) - so it's not a universal fix. (so maybe that advocates for doing the partial fix, since it'll cover the cases that simplified template names can't)

That is a good point, and I don't really know how to answer that.

Another way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now). That way, lldb could look up the templated function by its base name, and then perform a more elaborate matching on the result.

To clarify, you're suggesting we find all candidate functions by basename from the dwarf index (so we find the function regardless of name formatting) and then when matching this against the user input we do some form of fuzzy matching (at the very least on the space between angle brackets)?

Precisely.

felipepiovezan commented 1 year ago

This has come up recently again: https://reviews.llvm.org/D153869

I would like to see if we can build consensus towards this suggestion:

I'd actually say that simplified template names are the best way to fix this problem. It completely sidesteps the demangled name consistency problem -- by not including the demangled name in the debug info.

In other words, emit AT_name with the simplified template name. Are there any blockers towards making this the default?

Worth noting:

nother way (potentially as a transitional step) to solve this would be to include the basename in the accelerator tables (in addition to the DW_AT_name and DW_AT_linkage_name that we add now).

This is what has been done for apple_names for a while now.

felipepiovezan commented 1 year ago

Piecing together the history of this, I believe I encountered the same issue but on a different context:

When LLDB needs to evaluate an expression like foo(42) and foo is defined as template<typename A>foo(A), LLDB queries the accelerator table for the name foo. Note that the string foo itself is neither:

  1. AT_name, which in this case would be foo<int>
  2. nor AT_linkage, which in this case would be the mangled name of foo<int>.

(note: the original problem in this thread was about evaluating foo<int>(42), not just foo(42).)

As such, it is guaranteed that LLDB's query will fail, since those are the only two attributes blessed by the DWARF standard to be used as keys in debug_names.

As we've established, even if LLDB were to attempt conjuring the string foo<int>, this is not enough because the contents of AT_name are not standardized.

dsymutil (or rather, DWARFLinker) takes it upon itself to fix this issue by adding the key foo into the table, linking the string foo to every DIE corresponding to a specialization of this function. Note that this enables more functionality in LLDB, but it is not a great solution, as we now have a debug linker doing more than just linking & optimizing: it is enabling functionality that would otherwise not be possible if the debug information were left in object files. There are quite a few tests that are only enabled in "dsym mode", and I suspect this is the reason.

The -gsimple-template-names seems to be what we want to generate by default to address the issue more broadly. However, it has the constraint that it will only work on types for which LLDB can recreate the original name from the rest of debug info (for example, it won't handle noexcept functions). This makes sense in some contexts, but would not allow us to handle the issue above.

As such I am unsure what we can do here. -gsimple-template-names seems like the way to go, but its "recoverability" constraint means it can fail to simplify names and therefore we cannot rely on it to fix the issue I described above. In other words, we still need a "if we can't simplify this, add a third entry in the accelerator table". While the standard requires the key to be either AT_name or AT_linkage_name, it also says that:

A producer may choose to implement additional rules for what names are placed in the index, and may communicate those rules to a cooperating consumer via an augmentation string, described below.

dwblaikie commented 1 year ago

-gsimple-template-names seems like the way to go, but its "recoverability" constraint means it can fail to simplify names and therefore we cannot rely on it to fix the issue I described above. In other words, we still need a "if we can't simplify this, add a third entry in the accelerator table".

Yeah, that seems likely.

Another possible direction is to do the work to improve DWARF to the point where there are no cases that can't be recovered - it'd be good for the ecosystem to not have some names simplified and some not simplified. But that's a bunch of work.

dsymutil (or rather, DWARFLinker) takes it upon itself to fix this issue by adding the key foo into the table, linking the string foo to every DIE corresponding to a specialization of this function

Oh, wow, I didn't know that. Has that been long-standing? (@JDevlieghere ?)

Yeah, that seems really unfortunate that the debugging experience would be different with a dsym than with the .os directly.

The heuristic that dsymutil uses to determine the base name is probably pretty simple (strip everything after the first <) and could be implemented/shared with LLVM's DWARF generation code so fixing this wouldn't require a change to Clang/LLVM IR Debug Info metadata to ferry an extra name down, etc...

felipepiovezan commented 1 year ago

The heuristic that dsymutil uses to determine the base name is probably pretty simple (strip everything after the first <) and

It is slightly more complicated than that, as it needs to account for the <<, >>, <=> operators. But it is still simple: https://github.com/llvm/llvm-project/blob/343e204a52845dcd7bb7e7b8213513bcc33939c8/llvm/lib/DWARFLinker/DWARFLinker.cpp#L143

could be implemented/shared with LLVM's DWARF generation code so fixing this wouldn't require a change to Clang/LLVM IR Debug Info metadata to ferry an extra name down, etc...

This is what I am tempted to propose, as it would be orthogonal to the simple template names flag (they could be used independently or simultaneously). However, note that it would make the DWARFVerifier fail, since those names are neither AT_name nor AT_linkage_name. Personally, I am tempted to remove that verifier check because of this text in the standard: "A producer may choose to implement additional rules for what names are placed in the index, [...]"

dwblaikie commented 1 year ago

However, note that it would make the DWARFVerifier fail, since those names are neither AT_name nor AT_linkage_name.

Which is already true, though, yeah? (with llvm-dsymutil generated .debug_names) - so it seems that's something we should fix anyway? & seems like we would want to fix it to be more lenient/allow for base name entries for templates in the accelerator table.

It is slightly more complicated than that, as it needs to account for the <<, >>, <=> operators.

Ah, nifty. I'd have to stare at that code some more - but actually that might be suitable for generalizing -gsimple-template-names a bit more. I punted on all operator overloads because I wasn't sure/didn't think hard enough about how to support detecting if the name was already simplified or not... hmm, might still be hard/not possible (partly because you can have more than one template parameter list - both the operator overload might be a function template, and the return type in a conversion operator might be a template: operator x<y> - is that a simplified name that returns x<y> or a non-simplified template that returns x and has a template parameter list <y>)

felipepiovezan commented 1 year ago

Posted a patch: https://reviews.llvm.org/D155723

Which is already true, though, yeah?

Oops, you're right! So many pieces going on that I forgot about the current behaviour.

is that a simplified name that returns x or a non-simplified template that returns x and has a template parameter list )

Wow, took me a while to parse this, but this is indeed problematic 🤔