llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[lldb] Add support for evaluating expressions in anonymous namespaces. #96963

Closed mvanotti closed 3 months ago

mvanotti commented 3 months ago

I have found that referring to variables inside anonymous namespaces in lldb is a bit tricky if not impossible. In particular, if a variable is in an anonymous namespace, it will not be possible to create a breakpoint condition using the variable name as reported by lldb.

See the following example:

#include <stddef.h>
#include <stdio.h>

namespace {

size_t AnonymousVar = 1;

namespace xoo {
size_t AnonymousXooVar = 2;
}  // namespace xoo

}  // namespace

namespace foo {

namespace {
size_t FooAnonymousVar = 3;
}  // namespace

}  // namespace foo

int main(int argc, char** argv) {
  __asm__ volatile("nop");

  printf("Hello World! Argc is %d, I'm %s\n", argc, argv[0]);
  if (argc == 10000) {
    // Use all variables to avoid them being optimized out.
    __asm__ volatile("" ::"r"(AnonymousVar));
    __asm__ volatile("" ::"r"(xoo::AnonymousXooVar));
    __asm__ volatile("" ::"r"(foo::FooAnonymousVar));
  }
}

Compiled with:

$ clang++ -Wall -Wextra -pedantic -std=c++17 -glldb -O0    repro.cc   -o repro

And run with the following Python program:

#!/usr/bin/python3
import lldb
import os

def main():
  debugger = lldb.SBDebugger.Create()
  debugger.SetAsync(False)

  target_prog = './repro'
  argv = []

  target = debugger.CreateTargetWithFileAndArch(
      target_prog, lldb.LLDB_ARCH_DEFAULT_64BIT
  )

  assert target.IsValid()

  bp = target.BreakpointCreateByName('main')
  error = lldb.SBError()

  process = target.Launch(
    debugger.GetListener(),
    argv,
    [],  # envp
    None,
    None,
    None,
    os.getcwd(),
    0,  # launch_flags
    False,  # stop_at_entry
    error,
  )

  assert error.Success(), error
  assert process.GetState() == lldb.eStateStopped
  assert len(process.threads) == 1
  thread = process.threads[0]
  assert thread.IsValid()
  frames = thread.get_thread_frames()
  assert frames
  frame = frames[0]
  assert frame.IsValid()
  print(frame.line_entry.line)

  bp = target.BreakpointCreateByLocation('repro.cc', 26)
  variable_names = set(['FooAnonymousVar', 'AnonymousVar', 'AnonymousXooVar'])
  variables = [var for var in frame.variables if any([name in var.name for name in variable_names])]
  condition = ' && '.join(f'{var.name} == {var.value}' for var in variables)
  print(f'Setting Breakpoint condition to: {condition}')
  bp.SetCondition(condition)
  thread.StepOutOfFrame(frame)
  assert(process.GetState() == lldb.eStateStopped)
  assert frame.line_entry.line == 26, frame.line_entry.line

if __name__ == '__main__':
  main()

$ python3 ./debug.py
24
Setting Breakpoint condition to: foo::(anonymous namespace)::FooAnonymousVar == 3 && (anonymous namespace)::AnonymousVar == 1 && (anonymous namespace)::xoo::AnonymousXooVar == 2
error: stopped due to an error evaluating condition of breakpoint 2.1: "foo::(anonymous namespace)::FooAnonymousVar == 3 && (anonymous namespace)::AnonymousVar == 1 && (anonymous namespace)::xoo::AnonymousXooVar == 2"
Couldn't parse conditional expression:
error: <user expression 0>:1:6: expected unqualified-id
foo::(anonymous namespace)::FooAnonymousVar == 3 && (anonymous namespace)::AnonymousVar == 1 && (anonymous namespace)::xoo::AnonymousXooVar == 2
     ^
error: <user expression 0>:1:7: use of undeclared identifier 'anonymous'
foo::(anonymous namespace)::FooAnonymousVar == 3 && (anonymous namespace)::AnonymousVar == 1 && (anonymous namespace)::xoo::AnonymousXooVar == 2
      ^

Note that this also happens when using lldb from the CLI. You can sometimes omit the anonymous namespace, but it becomes trickier in more complex setups like foo::(anonymous namespace)::bar, or when variable names collide.

llvmbot commented 3 months ago

@llvm/issue-subscribers-lldb

Author: Marco Vanotti (mvanotti)

Michael137 commented 3 months ago

foo::(anonymous namespace)::FooAnonymousVar isn't a valid C++ expression, so the fact the expression evaluator falls over here is kind of expected. FWIW, running the following on your example works for me on macOS:

(lldb) br se -n main -c "FooAnonymousVar == 3 && AnonymousVar == 1 && AnonymousXooVar == 2"
Breakpoint 1: where = a.out`main + 24 at anon.cpp:23:3, address = 0x0000000100003f08
(lldb) run
Process 32146 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f08 a.out`main(argc=1, argv=0x000000016fdfec60) at anon.cpp:23:3
   20   }  // namespace foo
   21  
   22   int main(int argc, char** argv) {
-> 23     __asm__ volatile("nop");

I guess the problem here is that you're trying to take the variable names in frame.variables and use those in expressions, not sure if we make any guarantees about that? Maybe someone more familiar with the scripting API has an idea of how to retrieve an alternate name that would work in expressions?

Looks like we get those from the demangler:

0x00000067:       DW_TAG_variable [5]   (0x00000066)
                    DW_AT_name [DW_FORM_strx]   (indexed (0000000f) string = "FooAnonymousVar")
                    DW_AT_type [DW_FORM_ref_addr]       (0x0000000000000058 "size_t")
                    DW_AT_decl_file [DW_FORM_data1]     ("/Users/michaelbuch/anon.cpp")
                    DW_AT_decl_line [DW_FORM_data1]     (17)
                    DW_AT_location [DW_FORM_exprloc]    (DW_OP_addr 0x100008010)
                    DW_AT_linkage_name [DW_FORM_strx]   (indexed (00000010) string = "_ZN3foo12_GLOBAL__N_115FooAnonymousVarE")

$ c++filt -n _ZN3foo12_GLOBAL__N_115FooAnonymousVarE
foo::(anonymous namespace)::FooAnonymousVar

mvanotti commented 3 months ago

Thanks for your reply, Michael!

As a workaround, I'm removing the namespaces from the name of the variables if it contains an anonymous namespace. However, name collisions could lead to both false positives and false negatives (I think in the case of a name collision lldb only picks one variable that matches the expression).

In other debuggers, like gdb, using foo::(anonymous namespace)::FooAnonymousVar in breakpoints conditions seems to work fine.

I think you are right in your comment: I don't really care about the syntax with the anonymous namespace, I just want a way to reliably make breakpoint conditional expressions from frame.variables objects. Ideally, this would be serializable (so I can use the same expressions across multiple runs for the same binary).
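The workaround described in this comment can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the helper names `expression_name` and `build_condition` are hypothetical, and splitting on `::` assumes plain namespace-qualified names (no templates). It falls back to the bare variable name whenever the reported name contains an anonymous namespace, which is exactly where the name-collision caveat comes from.

```python
ANON = "(anonymous namespace)"


def expression_name(reported_name: str) -> str:
    """Return a name usable in an expression for a name reported by lldb.

    If the reported name goes through an anonymous namespace, keep only the
    last component (the bare variable name). Otherwise keep it unchanged.
    """
    if ANON in reported_name:
        return reported_name.rsplit("::", 1)[-1]
    return reported_name


def build_condition(values: dict) -> str:
    """Join `name == value` checks into one breakpoint condition string."""
    return " && ".join(
        f"{expression_name(name)} == {value}" for name, value in values.items()
    )


# Names and values as printed by the script in the first post.
reported = {
    "foo::(anonymous namespace)::FooAnonymousVar": "3",
    "(anonymous namespace)::AnonymousVar": "1",
    "(anonymous namespace)::xoo::AnonymousXooVar": "2",
}
print(build_condition(reported))
```

The resulting string is the same condition that was reported to work above. Two variables with the same bare name in different namespaces collapse to the same identifier, so the condition can end up testing the wrong variable.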

Michael137 commented 3 months ago

Side note, in your case, the name we get from SBValue::GetName is ultimately from Mangled::GetName, which will return the demangled name of the DW_AT_linkage_name. That contains (anonymous namespace). Here we're at the mercy of the LLVM demangler (which doesn't make any guarantees about whether the output is a valid C++ expression, see e.g., ABI tags). There is something to be said about an option to the demangler to omit this (anonymous namespace) part (though there hasn't been much precedent for that in the LLVM demangler afaik).

> Thanks for your reply, Michael!
>
> As a workaround, I'm removing the namespaces from the name of the variables if it contains an anonymous namespace. However, name collisions could lead to both false positives and false negatives (I think in the case of a name collision lldb only picks one variable that matches the expression).

Yea I'm sure there are cases where we fail to pick the variable correctly. If you have some examples that'd be useful.

> In other debuggers, like gdb, using foo::(anonymous namespace)::FooAnonymousVar in breakpoints conditions seems to work fine. I think you are right in your comment: I don't really care about the syntax with the anonymous namespace, I just want a way to reliably make breakpoint conditional expressions from frame.variables objects. Ideally, this would be serializable (so I can use the same expressions across multiple runs for the same binary).

It would be nice to align with GDB (I'm not sure how GDB really does things but I imagine those conditional breakpoints probably don't dispatch to the compiler and thus don't need to be valid expressions). Could you provide an example of this in GDB? I couldn't actually get it to work locally. I just tried something like (gdb) b main if (anonymous namespace)::anon_foo == 5

Either way, this will hopefully get much better with the DIL (Data-Inspection Language) https://discourse.llvm.org/t/rfc-data-inspection-language/69893/2 work. There we won't dispatch simple expressions to the expression evaluator anymore, so we don't need the identifiers to be valid. Though even then, I'm currently not entirely sure how we'd get this (anonymous namespace) to be parsed correctly. Tricky to say at which level this would want to be fixed (CC @jimingham @clayborg @labath who may have some thoughts on this). Though a good start might be to look at the cases where LLDB's expression evaluator finds the incorrect variable, and see if we can iron those out (though I suspect those might also be tricky to get right).

jimingham commented 3 months ago

On Jun 28, 2024, at 2:08 PM, Michael Buch @.***> wrote:

> Side note, in your case, the name we get from SBValue::GetName is ultimately from Mangled::GetName, which will return the demangled name of the DW_AT_linkage_name. Which in this case contains (anonymous namespace), here we're at the mercy of the LLVM demangler (which doesn't make any guarantees about whether the output is a valid C++ expression, see e.g., ABI tags). There is something to be said about an option to the demangler to omit this (though there hasn't been much precedent for that in the LLVM demangler afaik).
>
>> Thanks for your reply, Michael!
>>
>> As a workaround, I'm removing the namespaces from the name of the variables if it contains an anonymous namespace. However, name collisions could lead to both false positives and false negatives (I think in the case of a name collision lldb only picks one variable that matches the expression).
>
> Yea I'm sure there are cases where we fail to pick the variable correctly. If you have some examples that'd be useful.
>
>> In other debuggers, like gdb, using foo::(anonymous namespace)::FooAnonymousVar in breakpoints conditions seems to work fine. I think you are right in your comment: I don't really care about the syntax with the anonymous namespace, I just want a way to reliably make breakpoint conditional expressions from frame.variables objects. Ideally, this would be serializable (so I can use the same expressions across multiple runs for the same binary).
>
> It would be nice to align with GDB (I'm not sure how GDB really does things but I imagine those conditional breakpoints probably don't dispatch to the compiler and thus don't need to be valid expressions). Could you provide an example of this in GDB? I couldn't actually get it to work locally. I just tried something like (gdb) b main if (anonymous namespace)::anon_foo == 5

Unless things have changed a lot since I last worked on it, the gdb parser that underlies print and the condition evaluator doesn't attempt to be a language-accurate parser, and allows a fair handful of bits of added syntax that the language doesn't allow.

> Either way, this will hopefully get much better with the DIL (Data-Inspection Language) https://discourse.llvm.org/t/rfc-data-inspection-language/69893/2 work. There we won't dispatch simple expressions to the expression evaluator anymore, so we don't need the identifiers to be valid. Though even then, I'm currently not entirely sure how we'd get this (anonymous namespace) to be parsed correctly. Tricky to say at which level this would want to be fixed (CC @jimingham @clayborg @labath who may have some thoughts on this). Though a good start might be to look at the cases where LLDB's expression evaluator finds the incorrect variable, and see if we can iron those out (though I suspect those might also be tricky to get right).

For the expression parser we aren't going to do anything to support specifying (anonymous namespace), are we? This isn't valid C++ and we really do want to keep the expression parser as close to the language standard as we can.

For frame var, this only matters for lookup, right? You want to do something like target var "foo::(anonymous namespace)::FooAnonymousVar" and we just have to make the expression path parser (a separate thing from the expr parser) handle this token.

For the DIL, that's a little more complex, since that intends to parse more complex expressions, not just name lookups. So it will have to add this to its grammar somehow. But this should be pretty easy to add, as it is always going to be the same text and can only appear as a part of the variable path expression.

Jim


mvanotti commented 3 months ago

> Yea I'm sure there are cases where we fail to pick the variable correctly. If you have some examples that'd be useful.

For example this program will fail:

#include <stddef.h>

namespace {
namespace foo {
size_t Var1 = 0;
}  // namespace foo
namespace bar {
size_t Var1 = 0;
}  // namespace bar
}  // namespace

void x(void) { __asm__ volatile("nop"); }

int main(void) {
  foo::Var1 = 100;
  bar::Var1 = 50;

  x();
}

(lldb) breakpoint set -n x --condition "Var1 == 50"
Breakpoint 1: where = program4`x() + 4 at program4.cc:12:16, address = 0x0000000000001744
(lldb) run
Process 583065 launched: '/home/user/test/program4' (x86_64)
Process 583065 exited with status = 0 (0x00000000) 

But will work if I use bar::Var1.

In the following example, the failure mode is a bit different:

#include <stddef.h>

namespace xoo {
namespace {
namespace foo {
size_t Var1 = 0;
}  // namespace foo
namespace bar {
size_t Var1 = 0;
}  // namespace bar
}  // namespace
}

void x(void) { __asm__ volatile("nop"); }

int main(void) {
  xoo::foo::Var1 = 100;
  xoo::bar::Var1 = 50;

  x();
}

(lldb) breakpoint set -n x --condition "xoo::bar::Var1 == 50"
Breakpoint 1: where = program5`x() + 4 at program5.cc:14:16, address = 0x0000000000001744
(lldb) run
Process 592284 launched: '/home/user/test/program5' (x86_64)
Process 592284 stopped
* thread #1, name = 'program5', stop reason = breakpoint 1.1
    frame #0: 0x0000555555555744 program5`x() at program5.cc:14:16
   11   }  // namespace
   12   }
   13  
-> 14   void x(void) { __asm__ volatile("nop"); }
   15  
   16   int main(void) {
   17     xoo::foo::Var1 = 100;
error: stopped due to an error evaluating condition of breakpoint 1.1: "xoo::bar::Var1 == 50"
Couldn't parse conditional expression:
error: <user expression 0>:1:6: 'bar' is not a class, namespace, or enumeration
    1 | xoo::bar::Var1 == 50
      |      ^
note: 'bar' declared here

> It would be nice to align with GDB (I'm not sure how GDB really does things but I imagine those conditional breakpoints probably don't dispatch to the compiler and thus don't need to be valid expressions). Could you provide an example of this in GDB? I couldn't actually get it to work locally. I just tried something like (gdb) b main if (anonymous namespace)::anon_foo == 5

I think you need quotes in your expression, this worked for me (using the first post code example):

(gdb) b repro.cc:26 if "(anonymous namespace)::AnonymousVar == 1"
Breakpoint 1 at 0x1157: file repro.cc, line 26.
(gdb) run
Starting program: /home/user/test/repro 

Breakpoint 1, main (argc=1, argv=0x7fffffffe038) at repro.cc:26
26        printf("Hello World! Argc is %d, I'm %s\n", argc, argv[0]);

> (...) Though a good start might be to look at the cases where LLDB's expression evaluator finds the incorrect variable, and see if we can iron those out (though I suspect those might also be tricky to get right).

I'm not sure lldb is doing something wrong if I was ambiguous when I set my condition expression. The problem is that if the variable is in an anon namespace I don't know how to reference it in any other way.

Michael137 commented 3 months ago

> For the expression parser we aren't going to do anything to support specifying (anonymous namespace), are we? This isn't valid C++ and we really do want to keep the expression parser as close to the language standard as we can.

Agreed, I don't think we want to start deviating from standard C++ for this in the expression evaluator. I was just suggesting we take a closer look at the cases where referencing the anonymous namespace variables by basename trips up the expression evaluator.

> For frame var, this only matters for lookup, right? You want to do something like target var "foo::(anonymous namespace)::FooAnonymousVar" and we just have to make the expression path parser (a separate thing from the expr parser) handle this token.

I think @mvanotti mainly wants this for conditional-breakpoints. Which wouldn't benefit from added support in frame var (unless I'm misunderstanding how conditional-breakpoint expressions work under the hood, I thought they dispatch to the expression evaluator).

> For example this program will fail:

Thanks! Yea here you're also trying to run an expression that's technically not valid C++ (if you did Var1 == 50 in the source itself you'd get an ambiguity error). So maybe we should error out in that case? Might have some other implications that I'm not aware of atm.

The other example is more of an issue. I would've expected LLDB to be able to look through the anonymous namespace (there's at least some support for this when LLDB constructs declaration contexts from DWARF). I vaguely remember trying to fix this some time ago. Oh I think that was for inline namespaces. Worth splitting this out into a separate issue.

When doing some other investigations recently, I found that when we do FindTypes/FindNamespace lookups for types nested in an (anonymous namespace), we fail to find them. But we do end up processing the DIE in the index, only to reject it. So this might be exactly the problem you're seeing. I think this happens because we're comparing the demangled name vs. the name that we used in the index (which doesn't include the (anonymous namespace) prefix).

> I think you need quotes in your expression, this worked for me (using the first post code example):

I think that will actually not do what you expect. If you quote the condition, GDB treats it as a string, so it always evaluates to true. Try changing the condition to something false, and observe how we still hit the breakpoint. At least that's what was happening when I tried it locally (I haven't used GDB in a while, so I might've been doing something wrong).

> I'm not sure lldb is doing something wrong if I was ambiguous when I set my condition expression. The problem is that if the variable is in an anon namespace I don't know how to reference it in any other way.

At least one of those cases seems suspect. I think we should be able to make the xoo::bar::Var1 case work.

mvanotti commented 3 months ago

> I think that will actually not do what you expect. If you quote the condition it treats it as a string, and thus always evaluate to true. Try changing the condition to something false, and observe how we still hit the breakpoint. At least that's what was happening when I tried it locally (I haven't used GDB in a while, so I might've been doing something wrong).

I just double checked and you are right. Using the mangled name of the variable seems to work, both in gdb and lldb. However, I can't find a good way of getting the mangled name in lldb, besides going through SBTarget.FindSymbols.
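For simple cases, the mangled name could in principle be rebuilt from the demangled one instead of being looked up. The sketch below is only an assumption-laden illustration: it follows the Itanium C++ ABI nested-name scheme as seen in the DW_AT_linkage_name from the DWARF dump earlier in this thread (`_ZN3foo12_GLOBAL__N_115FooAnonymousVarE`), handles plain namespaces only (no templates, no `std`, no classes), and the function name `mangle_variable` is hypothetical. Whether the result matches what a given compiler emits should be checked against the binary's symbols.

```python
ANON = "(anonymous namespace)"
# Length-prefixed spelling of the anonymous namespace, as it appears in the
# linkage name from the DWARF dump ("_GLOBAL__N_1" is 12 characters long).
ANON_MANGLED = "12_GLOBAL__N_1"


def mangle_variable(qualified_name: str) -> str:
    """Rebuild an Itanium-style mangled name for a namespace-scoped variable.

    Sketch only: every component is treated as a plain namespace or
    variable name and encoded as <length><name>.
    """
    parts = qualified_name.split("::")
    if len(parts) == 1:
        # A variable at global scope outside any namespace is not mangled.
        return qualified_name
    encoded = "".join(
        ANON_MANGLED if part == ANON else f"{len(part)}{part}" for part in parts
    )
    return f"_ZN{encoded}E"


print(mangle_variable("foo::(anonymous namespace)::FooAnonymousVar"))
print(mangle_variable("(anonymous namespace)::AnonymousVar"))
```

The first call reproduces the linkage name shown in the DWARF dump. Because the mangled name is a plain identifier, a condition built from it is also trivially serializable across runs of the same binary.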

Michael137 commented 3 months ago

Just had another look at this. Supporting lookup into anonymous namespaces in the expression evaluator shouldn't be too hard, we just need to account for them in TypeSystemClang::DeclContextIsContainedInLookup (we already support transparent lookup through inline namespaces there)
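To illustrate what "transparent lookup" through anonymous namespaces means, here is a toy model in Python. It is not LLDB's implementation and says nothing about how TypeSystemClang::DeclContextIsContainedInLookup is written; the `Namespace` class and the `lookup` helpers are hypothetical names invented for this sketch. The only idea it shows is that, when resolving a qualified name, an anonymous namespace is searched as if its members belonged to the enclosing scope.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str = ""           # empty for the root and for anonymous namespaces
    anonymous: bool = False
    children: list = field(default_factory=list)
    variables: dict = field(default_factory=dict)


def find_namespaces(scope, name):
    """Namespaces called `name` in `scope`, looking through anonymous ones."""
    found = []
    for child in scope.children:
        if child.anonymous:
            found.extend(find_namespaces(child, name))
        elif child.name == name:
            found.append(child)
    return found


def find_variables(scope, name):
    """Values of variables called `name` in `scope`, looking through anonymous namespaces."""
    found = [scope.variables[name]] if name in scope.variables else []
    for child in scope.children:
        if child.anonymous:
            found.extend(find_variables(child, name))
    return found


def lookup(root, qualified_name):
    """Resolve a qualified name such as 'xoo::bar::Var1' to its candidate values."""
    *namespaces, variable = qualified_name.split("::")
    scopes = [root]
    for ns in namespaces:
        scopes = [match for s in scopes for match in find_namespaces(s, ns)]
    return [value for s in scopes for value in find_variables(s, variable)]


# Mirrors the second example above: xoo { namespace { foo { Var1 } bar { Var1 } } }
foo = Namespace("foo", variables={"Var1": 100})
bar = Namespace("bar", variables={"Var1": 50})
anon = Namespace(anonymous=True, children=[foo, bar])
root = Namespace(children=[Namespace("xoo", children=[anon])])

print(lookup(root, "xoo::bar::Var1"))
print(lookup(root, "xoo::foo::Var1"))
```

In this model `xoo::bar::Var1` resolves to a single candidate because the anonymous namespace between `xoo` and `bar` is skipped during the walk, which is the behavior the failing breakpoint condition above was relying on.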

Michael137 commented 3 months ago

Proposed fix for the conditional breakpoint part: https://github.com/llvm/llvm-project/pull/97275

(doesn't address target var)