llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

LLDB: Inconsistent existence of expression global variable (`use of undeclared identifier`) #84806

Closed royitaqi closed 6 months ago

royitaqi commented 6 months ago

Problem

During an LLDB debug session, global variables that were declared in a previous expression evaluation are sometimes reported as undeclared identifiers after a program restart.

TL;DR LLDB output (see full output at the end):

(lldb) p int $i = 32; $i
(int) 32
...
(lldb) r
...
(lldb) p $i
(int) 32
(lldb) p $i + a
error: <user expression 2>:1:1: use of undeclared identifier '$i'
    1 | $i + a
      | ^
(lldb) p $i
error: <user expression 3>:1:1: use of undeclared identifier '$i'
    1 | $i
      | ^
(lldb) 

In the above, there are two issues:

  1. $i in $i + a was deemed undeclared after a print of $i itself was successful.
  2. After the above issue happened, the printing of $i itself no longer works, either.

Repro steps

Create a main.cpp:

int main() { 
    int a = 1;
    return a;
}

Compile:

clang++ -g -O0 main.cpp

Start lldb a.out, then type the following commands. The output from LLDB follows.

b /main/
r

p a
p int $i = 32; $i
p $i + a
r

p $i
p $i + a
p $i

Full LLDB output:

(lldb) target create "a.out"
Current executable set to '/Users/<username>/tmp2/a.out' (arm64).
(lldb) b /main/
Breakpoint 1: where = a.out`main + 12 at main.cpp:2:9, address = 0x0000000100003f98
(lldb) r

Process 73517 launched: '/Users/<username>/tmp2/a.out' (arm64)

Process 73517 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f98 a.out`main at main.cpp:2:9
   1    int main() {
-> 2        int a = 1;
   3        return a;
   4    }
(lldb) p a
(int) -2035101492
(lldb) p int $i = 32; $i
(int) 32
(lldb) p $i + a
(int) -2035101460
(lldb) r
There is a running process, kill it and restart?: [Y/n] 
Process 73517 exited with status = 9 (0x00000009) killed
Process 73523 launched: '/Users/<username>/tmp2/a.out' (arm64)
Process 73523 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f98 a.out`main at main.cpp:2:9
   1    int main() {
-> 2        int a = 1;
   3        return a;
   4    }
(lldb) p $i
(int) 32
(lldb) p $i + a
error: <user expression 2>:1:1: use of undeclared identifier '$i'
    1 | $i + a
      | ^
(lldb) p $i
error: <user expression 3>:1:1: use of undeclared identifier '$i'
    1 | $i
      | ^
(lldb) 
llvmbot commented 6 months ago

@llvm/issue-subscribers-lldb

svs-quic commented 6 months ago

Bisected to https://github.com/llvm/llvm-project/commit/db814552132d614e410118e22b22c89d35ae6062. cc: @kastiglione

Simpler way to reproduce:

(lldb) b main
(lldb) run
Process 1 stopped
* thread #1, stop reason = breakpoint 1.1
   1    int main() { 
-> 2        int a = 1;
   3        return a;
   4    }
(lldb) p int $i = 32
(lldb) p $i
(int) 32
(lldb) p $i
error: <user expression 1>:1:1: use of undeclared identifier '$i'
    1 | $i
      | ^
svs-quic commented 6 months ago

On further digging, it looks like persistent results were disabled for dwim-print in https://github.com/llvm/llvm-project/commit/385496385476fc9735da5fa4acabc34654e8b30d.

Using the expression command instead of the print command seems to work fine.

(lldb) b main
(lldb) run
Process 1 stopped
   1    int main() { 
-> 2        int a = 1;
   3        return a;
   4    }
(lldb) exp a
(int) $0 = 1072693252
(lldb) exp int $i = 32; $i
(int) $1 = 32
(lldb) exp $i + a
(int) $2 = 1072693284
(lldb) r
There is a running process, kill it and restart?: [Y/n] Y
Process 1 stopped
* thread #1, stop reason = breakpoint 1.1

   1    int main() { 
-> 2        int a = 1;
   3        return a;
   4    }
(lldb) exp $i
(int) $i = 32
(lldb) exp $i + a
(int) $3 = 1072693284
(lldb) exp $i
(int) $i = 32
royitaqi commented 6 months ago

Please bear with me (a noob). A few things confuse me:

1. What is the difference between dwim-print and expression?

Both seem to evaluate the input and print the result.

Their help messages read the same to me.

  dwim-print        -- Print a variable or expression.
  expression        -- Evaluate an expression on the current thread.  Displays any returned value with
                       LLDB's default formatting.

I glanced through the tutorial (https://lldb.llvm.org/use/tutorial.html) and the command map (https://lldb.llvm.org/use/map.html) and cannot tell the difference. They seem to be used interchangeably.

If they are indeed different, is there documentation on how they differ? Otherwise, from a product/user perspective, it's really confusing.

2. Is this the expected product behavior (that LLDB has two print-ish commands that do different things in some cases)?

3. What does "dwim" mean? I assume I'm missing some context.

royitaqi commented 6 months ago

Also I don't understand how this issue is related to persistent result (if I understand correctly, persistent result means the $0, $1, $2 in that commit's description, but not $i in this issue).

kastiglione commented 6 months ago

@royitaqi

expression uses embedded compilers (clang, swift, etc) to JIT execute the given expression, inside the process. The result is then displayed to the user.

frame variable reads memory and uses type information to display a variable's contents.

There are other ways to view data, for example register read and memory read.

dwim-print attempts to use the most optimal/direct/logical way of printing an expression. When dwim-print is given a variable, it will use the same mechanism as frame variable, and not expression. This is because using expression can have side effects, be slower, and by invoking the complexity of a compiler, can be more unreliable.

In summary, dwim-print is supposed to be a higher level command that abstracts printing values. The existing commands that print values are also tied to a specific means of getting that data. Many users would only use expression (via the p alias) and not know or use the other commands. dwim-print allows users to not have to think about which way to print something, and focus on debugging instead.
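
One rough way to picture the dispatch described above is the toy C++ model below. It is a sketch, not LLDB's code: `Frame`, `Strategy`, and `choose_strategy` are hypothetical names invented here, and the only behavior taken from the explanation is that a plain variable is read the way frame variable reads it, while anything else falls back to expression.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a stack frame: only the names of variables in scope.
struct Frame {
    std::set<std::string> variables;
};

// The two printing mechanisms contrasted in the explanation above.
enum class Strategy { FrameVariable, Expression };

// Toy dispatch: if the input is exactly the name of a variable visible in the
// frame, read it directly; otherwise hand the text to the expression evaluator.
inline Strategy choose_strategy(const Frame &frame, const std::string &input) {
    return frame.variables.count(input) != 0 ? Strategy::FrameVariable
                                             : Strategy::Expression;
}
```

In this model, `a` would take the frame variable path in the repro program, while `$i + a` would go through the expression path.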

"dwim" means "do what I mean", for reference see https://en.wikipedia.org/wiki/DWIM.

kastiglione commented 6 months ago

Also I don't understand how this issue is related to persistent result (if I understand correctly, persistent result means the $0, $1, $2 in that commit, but not $i).

That's a good point. Note that "persistent results" is short for "persistent result variables". Both $0 and $i are persistent variables, but the former plays a specific role: it is a "persistent result variable".

kastiglione commented 6 months ago

I will look into this later today.

jimingham commented 6 months ago

IIUC, this is a request that persistent result variables be imported from one run of the target to another.

That was not a design point for the expression result variables. The expression result variables exist in the memory of the process we are inspecting, and so their lifespan is bounded by the lifespan of the process.

This would be hard to support in full generality. We could keep the constant values in lldb, and then just reinsert them in inferior memory when we restart. But since we can't guarantee that we'll be able to put them back in the same memory location they were in a previous run, any references from one result variable to another would somehow have to be relocated as well.

Jim

On Mar 12, 2024, at 9:33 AM, Dave Lee @.***> wrote:

I will look into this later today.


royitaqi commented 6 months ago

IIUC, this is a request that persistent result variables be imported from one run of the target to another.

This is not the request.

This issue simply points out the inconsistency of "whether the persistent variables are available or not even in the same run" (as shown in OP) and claims that this is unexpected.

The request is to make this behavior consistent, and leave it open for ppl to decide if persistent variables should always be available across runs, or never be.

royitaqi commented 6 months ago

@kastiglione thank you for your earlier explanation. It helped me a lot in understanding the context. Things make more sense to me now.

--

FWIW, it's not clear from lldb's help that p is an alias of expression. On the contrary, p shares its description with dwim-print, which led me to believe that p is an alias of dwim-print:

p         -- Print a variable or expression.
dwim-print        -- Print a variable or expression.

This is probably a minor separate documentation topic.

kastiglione commented 6 months ago

@royitaqi p is an alias for dwim-print. Up until last year, p was an alias for expression. Aliases are user-configurable, so while p defaults to dwim-print, users can re-alias p to expression or anything else they want.

kastiglione commented 6 months ago

On further digging it looks like persistent results was disabled for dwim-print in https://github.com/llvm/llvm-project/commit/385496385476fc9735da5fa4acabc34654e8b30d.

This was done for a number of reasons. With p being an alias, users can configure it in various ways. This includes the ability to re-enable persistent results:

command unalias p
command alias p dwim-print --persistent-result true -- 

The reasons for this change were consistency and simplicity. First compare frame variable and expression:

(lldb) v myVar
(int) myVar = 15
(lldb) e myVar
(int) $0 = 15

When printing a variable, it doesn't add much value to restate the variable's name, and it doesn't add much value to make a (persistent) variable for a variable. For these reasons, dwim-print focuses on printing and shows the output without naming it. As mentioned, this behavior can be changed as a default, or, when a user knows they need a persistent result, they can choose to use e instead of p. In our experience, the majority of users don't use persistent result variables (or even know what they are), so this is optimized for the common case, and leaves the customization for more advanced debugger users.

svs-quic commented 6 months ago

I will look into this later today.

So I noticed that if I comment out the below block of code in CommandObjectDWIMPrint.cpp we get back the expected behavior for $i.

if (suppress_result)
  if (auto result_var_sp =
          target.GetPersistentVariable(valobj_sp->GetName())) {
    auto language = valobj_sp->GetPreferredDisplayLanguage();
    if (auto *persistent_state =
            target.GetPersistentExpressionStateForLanguage(language))
      persistent_state->RemovePersistentVariable(result_var_sp);
  }
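
One plausible reading of why this block matters, sketched as a toy C++ model. It is not LLDB's code: `ToyTarget` and `toy_print` are hypothetical names. The model also assumes that evaluating a bare `$i` yields a result that is itself named `$i`. That assumption is suggested by the `exp $i` output earlier in the thread (`(int) $i = 32`), but nothing in the thread confirms it. Under that assumption, the cleanup that removes the persistent variable named after the result would remove the user's own variable.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the target's persistent variable store.
struct ToyTarget {
    std::map<std::string, int> persistent;
};

// Toy `p $name`: look the variable up, report its value, and then (mirroring
// the quoted block) remove the persistent variable named after the result.
// Returns false to model "use of undeclared identifier".
inline bool toy_print(ToyTarget &target, const std::string &name,
                      bool suppress_result, int &value_out) {
    std::map<std::string, int>::iterator found = target.persistent.find(name);
    if (found == target.persistent.end())
        return false;
    value_out = found->second;
    // ASSUMPTION: the result of evaluating a bare `$name` is named `$name`.
    const std::string result_name = name;
    if (suppress_result)
        target.persistent.erase(result_name);
    return true;
}
```

With `suppress_result` set, the first `toy_print` of `$i` succeeds and the second fails, which matches the shape of the simpler repro above. With it cleared, both calls succeed, which is consistent with the expression session.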
kastiglione commented 6 months ago

I'll have to debug what GetPersistentVariable is doing.

kastiglione commented 6 months ago

I understand the issue and will create a PR tomorrow.

jimingham commented 6 months ago

Note that persistent expression results aren't really functional after a rerun. For instance:

(lldb) expr int $my_int = 100
(lldb) expr printf("%d\n", $my_int)
100
(int) $0 = 4
(lldb) run
There is a running process, kill it and restart?: [Y/n] 
Process 54851 exited with status = 9 (0x00000009) killed
Process 54855 launched: '/tmp/a.out' (arm64)
Process 54855 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f70 a.out`main at tryit.c:6
   3    int
   4    main()
   5    {
-> 6      printf("Hello, there.\n");
          ^
   7      return 0;
   8    }
Target 0: (a.out) stopped.
(lldb) expr printf("%d\n", $my_int)
-113244184
(int) $1 = 11

So when you make a persistent variable, we insert it in memory so that it can be used in future expressions. That works on the run in which the variable was made, but not after a rerun. Doing that after a rerun is actually tricky, since we can't guarantee that we'll be able to lay the variables down at the same addresses as in the previous run, and if one persistent result referred to another, we'd also have to relocate that reference.

We should either mark these variables as "no longer present in the target, can't be used in expressions generally" or we should not try to preserve them. Silently generating variables that don't quite work isn't a great solution.
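
The lifetime problem described above can be pictured with a small C++ toy model. All names here (`ToyProcess`, `ToyPersistentVar`, `ToyDebugger`) are hypothetical and the memory model is deliberately simplistic. The sketch only shows a debugger-side record that outlives the inferior memory it points into.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical inferior process: a sparse address -> value memory.
struct ToyProcess {
    std::map<std::uint64_t, int> memory;
};

// Hypothetical debugger-side record of a persistent variable.
struct ToyPersistentVar {
    std::uint64_t address;  // where the value was written in the inferior
    int cached_value;       // copy kept on the debugger side
};

struct ToyDebugger {
    std::map<std::string, ToyPersistentVar> vars;

    // Declaring a variable writes it into the inferior and records where it went.
    void declare(ToyProcess &process, const std::string &name,
                 std::uint64_t address, int value) {
        process.memory[address] = value;
        vars[name] = ToyPersistentVar{address, value};
    }

    // True only if the given process still holds the value at the recorded address.
    bool usable_in_expression(const ToyProcess &process,
                              const std::string &name) const {
        std::map<std::string, ToyPersistentVar>::const_iterator var = vars.find(name);
        if (var == vars.end())
            return false;
        std::map<std::uint64_t, int>::const_iterator cell =
            process.memory.find(var->second.address);
        return cell != process.memory.end() &&
               cell->second == var->second.cached_value;
    }
};
```

After a rerun, modeled here as a fresh `ToyProcess`, the record for `$my_int` still exists and still carries its cached value, but the memory behind it is gone. That is the "variables that don't quite work" state the comment warns about.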