python / cpython

The Python programming language
https://www.python.org

Support instruction-level debugging in pdb #103049

Open gaogaotiantian opened 1 year ago

gaogaotiantian commented 1 year ago

Feature or enhancement

Support instruction-level debugging in pdb

Pitch

pdb could provide a better debugging experience by supporting instruction-level debugging. We already have most of the utilities; we just need to put them together.

Four new commands will be introduced: li (listinst), lli (longlistinst), si (stepinst) and ni (nextinst). (Another candidate would be dis, which is short for import dis; dis.dis().)

li and lli will list the source file interleaved with its instructions.

(Pdb) lli
  4     def f():
               0 RESUME                   0
  5         a = [1, 2, 3]
               2 BUILD_LIST               0
               4 LOAD_CONST               1 ((1, 2, 3))
               6 LIST_EXTEND              1
               8 STORE_FAST               0 (a)
  6         breakpoint()
              10 LOAD_GLOBAL              1 (NULL + breakpoint)
              20 CALL                     0
              30 POP_TOP
  7  ->     g(a)
              32 LOAD_GLOBAL              3 (NULL + g)
              42 LOAD_FAST                0 (a)
              44 CALL                     1
     -->      54 POP_TOP
              56 RETURN_CONST             0 (None)
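
The interleaved listing above can be approximated with the public dis APIs. The following is only a rough sketch, not the prototype linked in this issue: the helper name list_with_instructions and the column layout are made up for illustration, and it emits line numbers only (no source text and no current-position markers).

```python
import dis


def list_with_instructions(func):
    """Return display rows: a line-number header, then that line's instructions."""
    code = func.__code__
    # Map bytecode offset -> source line number for offsets that start a line.
    line_starts = dict(dis.findlinestarts(code))
    rows = []
    for inst in dis.get_instructions(code):
        lineno = line_starts.get(inst.offset)
        if lineno is not None:
            rows.append(f"{lineno:3d}")
        rows.append(f"{inst.offset:>16} {inst.opname:<24} {inst.argrepr}".rstrip())
    return rows


def f():
    a = [1, 2, 3]
    return a


listing = list_with_instructions(f)
print("\n".join(listing))
```

The exact opcodes and offsets printed depend on the Python version, just as with dis.dis().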

si will step one instruction ahead, and ni will do the same but stay in the current frame. We already have the opcode trace event to support this.

> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
(Pdb) ni
> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
-->      42 LOAD_FAST                0 (a)
(Pdb) ni
> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
-->      44 CALL                     1
(Pdb) si
--Call--
> /home/gaogaotiantian/programs/mycpython/example.py(1)g()
-> def g(x):
-->       0 RESUME                   0
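
The opcode event mentioned above is available to any trace function through sys.settrace and frame.f_trace_opcodes. As a minimal sketch of the mechanism that si and ni would build on (this is not the proposed bdb change; record_opcodes and the sample function g are hypothetical names), the tracer below records every instruction executed in one function's frame:

```python
import dis
import sys


def record_opcodes(func, *args):
    """Call func(*args) and record (line, offset, opname) per executed instruction."""
    target = func.__code__
    # Offset -> opname lookup built from the (non-specialized) disassembly.
    names = {inst.offset: inst.opname for inst in dis.get_instructions(target)}
    executed = []

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace frames of other functions
        # Ask the interpreter to emit per-instruction "opcode" events for this frame.
        frame.f_trace_opcodes = True
        if event == "opcode":
            offset = frame.f_lasti
            executed.append((frame.f_lineno, offset, names.get(offset, "?")))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace function
    return result, executed


def g(x):
    return x * 2


result, executed = record_opcodes(g, 21)
for lineno, offset, opname in executed:
    print(lineno, offset, opname)
```

A debugger would stop and prompt at each such event instead of appending to a list; staying in the current frame (ni) versus following calls (si) then comes down to which frames the tracer keeps tracing.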

Previous discussion

Did not find any.

Linked PRs

gaogaotiantian commented 1 year ago

I linked a prototype for the instruction display. Please let me know if we want to proceed with this feature. I can either do two PRs (I would assume si is going to be rather complicated) or one big one.

ambv commented 1 year ago

We don't want a dis command as it's easy for users to add an alias if they so choose. Using .pdbrc they can define aliases that are always available.
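
As an illustration of the alias approach, here is a small sketch that drives pdb from a script and defines such an alias in the session. The alias name dis, the helper run_pdb_session and the function sample are assumptions made for this example; in practice the single alias line would simply go into .pdbrc.

```python
import io
import pdb


def sample(values):
    total = sum(values)
    return total


def run_pdb_session(func, args, commands):
    """Run func under pdb, feeding it the given commands, and return pdb's output."""
    stdin = io.StringIO("\n".join(commands) + "\n")
    stdout = io.StringIO()
    # readrc=False keeps any local .pdbrc from affecting the example.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, readrc=False)
    debugger.runcall(func, *args)
    return stdout.getvalue()


session_output = run_pdb_session(
    sample,
    ([1, 2, 3],),
    [
        # Same line a user could put in .pdbrc; %1 is the alias argument.
        "alias dis import dis; dis.dis(%1)",
        "dis sample",
        "continue",
    ],
)
print(session_output)
```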

As for working with opcodes, that's an interesting idea! But do we need a separate assembly mode? It feels to me like it would be more flexible to add new commands for this, like lo | listopcodes and llo | longlistopcodes. What do you think?

gaogaotiantian commented 1 year ago

I was actually torn between the two ideas - whether to overload the existing command with a state flipper, or to add two extra commands. There are two reasons I'm leaning slightly toward the state solution:

  1. As I imagine it, users either need to debug at the instruction level or they don't. Keeping two very similar functions in the same command seems a bit clearer, and it can inherit all the features/options from l and ll.
  2. With the state, we can do some other tricks, like displaying the instruction after a breakpoint (or step) in assembly mode.

I can go along with the extra commands as well; the debugger I used for C at my previous company works more like a state switch. With extra commands, we can display the current instruction only when the user uses si (or, in the future, other instruction-level commands).

Actually there's another possibility: we overload everything. In assembly mode, step means stepinstruction. I believe this is how windbgx works. If the user switches to "assembly mode", clicking "step" actually steps a single instruction. That's also one possible solution.

The reason I brought up a new command for dis is that there are equivalents in gdb (and probably other debuggers). True, the user can achieve that with an alias in .pdbrc, but that's true for a lot of other commands (whatis, interact, retval ...). I think the ultimate decision comes down to this: will it cause more trouble for the users who don't use it, or bring more benefit to the users who do?

Also I guess there are a couple of names for the instruction: disassembly is one, and opcode, which you used, is another. I always think of opcode as referring to the actual instruction type, like LOAD_GLOBAL or CACHE, whereas instruction is the full package with arguments, line number, positions and so on. That's also how it's used in the dis docs, if I understand correctly.
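
To make the distinction concrete, here is a short sketch using dis.get_instructions (the function add is just an example): each dis.Instruction carries the opcode and its name, plus the argument and location information that make up the full instruction.

```python
import dis


def add(a, b):
    return a + b


instructions = list(dis.get_instructions(add))
for inst in instructions:
    # inst.opcode / inst.opname identify the instruction *type* only;
    # arg, argrepr and offset belong to this particular instruction.
    print(inst.opcode, inst.opname, inst.arg, repr(inst.argrepr), inst.offset)
```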

On a side note, now that I think about it, si and ni are two different commands and we need to separate them.

artemmukhin commented 1 year ago

I am excited about this proposal. Indeed, there are plenty of options for how this could be implemented. I would like to know your thoughts on how LLDB handles disassembly.

LLDB has separate commands for source and instruction level stepping:

Additionally, LLDB provides settings to control whether to display disassembly when stopped. I took a couple of screenshots for demonstration.

By default, LLDB only shows source code when stopped, even with instruction level stepping:

[Screenshot: no-disassembly]

Although it does not display disassembly, it marks the corresponding piece of code that is being executed.

By running settings set stop-disassembly-display always, you can examine the source code and the assembly code at the same time:

[Screenshot: show-disassembly]

Do you think pdb could provide a similar user experience? And how beneficial would such behaviour be for Python programmers?

gaogaotiantian commented 1 year ago

pdb currently does not have a settings command to handle all potential settings (maybe for the next pdb we should consider that). assem would be the first state command if introduced.

I guess having the assem state determine whether to step by instruction or by line is not that great an idea for command-line tools. It's nice in a GUI, where button overloading has more benefits. In command-line tools, I guess users would prefer more distinct commands.

Personally, I'd like my debugger to show some difference when I do ni. Just a note here that we can now point to the specific code with positions. I still think displaying the current instruction after ni would be a better user experience. pdb only lists a single source line, compared to lldb, which shows more context, so adding another instruction line would be cheaper (screen-space-wise) than in lldb. Also, if we do not have state, we won't have a choice: it's either with or without. I'd lean toward with.

As for the benefits, I'm not sure. I would guess most Python users debug their programs with print(). Even among the people who are using pdb, most are probably not familiar with bytecode. However, the number of Python users is so large that even a small portion of a small portion is a significant number. At least for me, I often want to see the actual compiled bytecode of a function to see what's really going on there. (Thus the thought of dis, which would be super convenient.)

I'm totally fine with li and lli instead of assem. Just want to hear more from the actual users.

gaogaotiantian commented 1 year ago

Dear nosy:

I've finished a draft of the implementation. I decided to go no-state: separate commands for all instruction-related stuff. So, li, lli, si and ni. Also, when you do si or ni, the current instruction will be displayed in the prompt.

Does anyone have suggestions/questions on this? Once the implementation is reviewed and confirmed, I can work on the tests and docs.

artemmukhin commented 1 year ago

Overall, this approach looks good to me! I agree that it better suits CLI than having a separate assembly mode. I also like that ni and si show me both the current source line and the current instruction.

One more thing to consider is the terminal width. Disassembly output can be wide, at least because of the full file paths:

(Pdb) li
  1  -> def foo(a, b):
     -->       2 LOAD_CONST               0 (<code object foo at 0x101651210, file "/very/long/full/path/to/foo.py", line 1>)

However, I do not have a particular idea of how to approach that properly.
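
One simple option, shown here purely as a sketch and not something proposed or implemented in this thread, would be to truncate each listing line to the terminal width. The helper name fit_to_width and the "..." marker are assumptions for illustration.

```python
import shutil


def fit_to_width(line, width=None):
    """Truncate a listing line to the terminal width, marking the cut with '...'."""
    if width is None:
        width = shutil.get_terminal_size().columns
    # Leave short lines alone; also skip widths too small to hold the marker.
    if len(line) <= width or width < 4:
        return line
    return line[: width - 3] + "..."


long_line = (
    '     -->       2 LOAD_CONST               0 (<code object foo at 0x101651210, '
    'file "/very/long/full/path/to/foo.py", line 1>)'
)
print(fit_to_width(long_line, 80))
```

The obvious cost is that the truncated part (here, the file path and line) is no longer visible, which is why this is a trade-off rather than a clean fix.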

gaogaotiantian commented 1 year ago

> One more thing to consider is the terminal width. Disassembly output can be wide, at least because of the full file paths:

Unfortunately, this piece is using the internal function of dis directly, which will provide a familiar experience to dis users (assuming most of the people interested in instructions are using dis). dis has similar issues and I guess there's no perfect way to deal with it.

gaogaotiantian commented 1 year ago

Hi all, the feature, test and docs are all finished and ready for review now.

SonOfLilit commented 1 year ago

This is very technically cool, but I can't think of any use cases for end users and it's a lot of code. (Am I just not creative enough?)

I can, however, imagine that core devs working on the Python bytecode compiler will benefit from it. Maybe we should get buy-in from some potential users before complicating Bdb?

gaogaotiantian commented 1 year ago

> This is very technically cool, but I can't think of any use cases for end users and it's a lot of code. (Am I just not creative enough?)
>
> I can, however, imagine that core devs working on the Python bytecode compiler will get benefit from it. Maybe we should get buy in from some potential users before complicating Bdb?

Python has f_trace_opcodes, so the ability to trace and debug opcodes (instructions) is natural. Python's end-user base is large; I, for example, often need the capability to debug at the instruction level. It's also very helpful for debugging CPython itself.

All I'm saying is, there are different needs for different developers; a lot of developers do not even use debuggers. Having a working solution for debugging instructions does not make pdb worse. And yes, there is more code in bdb, but most of it is on isolated paths that are very specific to instruction tracing. It does not impact the current bdb responsibilities.

For use cases, there is a very common pattern in Python: one-liners. Often they consist of multiple expressions. As of now, pdb can only execute such a line as a whole, and the debuggability within the line is horrible. Having instruction-level debugging in pdb would solve that.

However, with the new PEP 669, this work is blocked by the implementation of #103615, which I'm also responsible for. So this feature will only be revisited later.