Demangle Symbols in Debuggers (LLDB, GDB)

miguelmartin75 commented 1 year ago

Summary

Related issue which is closed: https://github.com/nim-lang/Nim/issues/8596

Nim can be debugged with LLDB (or GDB).
Name mangling causes UX issues with debugging in LLDB and GDB by requiring you to refer to Nim symbols in their mangled form.
- The suggested workaround is quite hacky. For variables, this requires you to print all local or global variables. Then you scan and find the variable in a GUI or Terminal output. For other symbols, such as breaking at a function call, I can see this being quite frustrating.
- Preferably we would not have to refer to names as mangled, and e.g. could print x rather than print x_<nim-specific-mangle>
This is a common problem, as noted from the forums:

Description

Here are my findings from researching LLDB. I have not researched GDB. I am posting them here in case others want to implement this, or in case I have missed something in my proposed solution.

For LLDB, one needs to:

1. (Required) Let LLDB know how to identify the mangling scheme & how to de-mangle a symbol
2. (Optional) Implement a Language plugin for deeper LLDB integration

References:

For D: https://reviews.llvm.org/D110578
For Swift: Apple's fork of LLDB implements the Language plugin for Swift, along with code for demangling & identifying Swift's mangling scheme.

From reading the source: LLDB needs a mangling scheme that can be told apart from all the others, along with code to de-mangle it. The mangling schemes used by other languages/compilers (C++/Itanium, C++/MSVC, D, Rust) all use a prefix to identify which scheme or compiler produced the mangled name.
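To make the prefix idea concrete, here is a minimal Python sketch of prefix-based classification. The prefixes themselves (`_Z` for Itanium, `?` for MSVC, `_D` for D, `_R` for Rust's v0 scheme) are the ones those schemes use; the function name `classify_mangling_scheme` and the string labels are made up for this illustration and are not LLDB's API.

```python
# Prefix table. The prefixes are the real ones used by each scheme;
# the labels on the right are illustrative strings, not LLDB enum names.
KNOWN_PREFIXES = [
    ("_Z", "itanium"),
    ("?", "msvc"),
    ("_D", "d"),
    ("_R", "rust-v0"),
]


def classify_mangling_scheme(symbol: str) -> str:
    """Return a label for the mangling scheme, or "none" if no prefix matches."""
    for prefix, scheme in KNOWN_PREFIXES:
        if symbol.startswith(prefix):
            return scheme
    return "none"


if __name__ == "__main__":
    # Sample symbols; only the leading prefix matters for this sketch.
    for sym in ["_Z3foov", "?foo@@YAXXZ", "_D4test3fooFZv", "_RNvC4test3foo", "main"]:
        print(sym, "->", classify_mangling_scheme(sym))
```

A Nim symbol has no such prefix today, so it would fall through to "none" (or to "itanium" when compiled via C++), which is the gap the rest of this proposal tries to close.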

For Nim: identifying the mangling scheme/language from a mangled name is more complex. This is because Nim is compiled into a target language that uses an existing mangling scheme. If we had control over the binary or Debug Symbol output file (e.g. DWARF), I believe this would be easier, but again: since the target language's compiler is being used it is slightly more complex.

To solve this with today's standard Nim compiler, here are my researched steps:

Embed a unique constant identifier within each symbol to mark that the symbol was output by the Nim compiler. Modifications to be done here: https://github.com/nim-lang/Nim/blob/502a4486aeb8d0a5dcdf86540522d3dc16960536/compiler/ccgutils.nim#L71
- Unfortunately, this identifier could collide with identifiers used in existing C or C++ codebases. Unicode symbols would make conflicts rare but would require C99 or above
  - This probably requires an RFC and further discussion
Modify LLDB:
1. Modify the Mangled class
  1. Add mangling scheme enum entry for Nim here: https://github.com/llvm/llvm-project/blob/main/lldb/include/lldb/Core/Mangled.h#L41-L48
  2. Classify whether the symbol originates from the Nim compiler, using the identifier above: https://github.com/llvm/llvm-project/blob/main/lldb/source/Core/Mangled.cpp#L42-L79
    - Implementation seems to require one-level deep recursion
  3. Implement the demangling code in C++ and call it
    - Getting this accepted into LLDB might be difficult, since Nim's mangled names are also valid C/C++ identifiers. Perhaps a compiler option similar to Apple's LLDB (see here) or a run-time flag would be appropriate here (this seems to require many modifications to LLDB; maybe the LLVM folks know best)
(optional): implement a Language plugin. Why? Deeper integration with LLDB
- See here for Swift's language plugin; here is the Language class: https://github.com/apple/llvm-project/blob/40e3ca95e3f05c7b5286092d52a33a751a717a5e/lldb/source/Plugins/Language/Swift/SwiftLanguage.h#L26
- Docs seem to be lacking, but it seems to be for:
  - Help/docs on symbols
  - De-mangling function names without their arguments (GetDemangledFunctionNameWithoutArguments)
  - Probably other things; for Swift it seems to be related to the REPL integration with LLDB
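The first two steps above (embedding a marker, then classifying and de-mangling) can be sketched in Python as follows. Everything Nim-specific here is an assumption made for illustration: the marker `nimsym_`, the length-prefixed layout, and all function names are hypothetical, and the real marker would need the RFC discussion mentioned above. The sketch also shows one possible reading of the "one-level deep recursion" note: when Nim is compiled via C++, the Nim name may sit inside an Itanium-mangled symbol, so the classifier unwraps once and checks the inner name. Only the simplest Itanium form (`_Z<len><name>...`) is handled.

```python
# HYPOTHETICAL marker: the real constant would have to be chosen in an RFC.
NIM_MARKER = "nimsym_"
DIGITS = "0123456789"


def read_length_prefixed(s, pos):
    """Read `<decimal length><text>` at s[pos:]; return (text, next_pos) or None."""
    start = pos
    while pos < len(s) and s[pos] in DIGITS:
        pos += 1
    if pos == start:
        return None  # no length prefix at this position
    end = pos + int(s[start:pos])
    if end > len(s):
        return None  # declared length runs past the end of the symbol
    return s[pos:end], end


def nim_mangle(module, name):
    """Hypothetical scheme: marker, then length-prefixed module and name."""
    return f"{NIM_MARKER}{len(module)}{module}{len(name)}{name}"


def nim_demangle(symbol):
    """Invert nim_mangle; return 'module.name', or None if not in that form."""
    if not symbol.startswith(NIM_MARKER):
        return None
    parts = []
    pos = len(NIM_MARKER)
    for _ in range(2):  # module, then name
        component = read_length_prefixed(symbol, pos)
        if component is None:
            return None
        text, pos = component
        parts.append(text)
    return ".".join(parts) if pos == len(symbol) else None


def itanium_simple_name(symbol):
    """Source name from the simplest Itanium form `_Z<len><name>...`, else None."""
    if not symbol.startswith("_Z"):
        return None
    component = read_length_prefixed(symbol, 2)
    return component[0] if component else None


def classify(symbol, depth=0):
    """Label a symbol as "nim", "itanium", or "none", unwrapping at most once."""
    if symbol.startswith(NIM_MARKER):
        return "nim"
    if symbol.startswith("_Z"):
        # One level of recursion: look inside the C++-mangled wrapper once.
        inner = itanium_simple_name(symbol) if depth == 0 else None
        if inner is not None and classify(inner, depth + 1) == "nim":
            return "nim"
        return "itanium"
    return "none"


if __name__ == "__main__":
    sym = nim_mangle("main", "foo")
    wrapped = f"_Z{len(sym)}{sym}v"  # as if emitted by a C++ compiler
    print(sym, classify(sym), nim_demangle(sym))
    print(wrapped, classify(wrapped))
```

Note that a hand-written C function whose name happens to start with `nimsym_` would be misclassified, which is exactly the collision concern raised in step one.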

Alternatives

Here are some alternatives I can think of, but they will likely require more work:

Modify the Nim compiler to output the target assembly directly (or via LLVM); this is related to NIR
- It would likely be easier to convince the LLVM/LLDB team to merge the name de-mangling changes for Nim if it did not conflict with C/C++ symbols
Write a debugger in Nim. Pros:
- Would offer a chance to integrate with the compiler, i.e. to evaluate NimScript in the debugger, to modify the program at run-time, or to provide a REPL similar to Swift's
- Avoids reading & modifying the LLDB code, which is hard with all the OOP/abstraction

Examples

No response

Backwards Compatibility

My proposed solution will change the way the Nim compiler mangles names. For backward compatibility, one could offer a flag to mangle the old way, though I don't think this flag would be necessary: just re-compile your source if you want debugging support.

Links

Mangling & D:

https://dlang.org/blog/2017/12/20/ds-newfangled-name-mangling/

LLDB codepointers:

Writing a debugger:

Zectbumo commented 12 months ago

+1 Please let's write our own debugger.

ire4ever1190 commented 11 months ago

Implementing this for GDB would be a similar process ^1. IMO, adding support to existing debuggers is better than writing our own, since it means less maintenance and allows easy integration with existing tools.
