Open kendalharland opened 1 month ago
@llvm/issue-subscribers-lldb
Author: Kendal Harland (kendalharland)
Is there any contributor that is currently running this test in their CI, and if so, does it pass? (Perhaps @dzhidzhoev.)
For us, it fails on the `breakpoint set -n func` line.
Thanks for confirming! IIRC this is because LLDB does not load the debug information automatically (I am also unsure whether it is supposed to). This error should be resolved by the extra `add-dsym` call in my patch above. The test will then go on to fail when it attempts to match `realign_stack` and `call_func` in the `thread backtrace` output.
This test does pass for me. In a build with `-DLLVM_ENABLE_PROJECTS="clang;lld;lldb" -DLLVM_ENABLE_DIA_SDK=TRUE` (where the build is set up with MSVC), the test passes without any modifications.
LLVM's own PDB reader is indeed a bit incomplete, but AFAIK all buildbots that run configurations with LLDB on Windows run with `-DLLVM_ENABLE_DIA_SDK=TRUE`.
CC @labath @mstorsjo
Overview
My local build of this test produces a binary whose `realign_stack` and `call_func` symbols are not visible to `lldb`, so the test fails to match the expected output of `thread backtrace`: these symbol names appear as empty strings in the stack trace output. To make this test pass, I had to create this local patch, which is certainly not the correct way to fix things but demonstrates my problem well.
Background
In particular, here are the problems I observe:
1. lld-link does not produce a binary that exports the `realign_stack` and `call_func` symbols which are defined as `.globl` in [windows-unaligned-x86_64-asm.s](https://github.com/llvm/llvm-project/blob/90065da6d5a5f661b60c2f75b0f2dc094d27f4f5/lldb/test/Shell/Unwind/Inputs/windows-unaligned-x86_64-asm.s),
I think this part is expected. `.globl` makes the symbol visible to other compile units in the same library. To export it to other libraries, extra steps (like the ones you took) are needed. At least on Windows, that is; with ELF, global symbols are exported by default and must be explicitly hidden if desired.
even though `llvm-pdbutil` indicates that these symbols are “public” in the final executable (the commands below are run in PowerShell, but I built and ran these tests in cmd.exe):

```
PS C:\workspace\llvm-project> C:\workspace\llvm-project\build\bin\llvm-pdbutil.exe pretty --externals C:\workspace\llvm-project\build\tools\lldb\test\Shell\Unwind\Output\windows-unaligned-x86_64.test.pdb | Select-String -Pattern realign_stack
public [0x0000104e] realign_stack
PS C:\workspace\llvm-project> C:\workspace\llvm-project\build\bin\llvm-pdbutil.exe pretty --externals C:\workspace\llvm-project\build\tools\lldb\test\Shell\Unwind\Output\windows-unaligned-x86_64.test.pdb | Select-String -Pattern call_func
public [0x00001040] call_func
```
LLDB's `image lookup` command fails to find these symbols by address or name.
I guess that's a bug. If they're in the debug info, lldb should find them, whether or not they are exported.
As a workaround, I've manually exported the symbols in `build.py` using e.g. `/EXPORT:realign_stack`.
2. LLDB doesn’t appear to load PDB symbols from a separate `.pdb` file automatically.
It should, and I think it does so in some cases, but it's quite possible this does not work all the time.
It’s unclear from their PR discussion whether the original test author ran into this problem, so as a workaround I added an explicit `add-dsym ...` call to load the symbols.
Questions
2. What toolchain differences would cause the `realign_stack` and `call_func` symbols not to be exported in one build vs. another? Different assembler? Compiler version? Linker version? I would like to fix this locally and then update the test's `REQUIRES` statement if possible.
Like I said above, I think the exportedness of the symbol is not the issue, and I don't expect any Windows compiler to export them by default (well, maybe some Cygwin stuff, which tries to emulate ELF behavior).
That said, all of the above things could affect the exact kind of debug info (PDB) being produced, and thus lldb's ability to parse it.
3. What needs to be done to make LLDB's built-in PDB reader work locally? If this test is running in someone's CI without `LLDB_USE_NATIVE_PDB_READER=1`, there must be some steps I skipped in setting up PDB support, and I'd like to document these.
From where I'm sitting, the ideal outcome would be to flip the settings to use the "native" pdb reader by default. :)
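To repeat the `llvm-pdbutil` check from item 1 for several symbols at once, a small helper can pull the names and bracketed values out of the `pretty --externals` output. This is a hypothetical sketch (the function names are made up for illustration). It assumes each public symbol is printed on its own line in the `public [0x...] name` shape shown in the issue; other pdbutil modes may format things differently.

```python
import re
from typing import Dict, List

# Assumed line shape, taken from the pdbutil output quoted in this issue:
#   public [0x0000104e] realign_stack
_PUBLIC_LINE = re.compile(r"^\s*public\s+\[0x([0-9A-Fa-f]+)\]\s+(\S+)\s*$")


def parse_pdb_publics(text: str) -> Dict[str, int]:
    """Map each public symbol name to the hex value printed in brackets."""
    publics: Dict[str, int] = {}
    for line in text.splitlines():
        match = _PUBLIC_LINE.match(line)
        if match:
            publics[match.group(2)] = int(match.group(1), 16)
    return publics


def missing_publics(text: str, expected: List[str]) -> List[str]:
    """Return the expected symbol names that the PDB does not list as public."""
    publics = parse_pdb_publics(text)
    return [name for name in expected if name not in publics]
```

For the two lines quoted in the issue, `parse_pdb_publics` maps `realign_stack` to `0x104e` and `call_func` to `0x1040`, and `missing_publics` reports nothing for those names. A symbol being public in the PDB does not by itself mean LLDB will resolve it.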
Thanks for the added context @labath! Flipping the settings to use the native reader by default seems reasonable to me.
Makes sense. This tells me that although I can fix the problem locally by `/EXPORT`ing the symbols, that doesn't fix the true issue. That leaves me to figure out why these symbols are not rendered in my `thread backtrace` output locally, even with the native PDB reader enabled: their backtrace lines contain empty strings where the names should be. Maybe @mstorsjo can help? Do you have any idea where I should start debugging this?
My full set of setup and build commands is:
```bat
REM In an Admin cmd.exe
regsvr32 "C:\Program Files\Microsoft Visual Studio\2022\Community\DIA SDK\bin\msdia140.dll"
regsvr32 "C:\Program Files\Microsoft Visual Studio\2022\Community\DIA SDK\bin\amd64\msdia140.dll"

REM In a non-Admin cmd.exe
"C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\VsDevCmd.bat" -host_arch=amd64 -arch=amd64
"C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\VsDevCmd.bat" -test
set LLDB_USE_LLDB_SERVER=1
set MSVC_INSTALL_DIR=C:\Program Files\Microsoft Visual Studio\2022\Community
set "PATH=%MSVC_INSTALL_DIR%\DIA SDK\bin\amd64;%PATH%"
set "PATH=%MSVC_INSTALL_DIR%\VC\Tools\Llvm\x64\bin;%PATH%"
set "PATH=C:\workspace\llvm-project\build\bin;%PATH%"
set CC=clang-cl
set CXX=clang-cl
cmake -G Ninja ^
  -B build ^
  -D Python3_EXECUTABLE="C:\Program Files\Python312\python.exe" ^
  -D CMAKE_C_COMPILER_LAUNCHER=ccache ^
  -D CMAKE_CXX_COMPILER_LAUNCHER=ccache ^
  -D CMAKE_BUILD_TYPE="Release" ^
  -D LLVM_CCACHE_BUILD="ON" ^
  -D LLVM_ENABLE_PROJECTS="llvm;clang;lldb;lld" ^
  -D LLVM_ENABLE_LLD="YES" ^
  -D LLVM_ENABLE_DIA_SDK="ON" ^
  -D LLVM_LIT_ARGS=-v ^
  -D LLDB_ENABLE_PYTHON=1 ^
  -D LLDB_PYTHON_EXE_RELATIVE_PATH="python.exe" ^
  -D LLDB_TEST_USER_ARGS=--skip-category=watchpoint ^
  -D LLDB_ENFORCE_STRICT_TEST_REQUIREMENTS=1 ^
  llvm
ninja -C build check-lldb
```
I believe the main difference here might be that you are setting up your build with MSVC, while I'm using Clang through its MSVC-compatible driver (`clang-cl`). I can start there and see if that fixes the problem.
Can confirm that switching to MSVC does not fix this for me.
For some reason, the test is now failing for me too, in my local test environment. I have no idea how it worked for me two weeks ago though.
But now I see what's going wrong.
The issue is that when building the assembly source file `Shell/Unwind/Inputs/windows-unaligned-x86_64-asm.s`, the output object file gets DWARF debug info. This is a change that happened in Clang 18; it seems to have started with commit f58330cbe44598eb2de0cca3b812f67fea0a71ca. (This seems to be an unintended effect of that commit.)
When the assembly file is compiled, the command run looks like `clang-cl /c /Z7 /Foout.o input.s`. The `/Z7` option is included to enable generation of CodeView debug info in the output object file, but when the input is a plain `.s` assembly file, it seems to trigger generation of some DWARF debug info.
To see this in action, run the following:

```
> bin\clang-cl /Z7 /c /Foout.o ..\..\lldb\test\Shell\Unwind\Inputs\windows-unaligned-x86_64-asm.s
> bin\llvm-readobj --sections out.o
```
Note that the output contains a couple of sections like `.debug_info` and `.debug_line`. When the `/Z7` option is omitted, these sections aren't generated. When using a `clang-cl` from version 17 or earlier, those sections aren't generated either.
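The `llvm-readobj` check above can be automated with a few lines of Python. This is an illustrative sketch, not part of the LLVM test suite, and the function names are hypothetical. It assumes `llvm-readobj --sections` prints one `Name: <section> (...)` line per section. The `.debug_` prefix test is deliberate: CodeView debug info in COFF objects lives in `.debug$S` and `.debug$T`, so only DWARF sections such as `.debug_info` and `.debug_line` are reported.

```python
import re
from typing import List

# Assumption: each section in `llvm-readobj --sections` output has a line like
#   Name: .debug_info (2F 34 00 00 00 00 00 00)
_SECTION_NAME = re.compile(r"^\s*Name:\s*(\S+)")


def section_names(readobj_output: str) -> List[str]:
    """Collect section names in the order llvm-readobj lists them."""
    names = []
    for line in readobj_output.splitlines():
        match = _SECTION_NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def dwarf_sections(readobj_output: str) -> List[str]:
    """Return DWARF sections only; CodeView's .debug$S/.debug$T are skipped."""
    return [name for name in section_names(readobj_output)
            if name.startswith(".debug_")]
```

To use it on the object file from the commands above, capture the output with something like `subprocess.run(["llvm-readobj", "--sections", "out.o"], capture_output=True, text=True).stdout` and pass it to `dwarf_sections`. A non-empty result for an object built in clang-cl mode means DWARF slipped in.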
When linking the output binary, we do generate PDB debug info, as that's what we request. But as one object file did include DWARF debug info, the output also contains a little bit of DWARF. When LLDB inspects the binary, it looks for both DWARF and PDB. In this case, it finds a little bit of DWARF debug info, concludes its search for related debug info, and never looks for the PDB. That's why it can't resolve the symbols that would normally be provided by the debug info (either PDB, or DWARF when the test is run in mingw mode).
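To make the failure mode concrete, here is a toy model of the lookup order described above. It is not LLDB's implementation, and the class and function names are invented for illustration; it only encodes the described behavior that finding any DWARF ends the search, so the PDB is never considered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedImage:
    # Hypothetical summary of what a debugger finds in a linked .exe.
    has_dwarf: bool          # any DWARF (.debug_*) sections in the image
    pdb_path: Optional[str]  # PDB referenced by the image, if any


def chosen_debug_info(image: LinkedImage) -> str:
    """Toy model of the first-match search described above (not LLDB code).

    Any DWARF at all ends the search, so a PDB holding the real symbols
    is never consulted.
    """
    if image.has_dwarf:
        return "dwarf"
    if image.pdb_path is not None:
        return "pdb"
    return "none"
```

In the failing case described above, the image has a PDB but also a few DWARF sections from the assembly object, so the model returns `"dwarf"`. Removing the stray DWARF, by any of the approaches below, makes it return `"pdb"`.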
So in this case, there are many ways around this (some more proof of concept, others more reasonable):
- Run `llvm-strip` on the output binary, to remove any unintended DWARF debug info. (This should only be done when the toolchain is operating in clang-cl mode, though; in mingw mode, DWARF is the default and intended debug output format and shouldn't be stripped.)
- Run `llvm-strip --strip-debug` on the object file corresponding to the assembly input.
- Don't pass `/Z7` to the compiler in the `build.py` helper script when the input is an assembly file.
- Make Clang/LLVM not generate DWARF debug info for assembly outputs when the target triple is an MSVC target, unless an option like `-gdwarf` has been passed, i.e. fix the apparently unintended consequence of f58330cbe44598eb2de0cca3b812f67fea0a71ca for this case.

This does seem like a clear regression in the Clang 18 timeframe.
But what puzzles me is how the test seems to work for some, as I would presume for @dzhidzhoev and probably some buildbots, and as it seemed to work for me 2 weeks ago. Can the `build.py` script end up running e.g. a host-installed `clang-cl` rather than the freshly built one in the same build tree?
Woah, great job tracking this down.
FYI I'm leaving for an extended vacation today; I'll try out the solutions you've mentioned above in my local runs when I return. I appreciate the thorough explanation!
Could providing a small binary/yaml be a better solution, to reduce this test's dependency on the compiler and assembler default pipeline?
That would probably work, but it's not exactly ideal.
I think the ideal thing would be to get Clang fixed here, to make it not produce DWARF (unless explicitly told to) when building in MSVC mode.
Is this test passing for you? Can you look into how that works? Rerun the individual test with `llvm-lit -a`, then rerun the `build.py` command with `--verbose` to see exactly what it executes when building your test executable, to understand why this isn't happening for you.
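When repeating this rerun several times, for example with and without the native PDB reader, it can be wrapped in a small helper. This is a hypothetical sketch: the function name is made up, `-a` is lit's `--show-all` flag, and `LLDB_USE_NATIVE_PDB_READER=1` is the environment variable discussed in this issue. `lit_cmd` is whatever launches llvm-lit in your build tree (possibly a Python interpreter followed by the lit script).

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def lit_rerun_command(
    lit_cmd: List[str],
    test_path: str,
    native_pdb: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for rerunning one lit test with -a.

    -a (--show-all) makes lit print every command line it runs, including
    the build.py invocation, which can then be rerun by hand with --verbose.
    Setting native_pdb exports LLDB_USE_NATIVE_PDB_READER=1 for the run.
    """
    env = dict(os.environ if base_env is None else base_env)
    if native_pdb:
        env["LLDB_USE_NATIVE_PDB_READER"] = "1"
    return [*lit_cmd, "-a", test_path], env
```

Pass the result to `subprocess.run(argv, env=env)`, then copy the printed `build.py` line and rerun it with `--verbose`.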
Is this test passing for you?
It's not passing on my setup.
Oh, ok, that explains things!
I would have expected there to be some existing buildbots that test LLDB on Windows/x86_64, but perhaps there aren't any?
This issue should have been fixed now, by https://github.com/llvm/llvm-project/pull/106686 / https://github.com/llvm/llvm-project/commit/fcb7b390ccd5b4cfc71f13b5e16a846f3f400c10 - please try it out!
CC @labath @mstorsjo
Related to https://github.com/swiftlang/llvm-project/issues/9141
Overview
My local build of this test produces a binary whose `realign_stack` and `call_func` symbols are not visible to `lldb`, so the test fails to match the expected output of `thread backtrace`: these symbol names appear as empty strings in the stack trace output. To make this test pass, I had to create this local patch, which is certainly not the correct way to fix things but demonstrates my problem well.
Background
In particular, here are the problems I observe:
1. lld-link does not produce a binary that exports the `realign_stack` and `call_func` symbols which are defined as `.globl` in `windows-unaligned-x86_64-asm.s`, even though `llvm-pdbutil` indicates that these symbols are “public” in the final executable (the commands below are run in PowerShell, but I built and ran these tests in cmd.exe):
LLDB's `image lookup` command fails to find these symbols by address or name. As a workaround, I've manually exported the symbols in `build.py` using e.g. `/EXPORT:realign_stack`.
2. LLDB doesn't appear to load PDB symbols from a separate `.pdb` file automatically. It's unclear from their PR discussion whether the original test author ran into this problem, so as a workaround I added an explicit `add-dsym ...` call to load the symbols.
3. LLDB seems to have a hard time parsing PDB files and crashes when `add-dsym` is called. I believe this is a known issue [1], [2]. Given that there are 2 PDB readers to choose from, LLDB's implementation (default) and the Windows native reader, as a workaround I've used the native one by setting `LLDB_USE_NATIVE_PDB_READER=1`.

Without my local patch, the full failure output with verbose information is below. As you can see, `func` and `main` are symbolized in lldb, but the two frames between those calls have no symbol name. The Clang and lld-link invocations are also shown below.
Questions
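2. What toolchain differences would cause the `realign_stack` and `call_func` symbols not to be exported in one build vs. another? Different assembler? Compiler version? Linker version? I would like to fix this locally and then update the test's `REQUIRES` statement if possible.
3. What needs to be done to make LLDB's built-in PDB reader work locally? If this test is running in someone's CI without `LLDB_USE_NATIVE_PDB_READER=1`, there must be some steps I skipped in setting up PDB support, and I'd like to document these.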