microsoft / windows-rs

Rust for Windows
https://kennykerr.ca/rust-getting-started/

Problems with natvis support #2836

Closed by kennykerr 9 months ago

kennykerr commented 9 months ago

Originally posted by @tim-weis in https://github.com/microsoft/windows-rs/issues/2023#issuecomment-1929388019

Thanks for the feedback. This has been rather insightful.

I went ahead and did some more research. The TL;DR is: The HSTRING visualizer works with CDB but not with other debuggers I tried (WinDbg, Visual Studio).

First, though, the windows::core vs. windows_core path discrepancy was down to me copy-pasting outdated expected test output from #2023. This has since been changed, and all debuggers (and visualizers) agree on windows_core as the package-relative path prefix. That turned out to be a red herring.

For reference, here's the full repro (slightly modified from my previous comment).

Cargo.toml

[package]
name = "win_nv"
version = "0.0.0"
edition = "2021"

[dependencies]
windows = { version = "0.52.0", features = [] }

.cargo/config.toml

[build]
# Request that visualizers are embedded into the PDB
rustflags = ["--cfg=windows_debugger_visualizer"]

src/main.rs

use windows::core::HSTRING;

#[inline(never)]
fn __break() {}

fn main() {
    let empty = HSTRING::new();
    println!("{empty}");

    let hstring = HSTRING::from("This is an HSTRING");
    println!("{hstring}");

    __break();
}

This is following a pattern I first discovered in Ridwan's debugger_test crate: It introduces a function (__break()) for the sole purpose of having a symbol to set a breakpoint on, providing a convenient way to insert checkpoints at which execution pauses. This works in combination with the bm command (bm *!*::__break "gu") instructing the debugger to "Go Up" ("gu") whenever the function is hit, taking us back to the scope of interest.

With the crate set up, the following command line builds it and launches right into the CDB debugger:

cargo b && cd target\debug && "%WindowsSdkDir%Debuggers\x64\cdb.exe" -o win_nv.exe

Once in the debugger, we can set the breakpoint, run to the checkpoint and inspect the HSTRINGs:

0:000> bm *!*::__break "gu"
*** WARNING: Unable to verify checksum for win_nv.exe
  1: 00007ff6`e62d1010 @!"win_nv!win_nv::__break"
0:000> g

This is an HSTRING
win_nv!win_nv::main+0x104:
00007ff6`e62d1124 eb00            jmp     win_nv!win_nv::main+0x106 (00007ff6`e62d1126)
0:000> dx empty
empty            : "" [Type: windows_core::strings::hstring::HSTRING]
    [<Raw View>]     [Type: windows_core::strings::hstring::HSTRING]
    [len]            : 0x0 [Type: unsigned int]
0:000> dx hstring
hstring          : "This is an HSTRING" [Type: windows_core::strings::hstring::HSTRING]
    [<Raw View>]     [Type: windows_core::strings::hstring::HSTRING]
    [len]            : 0x12 [Type: unsigned int]
    [ref_count]      : 1 [Type: windows_core::imp::ref_count::RefCount]
    [flags]          : 0x0 [Type: unsigned int]
    [chars]
0:000> q

That explains why the tests are succeeding. Moving to WinDbg had some surprises, though: Setting the breakpoint the same way as in CDB behaved differently. The "gu" command string went up (at least) one stack frame more than expected. I'm not sure what's up with that, but I just replaced the command string with "pt" ("Step to Next Return") which seemingly worked. For completeness: bm *!*::__break "pt".

Once there, the debugger produced unexpected results for the HSTRING variables:

0:000> dx empty
empty                 [Type: windows_core::strings::hstring::HSTRING]
    [<Raw View>]     [Type: windows_core::strings::hstring::HSTRING]
    [len]            : Unexpected failure to dereference object
0:000> dx hstring
hstring                 [Type: windows_core::strings::hstring::HSTRING]
    [<Raw View>]     [Type: windows_core::strings::hstring::HSTRING]
    [len]            : Unexpected failure to dereference object

Can anyone please verify my observations?


Rust: 1.75.0 (stable)
CDB: cdb version 10.0.22621.382
WinDbg: Debugger client version: 1.2308.2002.0; Debugger engine version: 10.0.25921.1001
Host OS: Windows 10 19045.3930


While this is starting to feel like I'm losing my mind, here are a few more things I tried to make sure I'm looking at the same thing the debugger is:

None of the above had any observable effect so I'm confident that windows.natvis is actually loaded and evaluated.

MaulingMonkey commented 9 months ago

> Can anyone please verify my observations?

I cannot. Are you launching both cdb and windbg from C:\Program Files (x86)\Windows Kits\10\Debuggers\x64?

> Moving to WinDbg had some surprises, though: Setting the breakpoint the same way as in CDB behaved differently.

Some variance between debugger versions is (sadly) common though. I'm not masochistic enough to attempt to unit test WinDbg or Visual Studio however. ...yet, at least.

> The "gu" command string went up (at least) one stack frame more than expected.

No repro. The last time I encountered a similar issue, someone had enabled opt-level = "3" for their debug builds. I'm assuming you haven't hidden something similar in a global/user-wide .cargo/config.toml, however. That said, make sure you don't close win_nv's console window: you'll likely encounter TerminateProcess in an injected thread before your breakpoint.


rustc 1.75.0 (82e1608df 2023-12-21)
cdb version 10.0.22621.1
WinDbg 10.0.22621.1
Microsoft Windows [Version 10.0.19045.3930]


Using File > Open Executable...

image

MaulingMonkey commented 9 months ago

Slapping #[no_mangle] on __break for easier function breakpoint resolution in Visual Studio, and putting cargo-vs to use with:

pushd C:\local\_archive\2024\win_nv
cargo vs2017
cargo vs2019
cargo vs2022
start "" vs\vs2017.sln
start "" vs\vs2019.sln
start "" vs\vs2022.sln

I was able to verify hstring's visualizer seems to be working fine for Debug|x64 builds in VS as well on my machine:

VS2017

image

VS2019

image

VS2022

image

tim-weis commented 9 months ago

> I cannot. Are you launching both cdb and windbg from C:\Program Files (x86)\Windows Kits\10\Debuggers\x64?

I'm not. I was using the tool formerly called "WinDbg Preview" that now goes by the name "WinDbg" and I'm struggling to disambiguate. With WinDbg from the Debugging Tools for Windows, everything works as expected. It never crossed my mind that "WinDbg" and "WinDbg" would behave differently, so thanks for that insight, @MaulingMonkey!

Just to clarify, I did my testing using the tool that used to be called "WinDbg Preview", and things are failing with that still (and Visual Studio).

The failure cases seem to be down to these two lines:

https://github.com/microsoft/windows-rs/blob/0df3676e996364024144efb7b1f424e5f7a66e53/crates/libs/core/windows.natvis#L30-L31

Avoiding "this" and "nullptr" in those expressions solved the issue in "WinDbg Preview" for me (I don't know how to control .natvis files in Visual Studio, so I haven't verified it there). I also couldn't find any reference documentation that explains those tokens.

riverar commented 9 months ago

image

Not seeing any issues here with WinDbg Preview, client 1.2308.2002.0 / engine 10.0.25921.1001. Be aware WinDbg Preview is moving away from Store distribution and is now updated via AppInstaller https://aka.ms/windbg/download. So if you were relying solely on the Store copy, you may be running an outdated copy.

riverar commented 9 months ago

Ah, I can reproduce @tim-weis's reported behavior when I introduce an empty hstring. Something is definitely funky with the evaluation of the header/is_empty intrinsics. After a failure, the natvis appears to stop further evaluation until reload. What's got me scratching my head is that I don't have a windows_core::strings::hstring::Header symbol.

tim-weis commented 9 months ago

Thank you, @riverar! At least it isn't just me anymore that's seeing things. The windows_core::strings::hstring::Header type should be available from the PDB. Does dt windows_core::strings::hstring::Header succeed for you?

Removing the empty HSTRING doesn't change things for me, though. It is failing either way in WinDbg Preview and Visual Studio.

kennykerr commented 9 months ago

Seems like something @wesleywiser would know about.

riverar commented 9 months ago

In the case of empty HSTRINGs, the "this" prvalue is typed as a non-pointer (windows_core::strings::hstring::HSTRING), which results in a cast failure. I added an intermediate cast to align the expression's behavior here.

@tim-weis Can you verify this works for you now?

<Intrinsic Name="header" Expression="*((windows_core::strings::hstring::Header**)(uintptr_t)this)" ReturnType="windows_core::strings::hstring::Header *" />

riverar commented 9 months ago

Oh no, I'm discovering WinDbg, Visual Studio, and others evaluate slightly differently.

tim-weis commented 9 months ago

@riverar This change has no effect for me in WinDbg Preview. The only way for me to get the visualizer to work in WinDbg Preview is with this header expression:

<Intrinsic Name="header" Expression="(windows_core::strings::hstring::Header*)__0.tag" />

It appears as though "this" just isn't defined for me (in WinDbg Preview and Visual Studio). A "dx this" is met with an Error: Unable to bind name 'this'[^1]. I have no idea where this symbol is (intended to be) defined. It's neither listed under the debugger intrinsics nor the pseudo variables. But it appears to be at the core of the issue I'm seeing.

Just to make sure we aren't talking past each other: The issue you (Rafael) are looking at and the issue I am observing are quite possibly distinct. While you are discovering (how, by the way?) that the expression "this" in a visualizer evaluates to different things across debuggers, I cannot seem to use "this" as an expression at all (in WinDbg Preview and Visual Studio).

Apparently, there's something peculiar about my specific environment.

[^1]: This is failing in CDB (where the visualizer otherwise works for me) in the same way, so this seems to be a peculiarity of the natvis infrastructure rather than the debugger engine.

kennykerr commented 9 months ago

For reference, here's what C++/WinRT does:

https://github.com/microsoft/cppwinrt/blob/master/natvis/cppwinrt.natvis#L56-L63

May be worth considering something similar to avoid these complications.

MaulingMonkey commented 9 months ago

> I was using the tool formerly called "WinDbg Preview" that now goes by the name "WinDbg" and I'm struggling to disambiguate.

Ack. I've encountered this with PIX previously, which I disambiguated with the terms:

I think for myself I'll start calling these different versions of WinDbg:

> It never crossed my mind that "WinDbg" and "WinDbg" would behave differently, so thanks for that insight, @MaulingMonkey!

It's pretty horrifying! 👍 A previous rabbit hole of CDB version specific debugging: https://github.com/rust-lang/rust/issues/76352 (and that was without an overhaul/rewrite between CDB versions!)

> I also couldn't find any reference documentation that explained those tokens.

Both are C++ keywords (this, nullptr). I'm a little surprised to hear of them not working in Visual Studio; I'm less surprised that they might cause problems in "WinDbg Preview" / UWP WinDbg. That they work for some of us but not all of us... ack. I have vague recollections of similar problems with this in the past, though, so don't let me gaslight you into thinking your install is broken somehow. Or unusually broken, at least 😄.

> __0.tag

While this somewhat undermines a goal of https://github.com/microsoft/windows-rs/pull/2077 ("This allows for changes to the HSTRING data structure without breaking the Natvis."), I'd still embrace this change - debug visualizers are sadly brittle no matter what you do IME, so I'd prioritize doing the dumb simple thing that works over cleverness, and rely on CI unit tests to catch the inevitable breakage.
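
To make the layout dependence concrete, here is a minimal Rust sketch. It assumes HSTRING is a transparent wrapper around an optional non-null pointer to its header, which is the shape the __0.tag expression suggests: __0 would be the tuple field, and tag the slot that rustc's MSVC-style debug info presumably exposes for the Option. Header and HString below are hypothetical stand-ins, not the windows-rs definitions.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical stand-in for windows_core::strings::hstring::Header.
// Only `len` is modelled; the real field set and order are not assumed here.
#[repr(C)]
struct Header {
    len: u32,
}

// Assumed shape of HSTRING: one pointer-sized field, with `None` for the empty string.
#[repr(transparent)]
struct HString(Option<NonNull<Header>>);

impl HString {
    fn new() -> Self {
        HString(None)
    }

    fn from_header(header: &mut Header) -> Self {
        HString(Some(NonNull::from(header)))
    }

    // Rough equivalent of a natvis `header` intrinsic: the raw pointer, null when empty.
    fn header_ptr(&self) -> *const Header {
        self.0.map_or(std::ptr::null(), |p| p.as_ptr() as *const Header)
    }

    // Rough equivalent of a `len` item: 0 for the empty string, otherwise read from the header.
    fn len(&self) -> u32 {
        // SAFETY: the pointer is either null or points at a header the caller keeps alive.
        unsafe { self.header_ptr().as_ref() }.map_or(0, |h| h.len)
    }
}

fn main() {
    // Option<NonNull<T>> is guaranteed to be pointer-sized, with None stored as null.
    assert_eq!(size_of::<HString>(), size_of::<*const Header>());

    let empty = HString::new();
    let mut header = Header { len: 18 };
    let hstring = HString::from_header(&mut header);

    println!("empty:   header = {:p}, len = {}", empty.header_ptr(), empty.len());
    println!("hstring: header = {:p}, len = {}", hstring.header_ptr(), hstring.len());
}
```

Under that assumption, a visualizer that reads the field directly keeps working only as long as the wrapper stays a single optional pointer, which is the brittleness being traded for simplicity here.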

riverar commented 9 months ago

> It appears as though "this" just isn't defined for me (in WinDbg Preview and Visual Studio). [...] It's neither listed under the debugger intrinsics nor the pseudo variables. But it appears to be at the core of the issue I'm seeing.

"this" and "nullptr" are C++ keywords and appear to be supported across all the C++ expression evaluators I've tested (VSpre, WinDbg, WinDbg Preview). "this" is bound when possible, such as in natvis intrinsic expressions or in the debugger when working with C++ class member functions. (As an example of the latter: launch or attach to charmap.exe, run bp CharMap!CDropSource::AddRef and g, click around, run gu, and then dx this should work.)

Here's a simpler type block that demonstrates "this" usage and should evaluate in VS and WinDbg for you. (If not, that's very strange, and we'll have to diagnose that one separately!)

<Type Name="windows_core::strings::hstring::HSTRING">
  <DisplayString>hello {(void*)this}</DisplayString>
</Type>

I believe there are several issues here commingling and making a mess. Please correct me if I'm wrong!

  1. Tim might be seeing abnormal expression behavior in some of his clients; further investigation needed, maybe differences across dbgeng.dll versions?
  2. Expression engines seem to be handling cast/safety differently, which makes it tricky to deal with (observed) "this" coming in with one of two types:
    • non-pointer type windows_core::strings::hstring::HSTRING (e.g. HSTRING::new();)
    • pointer type windows_core::strings::hstring::HSTRING* (e.g., h!("Hello."))
  3. Symbol-embedded natvis doesn't appear to be working at all; further investigation needed
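
As a rough Rust analogue of the two shapes in item 2 (not a statement about how any debugger evaluates natvis), the sketch below reads the header pointer once from the value and once through the address of the string, the way the Header** cast in the proposed intrinsic does. Header and HString are hypothetical stand-ins with an assumed layout.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for the header type; only `len` is modelled.
#[repr(C)]
struct Header {
    len: u32,
}

// Assumed shape of HSTRING: a transparent wrapper around an optional header pointer.
#[repr(transparent)]
struct HString(Option<NonNull<Header>>);

// Value route: take the pointer straight out of the field.
fn header_from_value(s: &HString) -> *const Header {
    s.0.map_or(std::ptr::null(), |p| p.as_ptr() as *const Header)
}

// Address route: treat the address of the string as `Header**` and dereference once.
fn header_from_address(s: &HString) -> *const Header {
    let this: *const HString = s;
    // SAFETY: HString is repr(transparent) over Option<NonNull<Header>>, which is
    // guaranteed to have the size and alignment of a raw pointer, with None stored as null.
    unsafe { this.cast::<*const Header>().read() }
}

fn main() {
    let empty = HString(None);
    let mut header = Header { len: 18 };
    let hstring = HString(Some(NonNull::from(&mut header)));

    // Both routes agree for the empty and the non-empty string.
    assert_eq!(header_from_value(&empty), header_from_address(&empty));
    assert_eq!(header_from_value(&hstring), header_from_address(&hstring));

    // SAFETY: `header` is still alive and the pointer was derived from it.
    let len = unsafe { (*header_from_address(&hstring)).len };
    println!("both routes agree; len = {len}");
}
```

The address route only works when the evaluator hands over an address; if "this" arrives as a value, the same cast reinterprets the wrong bits, which may be what the intermediate cast was meant to paper over.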
kennykerr commented 9 months ago

I'm not familiar with the natvis format, or its various dialects and variants, but I did notice that the one for the Rust standard library types doesn't use "this" at all.

https://github.com/rust-lang/rust/blob/master/src/etc/natvis/libstd.natvis

riverar commented 9 months ago

PR submitted w/ tweaked natvis. Works across VSpre, CDB, and WinDbg Preview. I've also verified it loads correctly when embedded in symbols.

MaulingMonkey commented 9 months ago

A compounding factor: when debugging *.natvis files on an unrelated project, after enabling Natvis diagnostic errors, I've noticed that incremental Rust builds accumulate natvis files referenced by #![debugger_visualizer(natvis_file = "...")] rather than replacing them. This means you may have stale natvis files taking priority over your new natvis files:

Natvis: d3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6): Warning: Conflicting <Type> entries detected for type 'local_path_dependency_not_in_workspace::TYPE' at 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' and 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)'.  The <Type> entry at 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' will have priority.
Natvis: d3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6): Warning: Conflicting <Type> entries detected for type 'local_path_dependency_not_in_workspace::TYPE' at 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' and 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)'.  The <Type> entry at 'd3d9create-0.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' will have priority.
Natvis: d3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type1' at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' and 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)'.  The <Type> entry at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' will have priority.
Natvis: d3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type2' at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)' and 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)'.  The <Type> entry at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)' will have priority.
Natvis: d3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type1' at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' and 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)'.  The <Type> entry at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(4,6)' will have priority.
Natvis: d3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type2' at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)' and 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)'.  The <Type> entry at 'd3d9create-1.natvis (from C:\local\...\target\x86_64-pc-windows-msvc\debug\examples\d3d9create.pdb)(9,6)' will have priority.

This does not appear to happen when using natvis-pdbs (which simply passes /NATVIS:... to link.exe when creating a final .exe or .dll), so this is presumably a bug in rustc (and not windows nor link.exe.)

EDIT: reported upstream: https://github.com/rust-lang/rust/issues/120913
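
To check whether a build is affected, the diagnostic output can be summarized per type. The helper below is a hypothetical sketch, not part of any existing tool: it assumes the warning wording shown in the log above and counts how often each type name is reported as conflicting. The sample log in main is an abbreviated stand-in for real output.

```rust
use std::collections::BTreeMap;

// Counts "Conflicting <Type> entries" warnings per type name.
// The marker text is copied from the diagnostics quoted above; other debugger
// versions may word it differently.
fn conflicting_types(log: &str) -> BTreeMap<String, usize> {
    const MARKER: &str = "Conflicting <Type> entries detected for type '";
    let mut counts = BTreeMap::new();
    for line in log.lines() {
        if let Some(start) = line.find(MARKER) {
            let rest = &line[start + MARKER.len()..];
            // The type name runs up to the closing single quote.
            if let Some(end) = rest.find('\'') {
                *counts.entry(rest[..end].to_string()).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Abbreviated sample lines; the paths and positions are placeholders.
    let log = "\
Natvis: a-1.natvis(4,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type1' at 'x' and 'y'.
Natvis: a-1.natvis(9,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type2' at 'x' and 'y'.
Natvis: a-1.natvis(4,6): Warning: Conflicting <Type> entries detected for type 'this_crate::Type1' at 'x' and 'y'.";

    for (type_name, count) in conflicting_types(log) {
        println!("{type_name}: {count} conflict warning(s)");
    }
}
```

Any type that shows up here after a rebuild is a candidate for a stale embedded visualizer; a clean build (cargo clean) is one way to rule that out.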

riverar commented 9 months ago

Nice catch, I feel bad for @tim-weis. He was probably hit by every single bug/quirk we've documented in this thread here so far 😂