microsoft / windows-rs

Rust for Windows
https://kennykerr.ca/rust-getting-started/
Apache License 2.0
10.32k stars 485 forks source link

Bug: Unhandled exception iterating RawGameControllers and using Steam #2252

Closed paul-hansen closed 1 year ago

paul-hansen commented 1 year ago

Which crate is this about?

windows

Crate version

0.43.0

Summary

If you add a program using this crate to Steam as a "Add a Non-Steam game" calling RawGameController::RawGameControllers()?.into_iter() throws an unhandled exception if there is a controller connected. If you run the same program outside of Steam it is fine. If you manually iterate over the controllers using GetAt() and Size(), it avoids the error, which makes me wonder if there is something wrong with the into_iterator implementation or something.

Toolchain version/configuration

Default host: x86_64-pc-windows-msvc rustup home: C:\Users\Paul.rustup

installed toolchains

stable-x86_64-pc-windows-msvc (default) nightly-x86_64-pc-windows-msvc 1.61.0-x86_64-pc-windows-msvc 1.65.0-x86_64-pc-windows-msvc

installed targets for active toolchain

wasm32-unknown-unknown x86_64-pc-windows-msvc

active toolchain

stable-x86_64-pc-windows-msvc (default) rustc 1.65.0 (897e37553 2022-11-02)

Reproducible example

use windows::Gaming::Input::RawGameController;
use windows::Win32::Graphics::Gdi::{RedrawWindow, RDW_INVALIDATE};
use windows::{
    core::*, Win32::Foundation::*, Win32::Graphics::Gdi::ValidateRect,
    Win32::System::LibraryLoader::GetModuleHandleA, Win32::UI::WindowsAndMessaging::*,
};
// Most of this is taken from the create_window example in the Windows crate:
// https://github.com/microsoft/windows-rs/blob/4726348167316b4624abbe57e0b09cd05f12e0d5/crates/samples/create_window/src/main.rs
// See other code comments for differences.
fn main() -> Result<()> {
    unsafe {
        let instance = GetModuleHandleA(None)?;
        debug_assert!(instance.0 != 0);

        let window_class = s!("window");

        let wc = WNDCLASSA {
            hCursor: LoadCursorW(None, IDC_ARROW)?,
            hInstance: instance,
            lpszClassName: window_class,
            style: CS_HREDRAW | CS_VREDRAW,
            lpfnWndProc: Some(wndproc),
            ..Default::default()
        };

        let atom = RegisterClassA(&wc);
        debug_assert!(atom != 0);

        CreateWindowExA(
            WINDOW_EX_STYLE::default(),
            window_class,
            s!("This is a sample window"),
            WS_OVERLAPPEDWINDOW | WS_VISIBLE,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            CW_USEDEFAULT,
            None,
            None,
            instance,
            None,
        );

        let mut message = MSG::default();

        while GetMessageA(&mut message, HWND(0), 0, 0).into() {
            DispatchMessageA(&message);
        }

        Ok(())
    }
}

extern "system" fn wndproc(window: HWND, message: u32, wparam: WPARAM, lparam: LPARAM) -> LRESULT {
    unsafe {
        match message {
            WM_PAINT => {
                // Call our code that prints the controller names. The exception happens in here.
                print_gamepads();
                ValidateRect(window, None);
                // The exception doesn't happen on the first frame because WGI always says there
                // are no controllers the first time you ask. Request new frames so it fails
                // without having to resize the window to get it to run again.
                RedrawWindow(window, None, None, RDW_INVALIDATE);
                LRESULT(0)
            }
            WM_DESTROY => {
                println!("WM_DESTROY");
                PostQuitMessage(0);
                LRESULT(0)
            }
            _ => DefWindowProcA(window, message, wparam, lparam),
        }
    }
}

fn print_gamepads() {
    let gamepads = match RawGameController::RawGameControllers() {
        Ok(gamepads) => gamepads,
        Err(e) => panic!("Error while fetching gamepads {e}"),
    };

    match gamepads.Size() {
        Ok(0) => println!("No Gamepads found"),
        _ => {
            // Exception happens on the below line when launched through steam
            for controller in gamepads {
                let name = match controller.DisplayName() {
                    Ok(hstring) => hstring.to_string_lossy(),
                    Err(_) => "unknown".to_string(),
                };
                println!("Found controller: {name}");
            }
        }
    }
}

Crate manifest

[package]
name = "steam_wgi_crash_repro"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
windows = { version = "0.43.0", features = ["Gaming_Input", "Win32_Foundation", "Win32_Graphics", "Win32_System", "Win32_Graphics_Gdi", "Win32_System_LibraryLoader", "Win32_UI", "Win32_UI_WindowsAndMessaging", "Foundation_Collections"] }

Expected behavior

Expected to be able to launch the example through Steam and it stays running, printing to the console every frame the names of which controllers are connected.

Actual behavior

You can launch the example through Steam but if a controller is connected or as soon as a controller is connected, it prints the name of the controller once then silently crashes.

Analyzing the .dmp file produced by Windows gives this output:

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffc4a7e6b1b (gameoverlayrenderer64!VulkanSteamOverlayProcessCapturedFrame+0x0000000000003b8b)
   ExceptionCode: c000041d
  ExceptionFlags: 00000001
NumberParameters: 0

PROCESS_NAME:  steam_wgi_crash_repro.exe

ERROR_CODE: (NTSTATUS) 0xc000041d - An unhandled exception was encountered during a user callback.

EXCEPTION_CODE_STR:  c000041d

STACK_TEXT:  
00000060`8514c2d0 00007ff7`b6ef62fd     : 00000000`00000000 00000000`80000022 0000020a`6ea39330 00000060`8514c5f8 : gameoverlayrenderer64!VulkanSteamOverlayProcessCapturedFrame+0x3b8b
00000060`8514c300 00000000`00000000     : 00000000`80000022 0000020a`6ea39330 00000060`8514c5f8 0000020a`6ea39330 : steam_wgi_crash_repro+0x62fd

STACK_COMMAND:  ~0s; .ecxr ; kb

SYMBOL_NAME:  gameoverlayrenderer64+3b8b

MODULE_NAME: gameoverlayrenderer64

IMAGE_NAME:  gameoverlayrenderer64.dll

FAILURE_BUCKET_ID:  FATAL_USER_CALLBACK_EXCEPTION_c000041d_gameoverlayrenderer64.dll!Unknown

OS_VERSION:  10.0.22621.1

BUILDLAB_STR:  ni_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  7.69.51.22

FAILURE_ID_HASH:  {17205676-edd2-5e90-dc9c-604d33dd9f08}

Followup:     MachineOwner
---------

Additional comments

I also have a repo here with step by step instructions: https://github.com/paul-hansen/steam_wgi_crash_repro

I recently rewrote the Windows backend for the GilRs crate to use the windows crate and it just recently got merged into Bevy, since then we have had a couple users report this. Here is the related GilRs issue: https://gitlab.com/gilrs-project/gilrs/-/issues/132 And here are the help threads from Bevy's Discord: https://discord.com/channels/691052431525675048/1052636839376343141/1052636839376343141 https://discord.com/channels/691052431525675048/1049757548481347734/1049757548481347734 I tried to include all relevant info in this issue though.

kennykerr commented 1 year ago

I ordered a Bluetooth adapter so I can debug this on my dev box with an Xbox controller. Should be here in a day or two. 😜

In the meantime, I'll take a look at the iterator code to make sure its sound.

kennykerr commented 1 year ago

This looks like a bug in Steam. How can I tell? Well, the fact that it only crashed under Steam provides a clue that Steam may be hooking the RawGameController API and providing a buggy alternative. Let's prove it.

First, we need to prove that Steam is hooking the API. Since the RawGameControllers method returns a COM interface pointer, we can dereference that to get a pointer to a vtable slot inside the implementation. We can then pass that address to GetModuleHandleExW to retrieve the module handle containing this address. Then we can simply use GetModuleFileNameW to print out the name of the module:

use windows::{core::*, Gaming::Input::*, Win32::Foundation::*, Win32::System::LibraryLoader::*};

fn main() {
    let controllers = RawGameController::RawGameControllers().expect("RawGameControllers");

    unsafe {
        let mut module = HINSTANCE(0);

        GetModuleHandleExW(
            GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
            PCWSTR(*(controllers.abi() as *const *const std::ffi::c_void) as _),
            &mut module,
        )
        .expect("GetModuleHandleExW");

        let mut filename: [u16; 512] = [0; 512];
        GetModuleFileNameW(module, &mut filename);

        println!("module: {}", String::from_utf16_lossy(&filename));
    }

    std::io::stdin()
        .read_line(&mut String::new())
        .expect("read_line");
}

Running this directly on Windows 11, it prints out the following file name:

module: C:\Windows\System32\Windows.Gaming.Input.dll

That looks right. But if I run the same code from Steam, it prints out the following fishy file name:

module: C:\Program Files (x86)\Steam\gameoverlayrenderer64.dll       

That confirms that Steam hooks this API. So, what's the bug in their implementation? Let's take a look at a minimal repro:

use windows::Gaming::Input::*;

fn main() {
    loop {
        std::thread::sleep(std::time::Duration::from_millis(1000));
        let controllers = RawGameController::RawGameControllers().expect("RawGameControllers");

        for controller in controllers {
            println!(
                "detected: {}",
                controller.DisplayName().expect("DisplayName")
            );
        }
    }
}

We can run it in a loop since the controllers may not be available immediately. Running this directly on Windows 11, it prints out the following output:

   Compiling scratch v0.1.0 (D:\git\scratch)
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `target\debug\scratch.exe`
detected: Xbox One Game Controller
detected: Xbox One Game Controller
detected: Xbox One Game Controller
.
.
.

But running the same code under Steam just crashes. What gives? Well let's unpack the code a little. The RawGameControllers method returns a IVectorView<RawGameController> The Rust for loop then just uses an iterator to return successive values until the vector has been exhausted. The iterator for IVectorView<T> just uses the GetAt(index) method, so let's unroll that syntactic sugar and see what's going on:

use windows::{Gaming::Input::*, Win32::Foundation::*};

fn main() {
    loop {
        let _ = RawGameController::RawGameControllers().expect("RawGameControllers");
        std::thread::sleep(std::time::Duration::from_millis(1000));
        let controllers = RawGameController::RawGameControllers().expect("RawGameControllers");

        let mut current = 0;
        loop {
            match controllers.GetAt(current) {
                Ok(controller) => {
                    current += 1;
                    println!(
                        "detected: {}",
                        controller.DisplayName().expect("DisplayName")
                    );
                }
                Err(error) => {
                    assert!(error.code() == E_BOUNDS);
                    break;
                }
            }
        }
    }
}

Notice also that I call RawGameControllers twice. This just ensures that the first time we loop through the collection the devices have actually been loaded and the collection shouldn't be empty. You can really see what's going on now. The GetAt method is the virtual function call that actually travels into the implementation and either returns S_OK if current is a valid index or E_BOUNDS if not. Running this directly on Windows 11 produces the same results, but running it from Steam again crashes, but only after printing out a single result. Let's hook up a debugger. Since I've never used Steam before, I'll just cheat and drop this at the top:

std::io::stdin()
    .read_line(&mut String::new())
    .expect("read_line");

That way I can launch it from Steam and then attach a debugger to the running process. Stepping through the code we can see that the first call to GetAt(0) - with a valid index - succeeds but the next call to GetAt(1) - with an invalid index - causes an access violation inside the GetAt virtual function:

image

So instead of returning E_BOUNDS, the Steam implementation presumably doesn't check the bounds and causes an access violation trying to index beyond the end of the implementation's bounds. This explains why you originally noticed that it works if you first read the size of the vector and only call GetAt with valid indexes.

I hope that helps!

paul-hansen commented 1 year ago

This is excellent! Thanks so much for running this down the rest of the way and showing how you got there.

I'll see if I can track down the proper place to report this to Steam.

Edit: I reported this through the support page on my Steamworks account.

paul-hansen commented 1 year ago

Got a response from Steam:

Message from The Steam Team on Dec 19 @ 9:23pm Thanks for the great info. We actually do bounds checking and return E_BOUNDS if the array is accessed out of bounds, so there's something more going on here. Can you attach a crash dump so we can debug what's happening?

I sent them the crash dump. Mainly just wanted to share to let people know I've reached them and it sounds like it's being looked into.

paul-hansen commented 1 year ago

Good news!

Message from The Steam Team on Dec 20 @ 11:27am Thanks, that was very helpful. This is fixed for the next beta client.

I think we can follow https://steamcommunity.com/groups/SteamClientBeta/announcements to see when it hits beta.