winsiderss / systeminformer

A free, powerful, multi-purpose tool that helps you monitor system resources, debug software and detect malware. Brought to you by Winsider Seminars & Solutions, Inc. @ http://www.windows-internals.com
https://systeminformer.sourceforge.io
MIT License

SYSTEM_SERVICE_EXCEPTION BSOD on Windows 10 22H2 #1788

Closed Mattiwatti closed 1 year ago

Mattiwatti commented 1 year ago

Brief description of your issue

I'm using systeminformer.sys built from git master on Windows 10 22H2 with kernel 10.0.19041.3155 (EXE, PDB). After entering a query in the handle search window in SI, I sometimes receive a bugcheck as follows:

SYSTEM_SERVICE_EXCEPTION (0x3B) An exception happened while executing a system service routine.

P1: 0x00000000C0000005 (STATUS_ACCESS_VIOLATION) P2: 0xFFFFF80311042BA7 P3: 0xFFFFA38AE82E68E0 P4: 0x0000000000000000

(The parameters are copied from a picture I took because I didn't have crash dumps enabled when this happened, sorry about that.)

After attaching a debugger it seems that the source line causing the access violation is this call to FltReleasePushLock which was added to KphpReferenceAlpcCommunicationPorts in dce457a. The exception is caused by an invalid memory access attempt when the kernel dereferences the lock.

I haven't debugged this further so I can't explain why the exception occurs in the release call and not the acquire call. It's possible that the bugcheck can happen in either and I happened to see it in the release call.


After following the source code to find the origin of the dynamic data values and comparing these to a PDB dump, I discovered that the lock offset KphDynAlpcHandleTableLock is incorrect for my kernel version. This offset is defined as 0x10, but it should be 0x8 for this kernel. After modifying the value and generating a new kphdyn.c with the updated value, the bugcheck is fixed for me.

There are two reasons I'm creating an issue rather than a PR with the above fix:

  1. I can modify the offset in kphdyn.xml, but I can't make the same change to kphdyn.c on master because the binary data in this file is signed with a key I haven't got.
  2. While 0x8 is the correct value for KphDynAlpcHandleTableLock on my kernel version, older revisions of 10.0.19041.x do have an offset of 0x10, meaning the current value is correct for these and changing it would break KPH on these kernels. (I don't know the precise 19041.xxx revision in which ALPC_HANDLE_TABLE was first changed.)

My opinion which no one asked for:

I feel like this problem is inherent to using version specific private types in this way and I don't really see how to fix this more robustly other than simply not doing this, or at least not to this extent. `handleTableLock` is the result of chaining **three** `Add2Ptr(p, offset)`s on undocumented types in a row! The total `Add2Ptr()` count for `KphpReferenceAlpcCommunicationPorts` is **6**. What is this function really doing for me that justifies this? I know it's easy to criticize given that I didn't write any of the code. These are just my thoughts after debugging this for a while.
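To make the failure mode concrete, here is a minimal user-mode sketch in C of what a stale offset does to `Add2Ptr`-style arithmetic. The two structures and their field names are hypothetical stand-ins, not the real `ALPC_HANDLE_TABLE`; only the 0x10 and 0x8 lock offsets come from the report above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-offset pointer arithmetic, same shape as Add2Ptr(p, offset). */
#define ADD2PTR(p, off) ((void *)((uint8_t *)(p) + (off)))

/* Hypothetical layouts for two revisions of one private structure.
   Field names are invented for illustration. */
typedef struct { uint64_t Flags; uint64_t Count; uint64_t Lock; } TABLE_OLD;
typedef struct { uint64_t Flags; uint64_t Lock; uint64_t Count; } TABLE_NEW;

_Static_assert(offsetof(TABLE_OLD, Lock) == 0x10, "old layout: lock at 0x10");
_Static_assert(offsetof(TABLE_NEW, Lock) == 0x08, "new layout: lock at 0x08");

/* Resolve the lock from an offset supplied as data, not by the compiler. */
uint64_t *lock_from_offset(void *table, size_t lock_offset)
{
    assert(table != NULL);
    return (uint64_t *)ADD2PTR(table, lock_offset);
}

/* Returns 1 when the stale offset 0x10 lands on a different field of the
   new layout while the corrected offset 0x8 lands on the lock. */
int stale_offset_misses_lock(void)
{
    TABLE_NEW t = { 0 };
    return lock_from_offset(&t, 0x10) == &t.Count
        && lock_from_offset(&t, 0x08) == &t.Lock;
}
```

Nothing in this arithmetic can detect the mismatch: the compiler never sees the real type, so the wrong field is simply treated as a lock. Chaining several such steps means each one depends on the previous offset being right for the running kernel.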

Steps to reproduce (optional)

  1. Build KPH/KSI and optionally SI from git, and install and run SI on the current latest Windows 10 build. Note: I believe simply using a signed systeminformer.sys from a nightly build of any version starting from 3.0.6770 will also work, but I haven't tested this.
  2. Press CTRL+F and search for something like an open file handle (but leave the type selection on "everything").
  3. If the system did not crash, repeat step 2 until it does. I found that holding down the <ENTER> key to perform searches non-stop worked very well in my one attempt to reproduce this. It should take no more than a few seconds.

Expected behavior (optional)

System Informer shows the search results.

Actual behavior (optional)

The system crashes as described above.

Environment (optional)

- Windows 10 22H2 x64, kernel 10.0.19041.3155 from KB5027293
- KSI/KPH and SI built from master at 2dbaf34

jxy-s commented 1 year ago

Thanks @Mattiwatti for digging into this and for the thorough analysis.

> I can modify the offset in kphdyn.xml, but I can't make the same change to kphdyn.c on master because the binary data in this file is signed with a key I haven't got.

Yes, it will require one of the maintainers to do this work as it's protected information in the repo. In case you're not familiar: if you want to patch this locally, since you're building the driver yourself, you can generate your own key pair. See: https://github.com/winsiderss/systeminformer/tree/master/KSystemInformer#development

> I don't know the precise 19041.xxx revision in which ALPC_HANDLE_TABLE was first changed.

I've been working on some automation to harvest Windows updates and build a complete and deterministic list for our dyndata, but that's a ways out from being a reality (I've had to shelve that for now to focus on some other things). We've at least located another build that modified the offsets. I'll go in and update the offsets in the repo in a bit.

jxy-s commented 1 year ago

I decided to take some time to finish the automation to the point where I can validate all the "known and available" Win10 and Win11 x64 versions out there. So, I've harvested the information necessary. The tooling is a bit hacky but it will let me generate a new set of dyndata that should appropriately represent the internals.

jxy-s commented 1 year ago

Should be resolved with https://github.com/winsiderss/systeminformer/commit/f35e7496d16d26f5d99d38aa3b55d26f8dcdb496. I'll do more testing myself tomorrow. If you have cycles @Mattiwatti I'd appreciate your time validating the fix.

Note that:

  1. There might be gaps in known kernels from the previous dyndata. I rebuilt it all using my new tooling and by scraping together the kernel versions I could.
  2. I capped the versions to what I could validate. The most recent releases (10.0.19041.3208 and 10.0.22621.1992) won't work until we can check their offsets. It's late here, I'll monitor for them to be available this week and update the dyndata as necessary.

Finally, these changes don't require a new driver build. The existing driver is compatible with the changes to dyndata.

Mattiwatti commented 1 year ago

I think f35e749 still needs a little tweak for Windows 10. I get the following on startup:

[Screenshot: error shown on startup]

I grepped for STATUS_NOT_SUPPORTED and found KphpSetDynamicConfiguration. I added some quick printf logging and got the following (irrelevant entries removed):

0: MajorVersion = 10, MinorVersion = 0, BuildNumberMin = 22000,
   RevisionMin = 556, BuildNumberMax = 22621, RevisionMax = 1928
0: NOT_SUPPORTED: KphOsVersionInfo.dwBuildNumber 19045 < Configuration->BuildNumberMin 22000

1: MajorVersion = 10, MinorVersion = 0, BuildNumberMin = 22000,
   RevisionMin = 194, BuildNumberMax = 22621, RevisionMax = 493
1: NOT_SUPPORTED: KphOsVersionInfo.dwBuildNumber 19045 < Configuration->BuildNumberMin 22000

2: MajorVersion = 10, MinorVersion = 0, BuildNumberMin = 19041, RevisionMin = 1586,
   BuildNumberMax = 19041, RevisionMax = 3155
2: NOT_SUPPORTED: KphOsVersionInfo.dwBuildNumber 19045 > Configuration->BuildNumberMax 19041

3: MajorVersion = 10, MinorVersion = 0, BuildNumberMin = 19041,
   RevisionMin = 264, BuildNumberMax = 19041, RevisionMax = 1526
3: NOT_SUPPORTED: KphOsVersionInfo.dwBuildNumber 19045 > Configuration->BuildNumberMax 19041

So this seems to just be a small "mixup of build number types" (with kernel build number 19041 vs RtlGetVersion OS build number 19045) and nothing more. This wasn't an issue before because the valid build number range was 19042-22621.
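The mismatch can be reproduced outside the driver. Below is a minimal sketch in C, assuming ranges are compared as (build, revision) pairs. The log above only shows the build-number comparisons, so the revision handling, the `RANGE` type and `find_profile` are illustrative and not the actual code of KphpSetDynamicConfiguration. Only the two Windows 10 entries from the log are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields printed in the log above. */
typedef struct {
    uint32_t BuildNumberMin, RevisionMin;
    uint32_t BuildNumberMax, RevisionMax;
} RANGE;

/* Entries 2 and 3 from the log (the Windows 10 profiles). */
static const RANGE Profiles[] = {
    { 19041, 1586, 19041, 3155 },
    { 19041,  264, 19041, 1526 },
};

/* Pack build and revision so that one integer compare orders them. */
static uint64_t pack(uint32_t build, uint32_t revision)
{
    return ((uint64_t)build << 32) | revision;
}

/* Returns the index of the first matching profile, or -1 if none match. */
int find_profile(uint32_t build, uint32_t revision)
{
    uint64_t v = pack(build, revision);
    for (size_t i = 0; i < sizeof(Profiles) / sizeof(Profiles[0]); i++) {
        const RANGE *r = &Profiles[i];
        uint64_t lo = pack(r->BuildNumberMin, r->RevisionMin);
        uint64_t hi = pack(r->BuildNumberMax, r->RevisionMax);
        assert(lo <= hi);
        if (v >= lo && v <= hi)
            return (int)i;
    }
    return -1;
}
```

With these tables, `find_profile(19045, 3155)` (the OS build number) finds nothing, while `find_profile(19041, 3155)` (the kernel build number) selects the first entry.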

I used the following quick and dirty workaround so I could test f35e749:

if (KphOsVersionInfo.dwBuildNumber > 19041 && KphOsVersionInfo.dwBuildNumber <= 19045) {
    KphOsVersionInfo.dwBuildNumber = 19041;
}

I think a better way to fix this would probably be to use the kernel build number instead of KphOsVersionInfo for these checks (the offsets are determined by the kernel after all, not the registry), but I'll leave this up to you.


Anyway, with the above quick hack applied (and therefore loading the 10.0.19041.1586 - 10.0.19041.3155 profile as I assume was intended for this kernel), I can confirm f35e749 now fixes the BSOD in the handle search window on my machine.

Thanks for the quick response! I'll close this issue since it's fixed as far as I'm concerned.

jxy-s commented 1 year ago

Thanks again @Mattiwatti - you're right, it would be best to check the version of the kernel instead of trying to rely on RtlGetVersion. It's annoying that MS has diverged those.

This change will require a new driver. For the sake of tracking, I'm going to reopen this.

jxy-s commented 1 year ago

Commit https://github.com/winsiderss/systeminformer/commit/93f5cf1e99ab330e1c70fb752fc3eab4dc2aa5bc will use the full version of ntoskrnl instead of relying on RtlGetVersion.
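For illustration only, a "full version" such as 10.0.19041.3155 has four components, and the build and revision are the two that the dyndata ranges care about. The sketch below parses such a string in plain C. How the commit actually obtains the version from ntoskrnl is not shown in this thread, and `parse_version` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Parses "major.minor.build.revision" into out[0..3].
   Returns false unless exactly four components are present. */
bool parse_version(const char *s, unsigned out[4])
{
    char trailing;
    assert(s != NULL && out != NULL);
    /* The final %c only converts if unexpected text follows the revision,
       in which case sscanf returns 5 and the string is rejected. */
    return sscanf(s, "%u.%u.%u.%u%c",
                  &out[0], &out[1], &out[2], &out[3], &trailing) == 4;
}
```

Parsing `"10.0.19041.3155"` this way yields build 19041 and revision 3155, the pair that falls inside the 19041.1586 to 19041.3155 profile from the log.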

jxy-s commented 1 year ago

3.0.7029 should resolve 👍