Closed michaelweiser closed 2 years ago
Michael again a massive thank you for this work. This deadlock is a killer and it would be great to swat it and get win10 compatibility closer to where it should be.
Your ideas are all great - just like the way I check for loader lock but there it's so much easier as it's in the peb!
But I think there is a solution to be had here. I found this from 2012 but the idea is spot on: http://workblog.pilin.name/2012/10/how-to-get-x64-dynamic-function-table.html
So I've just tried adding a simple hook for AcquireSRWLockExclusive as for Win10 we needn't worry about the critical section mentioned in that post. However I haven't yet seen it fire. Please do give it a try - once we have the hook working we can set a flag to enable pointer capture, then call the bogus RtlAddFunctionTable to get the pointer, then disable capture, delete the entry then add a check to enter_hook to immediately return 0 if SRWlock held. That's the plan anyway! Let me know if you want a compiled monitor to test with.
HOOKDEF(void, WINAPI, AcquireSRWLockExclusive,
__in PSRWLOCK SRWLock
)
{
int ret = 0;
Old_AcquireSRWLockExclusive(SRWLock);
LOQ_void("misc", "p", "SRWLock", SRWLock);
DebugOutput("AcquireSRWLockExclusive: SRWLock 0x%p", SRWLock);
return;
}
Hi Kevin, I've made some progress although no breakthrough yet.
I've added a call to RtlAddFunctionTable()
to capemon's DllMain()
and I can see the hook on AcquireSRWLockExclusive()
learning the address of an SRW from that call. So the mechanics seem to be working in principle.
But unfortunately it's the wrong SRW, i.e. LdrpMrDataLock
, and RtlAddFunctionTable()
gets stuck trying to acquire it. My guess is that it's held by a routine that calls capemon's DllMain()
. A DLL that's being initialised going around modifying the function table is likely not something Microsoft wants people to do. :/
Here's a screenshot of WinDbg of that state with output from the analyser:
The code is at https://github.com/michaelweiser/capemon/commit/36fb25b72c47f6057e39f07c8c227af4353c1758.
So we'd likely need to defer the SRW discovery to a routine that's not part of DLL initialisation itself. Any idea of a good candidate?
Other stuff I've learned:
DebugOutput()
's because that's taking out locks on SRWs as well, worst case creating infinite recursion if there's a DebutOutput()
in AcquireSRWLockShared()
. DebugOutput()
being called while our discovery flag is switched on will make us learn the address of the wrong SRW.enter_hook
as before so we don't trigger calls to RtlLookupFunctionEntry
too early - not all of that will eventually be necessary, I assumeour_stackwalk()
until we've learned the SRW - it never really came into play up to nowI've also tried instead calling RtlLookupFunctionEntry()
from DllMain()
but apparently didn't provide it with the right parameters because it didn't even try to take out a lock on any SRW. If you have an idea how we could construct a stack frame for it to unwind which would reliably trigger use of the SRW, this might be a nice approach because it'd work without modifying any system structures like RtlAddFunctionTable()
does.
Trying a totally different approach, I've also attempted to learn the address of LdrpInvertedFunctionTableSRWLock
from calls to AcquireSRWLockShared()
by RtlLookupFunctionEntry()
from our_stackwalk()
in hopes that there would be some non-deadlocking ones first which take out the SRW in question before running into the case of that lock already being held exclusively. Unfortunately we don't seem to be that lucky and it seems to deadlock on the first call already.
I have a lingering suspicion that some of the above stuff didn't work for me because I ran afoul of the complexity of the hook call chains involved and was just staring at bogus output or not reading it right. So any critical feedback on and maybe testing of the code is highly welcome.
I'll be away starting tomorrow all next week so don't be alarmed if I fall silent for some time. @jack28 is in the loop what we're trying to achieve and can maybe do some additional tests.
Looking at this again today, I read the disassembly of RtlAddFunctionTable()
not to use LdrpInvertedFunctionTableSRWLock
at all. So this approach could really only be used to get at LdrpMrdataLock
and RtlpDynamicFunctionTableLock
.
RtlLookupFunctionEntry()
on the other hand seems to have the equivalent of AcquireSRWLockShared/Exclusive()
inlined when accessing LdrpInvertedFunctionTableSRWLock
which would explain why it does not appear in the backtrace and directly ends up at NtWaitForAlertByThreadId()
. So hooking that function might get us the SRW but would require to get it exclusively locked first.
Searching the disassembly of ntdll
in WinDbg reveals that RtlRemoveInvertedFunctionTable()
accesses LdrpInvertedFunctionTableSRWLock
unconditionally and uses RtlAcquireSRWLockExclusive()
to do so. This is likely going to be my next line of attack on the problem then. First step would be to figure out its prototype and call semantics, e.g. if it will gracefully fail if called with invalid data which would be ideal for our purpose.
Hi Michael, I too have been exploring a couple of possibilities; the most promising is using yara to locate RtlInsertInvertedFunctionTable and hence LdrpInvertedFunctionTableSRWLock, since we already have it compiled into the monitor. This would be nice and unintrusive, find via yara then add check to stackwalk function... I'll let you know how I get on...
After some more digging, calling RtlInsertInvertedFunctionTable()
and RtlRemoveInvertedFunctionTable()
directly is off the table because they're not exported from ntdll
. All other functions calling them are either not exported themselves or way too high-level already.
Before spending more time on this I'd love to hear if you found a way to discover those functions or the SRW at runtime. Because they don't seem to be identified by name or any other means within the DLL (because they're not exported), I could only think of using the debug symbols (which differ and would need to be downloaded for every build of ntdll) or matching some distinctive code sequence, e.g. cmpxchg
being used to access the SRW and its surroundings (but which might also change depending on compiler flags used by upstream when compiling it). There's very likely a relocation for each private function and the SRW in the DLL image but those can't be distinguished from all the others either.
I'm now wondering if it might actually be simpler to try and use the canary thread approach suggested by the dotnet guys, i.e. have another thread call RtlLookupFunctionEntry()
just to see if that blocks and if it does not report back within a couple of microseconds refrain from calling the function from our_stackwalk()
. Once the hooked function in question finishes and releases the SRW, the canary thread should then unblock automatically as well.
Hi Michael, I've managed to avoid the deadlock by using yara to locate RtlInsertInvertedFunctionTable and hence LdrpInvertedFunctionTableSRWLock. Here's a test build of capemon for you to try - let me know if it works! capemon_x64.zip
Hi Kevin, I've tried it. Unfortunately, it still hangs but at a different point. See below screenshot for the call stack.
Unfortunately, your build doesn't include symbols. So I can not do much more digging into it to tell the reason for the hang. What I can tell is that RtlLookupFunctionTable
is called and tries to acquire ntdll!LdrpInvertedFunctionTableSRWLock
and gets stuck once more.
Can you provide a debug build or, even better, the source code of your modifications?
BTW: I'll be unavailable next week but once again @Jack28 is in the loop and happy to try out what you throw at him.
Aha I've just realised yara is disabled with office settings - can you try launching without?
Happy to share source of course - bit of a mess currently as it's been hacked together but I'll try and tidy it up so I can share
LdrpInvertedFunctionTableSRWLock.zip Here is the code
@jack28 and now I have been looking into this again. In addition to setting the office
option to 0
in the doc
and doc2016
packages we needed to remove data\yara\capemon.yac
to get the new rule added to that cache(?). With that we see:
2021-07-20 12:36:44,256 [root] DEBUG: YaraInit: Compiled rules loaded from existing file C:\word-2016-dbg\data\yara\capemon.yac
2021-07-20 12:36:44,256 [root] DEBUG: YaraScan: Scanning 0x00007FFBE0E10000, size 0x1f5000
2021-07-20 12:36:44,271 [root] DEBUG: YaraScan: AccessibleSize 0x1f5000
2021-07-20 12:36:44,287 [root] DEBUG: YaraScan rule did not match.
[...]
2021-07-20 12:36:44,302 [root] DEBUG: YaraScan rule did not match.
2021-07-20 12:36:44,302 [root] DEBUG: YaraScan hit: RtlInsertInvertedFunctionTable
2021-07-20 12:36:44,318 [root] DEBUG: YaraScan string match: $function (0x1090e)
2021-07-20 12:36:44,318 [root] DEBUG: RtlInsertInvertedFunctionTable 0x00007FFBE0E2090E, LdrpInvertedFunctionTableSRWLock 0x00007FFBE0F7B4F0
2021-07-20 12:36:44,318 [root] DEBUG: YaraScan: successfully scanned 0x00007FFBE0E10000
The address correctly comes back as ntdll!LdrpInvertedFunctionTableSRWLock+0
in WinDbg.
Great stuff! I'd love to understand how exactly you extract the offset (relative to what?) from the lea
instruction using
LdrpInvertedFunctionTableSRWLock = (PVOID)((PBYTE)RtlInsertInvertedFunctionTable + *(DWORD*)((PBYTE)RtlInsertInvertedFunctionTable + 3) + 7);
but I can accept with my limitations since it works. ;)
Winword still doesn't fully start though. Looking at the source code it seems the logic is not fully active yet, ending up in disabling hook cycle detection. Should we try and dig into that or are you on it already?
Or is the code
if (*(PVOID*)LdrpInvertedFunctionTableSRWLock)
return TRUE;
meant to be a lightweight version of
//if (TryAcquireSRWLockExclusive((PSRWLOCK)&LdrpInvertedFunctionTableSRWLock)) {
// ReleaseSRWLockExclusive((PSRWLOCK)&LdrpInvertedFunctionTableSRWLock);
// return FALSE;
//}
?
For the latter I wonder whether its potential failure to function may be due to PVOID LdrpInvertedFunctionTableSRWLock
being a (typeless) pointer to the SRWLOCK
structure (aka PSRWLOCK
) already and taking the address of that pointer variable again ((PSRWLOCK)&LdrpInvertedFunctionTableSRWLock
) makes the function work with the wrong address. I may just have knots in my brain though. :)
I found that this issue was affecting a range of 64-bit programs, particularly gui apps. As I still haven't a 64-bit Office to test with, I decided to look at another easier app that was similarly affected to test on: x64dbg. With this as a test case, the code committed in https://github.com/kevoreilly/capemon/commit/aa1fd55c13b09aca976d0864470f05e31a2f7794 fixes the issue. The app no longer hangs on this deadlock and loads on 64-bit Win10:
So I am led to think that this specific issue is solved, and that therefore a separate distinct issue is now preventing Word from opening. Certainly the logic committed is tested and working, the commented-out test code with the TryAcquireSRWLockExclusive api was just that; I tested this and it didn't work. When I tried testing the 'raw' lock value that worked, so I went with that.
As for the code to locate the lock, if you look at the disassembly of RtlInsertInvertedFunctionTable, the logic relies entirely on this line:
The last 4 bytes of this instruction represent little-endian offset so 0x13954F. This is relative to the end of the instruction which is the VA 0x7FFC71CE2F8A plus the size 7, so 0x7FFC71CE2F91. Adding the two gives 0x7FFC71E1C4E0 which is the address of the lock:
Please let me know if you agree that this issue is fixed, then we can perhaps create a new issue for that which now prevents Word from launching.
Any feedback on this? I am still of the mind that this specific issue is fixed...
Thanks for your update and explanations! Doing some tests with Winword 2013 and 2016, the new logic reliably prevents the deadlock on startup. Both crash soon thereafter but those seem to be unrelated problems. I'll try and dig into those and report any separate issues I may be able to identify. I think this issue here can be closed. Thanks again!
Well worth trying options like minhook=1 or other hook exclusions to try to rule out a hooking issue.
No change in behaviour with minhook=1 in a quick try. Will dig some more and report in a separate issue if I come up with anything.
Hello @kevoreilly ! I believe that I've identified another case of this issue still reproducing in another x64-compiled binary. Here is the C code necessary for repro:
#define _WIN32_WINNT 0x0400
#define _WIN32_DCOM
#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <wbemidl.h>
#include <combaseapi.h>
int main(void)
{
HRESULT hr = 0;
// BSTR strings we'll use (http://msdn.microsoft.com/en-us/library/ms221069.aspx)
BSTR resource = SysAllocString(L"ROOT\\CIMV2");
BSTR language = SysAllocString(L"WQL");
BSTR cpu_query = SysAllocString(L"SELECT * FROM Win32_Processor");
// initialize COM
wprintf(L"%s\r\n", L"[*] Initialize COM ...");
hr = CoInitializeEx(0, COINIT_MULTITHREADED);
wprintf(L"%s\r\n", L"[*] Initialize COM Security ...");
hr = CoInitializeSecurity(NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE, NULL);
// COM interface pointers
IWbemLocator* locator = NULL;
IWbemServices* services = NULL;
IEnumWbemClassObject* results = NULL;
// connect to WMI
wprintf(L"%s\r\n", L"[*] Connect to WMI ...");
hr = CoCreateInstance(&CLSID_WbemLocator, 0, CLSCTX_INPROC_SERVER, &IID_IWbemLocator, (LPVOID*)&locator);
hr = locator->lpVtbl->ConnectServer(locator, resource, NULL, NULL, NULL, 0, NULL, NULL, &services);
// issue a WMI query
wprintf(L"%s\r\n", L"[*] Execute query ...");
hr = services->lpVtbl->ExecQuery(services, language, cpu_query, WBEM_FLAG_BIDIRECTIONAL, NULL, &results);
// list the query results
if (results != NULL) {
IWbemClassObject* result = NULL;
ULONG returnedCount = 0;
wprintf(L"%s\r\n", L"CPU data:");
// enumerate the retrieved objects
while ((hr = results->lpVtbl->Next(results, WBEM_INFINITE, 1, &result, &returnedCount)) == S_OK) {
VARIANT name;
VARIANT speed;
// obtain the desired properties of the next result and print them out
hr = result->lpVtbl->Get(result, L"Name", 0, &name, 0, 0);
hr = result->lpVtbl->Get(result, L"MaxClockSpeed", 0, &speed, 0, 0);
wprintf(L"%s, %dMHz\r\n", name.bstrVal, speed.intVal);
// release the current result object
result->lpVtbl->Release(result);
}
}
// release WMI COM interfaces
results->lpVtbl->Release(results);
services->lpVtbl->Release(services);
locator->lpVtbl->Release(locator);
// unwind everything else we've allocated
CoUninitialize();
SysFreeString(cpu_query);
SysFreeString(language);
SysFreeString(resource);
system("pause");
return 0;
}
wbemuuid.lib` needs to be added in Linker/Input in VS project properties.
Compiling and running the binary under loader leads to a hang:
Again, as above, deactivating the NtProtectVirtualMemory
hook leads to expected behavior:
I've tried various things to solve the issue.
Firstly, I wanted to see how many times the enter_hook
method is called by adding a logging message:
By only doing this small change, the code hangs again. Which leads me to think that sending data to the debugger is also interfering with the execution?
Trying to modify the allow_hook_recursion
to 0 is not changing the outcome.
As a workaround, I was thinking about implementing hooks for VirtualProtect and VirtualProtectEx, and disable the ntdll hooks, at least for windows 10 x64.
Ultimately, I am thinking that the regexes used in YaraHarness.c
looking for RtlInsertInvertedFunctionTable
are not matching this type of execution?
As a side note, looking for the system processes and installed application using WMI, exposes the sandbox-related processes and applications to the malware process. At the moment, these are protected only against listing the processes with CreateToolHelp32Snapshot
WIN API. This means that querying WMI is a viable cape sandbox evasion. Implementing behavior-changing hooks against WMI queries should mitigate this.
I will attempt a pull request w.r.t the above observation.
Ultimately, thank you for this great sandbox!
Edit: typos.
Thank you @RaduEmanuel92 for reporting this issue. It is the same as issue #49 reported today, now fixed in https://github.com/kevoreilly/capemon/commit/65f4e2f8761601855fe0c98a78821ace5b58aab8
Starting a 64-bit Winword on current Windows 10 under observation by capemon gets stuck very early on. Attaching to the stuck
winword.exe
shows the following call stack:Apparently,
ntdll!RtlInsertInvertedFunctonTable
is called and (according to disassembly of the function) exclusively acquires Slim Reader/Writer lock (SRW)ntdll!LdrpInvertedFunctionTableSRWLock
. This would make sense as the function is likely to modify import tables. While holding the lock, it callsntdll!LdrProtectMrData
which, according to my debugging, eventually callsntdll!NtProtectVirtualMemory
, likely to protect access to those tables (again). Sincentdll!NtProtectVirtualMemory
is hooked by capemon, this triggerscapemon_x64!enter_hook
in order to decide whether to enter the capemon hook for that function or not. This decision is, amongst other things. based on whether the hook is itself called from a hook. To determine this, the stack is unwound by callingcapemon_x64!our_stackwalk
which on x64 is implemented usingntdll!RtlLookupFunctionEntry
. Unfortunately, this function does not appear to be safe for this kind of thing because it also acquires the already exclusively heldntdll!LdrpInvertedFunctionTableSRWLock
. This leads to the observed deadlock.Disabling hooking of
ntdll!NtProtectVirtualMemory
without involvement of stack unwinding mitigates the issue:This, however, leaves a massive blindspot regarding all calls to that function. This also only works because
ntdll!NtProtectVirtualMemory
appears to be the only hooked function called whilentdll!LdrpInvertedFunctionTableSRWLock
is held.Trying to hook
ntdll!RtlInsertInvertedFunctonTable
to temporarily disable hooking for all other APIs called from it (again without involving stack-based decisions) have not been successful because the hook does not seem to be called:Another idea I had (and found elsewhere: https://microsoft.public.win32.programmer.kernel.narkive.com/qxCAoEXI/using-rtllookupfunctionentry-for-profiling) was to try to acquire the lock from
our_stackwalk
to see if it was held or free andRtlLookupFunctionEntry
would block or not. Unfortunately, the symbol is not exported fromntdll
, so I cannot get at its address.Other projects have run into this problem as well and proposed a number of solutions, e.g.: dotnet/runtime#32286, dynamorio/drmemory#1222
The issue seems to be somewhat Windows-10-specific, because the same capemon_x64.dll is able to start up and monitor the same version of 64-bit Winword on a 64-bit Windows 7 without above workarounds. My guess is that import table mechanics, at least regarding memory protections on them, have changed between Windows 7 and Windows 10. I have not analysed the differences in detail though.
Is my understanding of the mechanics at play correct? Could my attempts at hooking
RtlInsertInvertedFunctionTable
or inspecting the state ofLdrpInvertedFunctionTableSRWLock
fromour_stackwalk
be made to work somehow? Any ideas what else could be done about this issue?Thanks!