michaelweiser closed this issue 1 year ago.
Yes! It makes perfect sense. I can't believe I didn't see this before. Thanks again for shedding light on this; there are two issues here.
It looks like the underlying issue has been there for a long time, both since I added the srw_lock_held() function and, before that (a long time ago), since I added the __try() __except() block: in each case I returned 0 without considering this possibility. I'm not sure why it hasn't come up before, and I'm disappointed in myself for not seeing that returning 0 has the potential for stack recursion.
While I think the CreateThreadBreakpoints() function could be improved by avoiding calling GetThreadId() every time, the real point is that it is just triggering this underlying bug in enter_hook(), bringing it to the fore where previously it lay hidden.
Your workaround could in fact be called a fix. I will implement it (or a variant of it), get it tested and publish it as soon as possible. I may try to improve CreateThreadBreakpoints() while I'm there.
A massive thanks for this work.
Fix now pushed. Thank you again!
Hi,
I'm seeing stack overflow exceptions on Windows 10 with even the simplest program doing a single API call:
Unfortunately, I was not able to grab any meaningful backtrace beyond it happening in enter_hook() and operate_on_backtrace(). By single-stepping through the code I think I have found the root cause, but I can only describe it verbally, with links into the capemon source code:

1. enter_hook() calls __called_by_hook() to prevent hook recursion: https://github.com/kevoreilly/capemon/blob/e62f1a43736f1ae64de918630a29e02ad2b2f3e5/hooking.c#L293
2. __called_by_hook() runs addr_in_our_dll_range() via operate_on_backtrace(): https://github.com/kevoreilly/capemon/blob/e62f1a43736f1ae64de918630a29e02ad2b2f3e5/hooking.c#L181
3. operate_on_backtrace() in the 64-bit version runs our_stackwalk() to retrieve the number of stack frames to look at: https://github.com/kevoreilly/capemon/blob/e62f1a43736f1ae64de918630a29e02ad2b2f3e5/hooking_64.c#L1168
4. our_stackwalk() will return zero if the SRW lock is held or if an exception is caught by its __except (EXCEPTION_EXECUTE_HANDLER) block (I'm fuzzy on the details of the latter): https://github.com/kevoreilly/capemon/blob/e62f1a43736f1ae64de918630a29e02ad2b2f3e5/hooking_64.c#L1126 https://github.com/kevoreilly/capemon/blob/e62f1a43736f1ae64de918630a29e02ad2b2f3e5/hooking_64.c#L1150
5. This causes operate_on_backtrace() to never call addr_in_our_dll_range() and to default to returning zero.

In the context of __called_by_hook(), this means that enter_hook() was not triggered from another hook. This essentially creates the potential for unwanted hook recursion whenever the SRW lock is held or that exception occurs during stack unwinding.

This seems to be triggered quite reliably, and turned into infinite recursion, by the Debugger:
1. With __called_by_hook() having told enter_hook() that it was not called by a hook, api_dispatch() is called.
2. api_dispatch() may (and in my observation basically always does) call InitNewThreadBreakpoints().
3. InitNewThreadBreakpoints() calls CreateThreadBreakpoints().
4. CreateThreadBreakpoints() calls GetThreadId().
5. GetThreadId() internally (at least on Windows 10) calls NtQueryInformationThread(), which is hooked.

This causes instantaneous infinite hook recursion on any hooked API call (at the very least if the SRW lock is held), leading to the observed stack overflow.
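To make the failure mode concrete, here is a minimal, portable C model of the guard logic described above. It is a sketch under stated assumptions, not capemon code: every name in it (sim_stackwalk(), sim_addr_in_our_dll(), sim_operate_on_backtrace(), sim_called_by_hook(), sim_enter_hook(), g_lock_held, g_hook_depth, MAX_DEPTH) is hypothetical, a global flag stands in for the held SRW lock, "this frame is in our DLL" is approximated by the current hook nesting depth, and the whole api_dispatch() -> ... -> NtQueryInformationThread() chain is collapsed into one direct re-entry of the hook. MAX_DEPTH caps the recursion so the model reports where a real stack would overflow instead of crashing.

```c
#include <stdbool.h>

/* Hypothetical simulation state, not part of capemon. */
bool g_lock_held = false;  /* pretend the SRW lock is held during the walk */
int g_hook_depth = 0;      /* number of hooks currently active on the stack */
#define MAX_DEPTH 64       /* stand-in for the real stack size limit */

/* Stand-in for our_stackwalk(): number of frames captured, or 0 when
   it bails out early (lock held, or exception caught). */
int sim_stackwalk(void)
{
    if (g_lock_held)
        return 0;
    return g_hook_depth + 1;  /* one frame per active hook plus the caller */
}

/* Stand-in for addr_in_our_dll_range(): frame indices below the current
   nesting depth are treated as belonging to an active hook. */
int sim_addr_in_our_dll(int frame)
{
    return frame < g_hook_depth;
}

/* Stand-in for operate_on_backtrace() with the original default of 0. */
int sim_operate_on_backtrace(void)
{
    int frames = sim_stackwalk();
    int ret = 0;  /* returned unchanged when no frame is examined */
    for (int i = 0; i < frames; i++) {
        ret = sim_addr_in_our_dll(i);
        if (ret)
            break;
    }
    return ret;
}

/* Stand-in for __called_by_hook(): 0 means "not called from a hook". */
int sim_called_by_hook(void)
{
    return sim_operate_on_backtrace();
}

/* Stand-in for enter_hook(); returns the deepest nesting level reached.
   A result of MAX_DEPTH means a real stack would have overflowed. */
int sim_enter_hook(void)
{
    if (sim_called_by_hook())
        return g_hook_depth;  /* guard fired: pass through, no dispatch */
    if (g_hook_depth >= MAX_DEPTH)
        return g_hook_depth;  /* simulated stack overflow */
    g_hook_depth++;
    /* api_dispatch() -> InitNewThreadBreakpoints() -> CreateThreadBreakpoints()
       -> GetThreadId() -> NtQueryInformationThread(), which is hooked:
       modeled here as re-entering the hook directly. */
    int reached = sim_enter_hook();
    g_hook_depth--;
    return reached;
}
```

With g_lock_held false, the nested call is recognized as coming from a hook and sim_enter_hook() returns 1. With it true, the walk yields zero frames, the guard never fires, and the recursion runs all the way to MAX_DEPTH.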
To recap, the call chain is:

/any API call/ -> [recurse: enter_hook() + __called_by_hook() == 0 -> api_dispatch() -> InitNewThreadBreakpoints() -> CreateThreadBreakpoints() -> GetThreadId() -> NtQueryInformationThread()]

My workaround looks like this:
What this does is make our_stackwalk() indicate its inability to walk the stack at all by returning -1. This still makes operate_on_backtrace() skip calling addr_in_our_dll_range(), but the changed default return value of -1 now signals that fact to the caller. The only caller that evaluates the return value at all is __called_by_hook(). There we now cautiously return 1, meaning "yes, we have been, or at least could have been, called from a hook". In my tests this successfully prevents the infinite recursion and the subsequent stack overflow.

Does any of that make sense?