Open yjugl opened 1 week ago
cc @rnk and @bergeret who reviewed the ShadowExceptionHandler
patch
Hans (@zmodem) has been tackling a fair number of these issues recently.
Can we deescalate the war of interception? What happens if we stop intercepting C string routines in ntdll? I'm sure we have some critical ntdll interceptors, but memset, str* and others are mostly bug-finding interceptors, not functional allocator rerouting interceptors. To me it is very clear that we should trade away some false negatives in third party code to improve ASan reliability and robustness to Windows updates.
In general, this category of interceptor re-entrancy is a common sanitizer failure mode on all platforms, and why ASan avoids "libc" APIs on other platforms. It only happens that on Windows, "libc" APIs are usually layered above win32 APIs, but that assumption has been violated after this change because the EH machinery (RtlDispatchException) calls memset.
CC @barcharcraz who works on the sanitizers at MS, and who is upstreaming their currently downstream tweaks.
The title and the first comment are misleading. I have only tested with 26100.2161, but Gov Maharaj from Microsoft correctly figured that these changes are in fact present in all released versions of Windows 11 24H2 so far (starting with 26100.1). Updating accordingly.
What happens if we stop intercepting C string routines in ntdll?
I'm not even sure that the current behavior was intentional? When intercepting a function, the asan runtime essentially looks for and intercepts all instances it can find. I'm guessing that it looks in ntdll because we want to intercept certain functions there, and ntdll's crt functions just got sucked into the general interception machinery.
If we stop intercepting those functions, I suppose we might lose some checks on win32 api functions though. For example, assuming that SetWindowText
uses ntdll's strlen internally (I don't know if it does), today asan would flag if the string arg is poisoned memory. Not sure if this counts as a significant loss though.
Not intercepting the ntdll "minicrt" functions sounds reasonable to me, I'll take a look.
Noting here that this patch, applied on top of the two patches from #111638, is enough to get an ASAN build of Firefox running. It works by making the memset instrumentation force-commit shadow memory before looking at it. This patch may not be the ideal solution to the problem, because any call to memset will now produce a call to VirtualAlloc even when the shadow memory is already commited. Still thought it would be worth sharing in case it can help unstuck a situation for someone.
Edit: Link expired -- replaced by a link to Phabricator.
ntdll
ntdll.dll 10.0.26100.2161 from Windows 11 24H2 26100.2161 differs from previous versions of ntdll.dll in a subtle way. [edit: In fact all Windows 11 24H2 since 26100.1 have this difference -- thank you Gov Maharaj for figuring this out.]
Previously RtlDispatchException would almost directly reach into the vectored exception handlers, but now two buffers of respective sizes 0x58 and 0xD8 are memset to zero before reaching into the vectored exception handlers:
Now let me detail why we care here.
memset interception with ASAN
When compiler-rt ASAN instrumentation is in place, memset is replaced for instrumentation purposes. So any memset will go through:
In particular,
__asan_region_is_poisoned
will access the shadow memory corresponding to the region that we memset, in order to check if the region is poisoned.Shadow memory lazy commit on Win64
On Win64, shadow memory pages are first allocated as
MEM_RESERVE
. They are dynamically turned toMEM_COMMIT
on demand -- meaning that we rely on an exception handlerShadowExceptionHandler
to change the status of the page when we fail to access a reserved shadow memory page because it is not yet commited. This change was pushed by this revision.Putting it together
The Win64 memset interception in ASAN is incompatible with ntdll 10.0.26100.2161. As soon as a first access violation gets raised because a shadow memory page is reserved but not committed, we immediately reach a call to memset before we get a chance to reach the
ShadowExceptionHandler
. This call to memset itself triggers a new access violation and a new call to memset, etc. This is a neverending cycle, until eventually we overflow the stack.Related Firefox bug here.