Windows paths have been WTF-16-encoded for decades, but module names
in the import tables of Portable Executable are octets.
If a name contains values beyond ASCII — technically out of spec — then
the dynamic linker must somehow decode those octets into Unicode in order
to construct a lookup path. There are multiple ways this could be done,
and the most obvious is the process’s active code page (ACP), which is
exactly what happens. As a consequence, the specific DLL loaded by the
linker may depend on the system code page. In this article I’ll contrive
such a situation.
LoadLibraryA is a similar situation, and potentially applies the code
page to a longer portion of the module path. LoadLibraryW is
unaffected, at least for the directly-named module, because it’s Unicode
all the way through.
For my contrived demonstration I came up with two names that to
English-reading eyes appears as two words with extraneous markings:
õral.dll: CP-1252="C3 B5 …"
õral.dll: CP-1252="F5 …"; UTF-8="C3 B5 …"
Both end with ral.dll. I’ve included the CP-1252 encoding for the
differing prefixes, and the UTF-8 encoding for the second. I’m using
CP-1252 because it’s the most common system code page in the world,
especially the Western hemisphere. Due to case insensitivity, the actual
DLL may be named ãµral.dll — i.e. to match the second library case — but
the module name must be encoded as uppercase when building the import
library. Alternatively the second could be Õral.dll, particularly
because I won’t use it when constructing an import library.
The plan is to store the octets C3 B5 … in the import table. A process
using CP-1252 decodes it to õral.dll. In the UTF-8 code page it decodes
to õral.dll. For testing we can use an application manifest to
control the code page for a particular PE image — a lot easier than
changing the system code page. Otherwise, this trick could dynamically
change the behavior of a program in response to the system code page
without actually inspecting the active code page.
The libraries will have a single function get, which returns a string
indicating which library was loaded:
Constructing the import library can be tricky because you must consider
how the toolchain, editors, and shells decode and encode text, which may
involve the build system’s code page. It’s shockingly difficult to script!
Binutils dlltool cannot process these names and cannot be used at all.
With bleeding edge w64devkit I could reliably construct the DLLs and
import library like so, even in a script (Windows 10 and later only):
That produces two DLLs and one import library, detect.lib, with the
desired module name octets. A straightforward MSVC cl invocation also
works so long as it’s not from a batch file. It will quite correctly warn
about the strange name situation, which I like. My test program, main.c:
I designed peports to print non-ASCII octets unambiguously
(\xXX), and it’s the only tool I know that does so:
$ peports main.exe | tail -n 2
\xc3\xb5ral.dll
1 get
The module name has the C3 B5 … prefix octets. When I run it under my
system code page, CP-1252:
If I add a UTF-8 manifest, even just a “side-by-side” manifest, it
loads the other library despite an identical import table:
$ cc -o main.exe main.c detect.lib libwinsane.o
$ ./main
UTF-8
Again, without the manifest, if I switched my system code page to UTF-8
then UTF-8 would still be the result.
I can’t think of much practical use for this trick outside of malware. In
a real program it would be simpler to inspect code page, and there’s no
benefit to avoiding such a check if it’s needed. Malware could use it to
trick inspection tools and scanners that decode module names differently
than the dynamic linker. Such tools often incorrectly assume UTF-8, which
is what motivated this article.
Windows dynamic linking depends on the active code page
https://ift.tt/wC5DZVQ
nullprogram.com/blog/2024/10/07/
Windows paths have been WTF-16-encoded for decades, but module names in the import tables of Portable Executable are octets. If a name contains values beyond ASCII — technically out of spec — then the dynamic linker must somehow decode those octets into Unicode in order to construct a lookup path. There are multiple ways this could be done, and the most obvious is the process’s active code page (ACP), which is exactly what happens. As a consequence, the specific DLL loaded by the linker may depend on the system code page. In this article I’ll contrive such a situation.
LoadLibraryA is a similar situation, and potentially applies the code page to a longer portion of the module path. LoadLibraryW is unaffected, at least for the directly-named module, because it’s Unicode all the way through.
For my contrived demonstration I came up with two names that to English-reading eyes appears as two words with extraneous markings:
õral.dll
: CP-1252="C3 B5 …"
õral.dll
: CP-1252="F5 …"
; UTF-8="C3 B5 …"
Both end with
ral.dll
. I’ve included the CP-1252 encoding for the differing prefixes, and the UTF-8 encoding for the second. I’m using CP-1252 because it’s the most common system code page in the world, especially the Western hemisphere. Due to case insensitivity, the actual DLL may be namedãµral.dll
— i.e. to match the second library case — but the module name must be encoded as uppercase when building the import library. Alternatively the second could beÕral.dll
, particularly because I won’t use it when constructing an import library.The plan is to store the octets
C3 B5 …
in the import table. A process using CP-1252 decodes it toõral.dll
. In the UTF-8 code page it decodes toõral.dll
. For testing we can use an application manifest to control the code page for a particular PE image — a lot easier than changing the system code page. Otherwise, this trick could dynamically change the behavior of a program in response to the system code page without actually inspecting the active code page.The libraries will have a single function
get
, which returns a string indicating which library was loaded:Constructing the import library can be tricky because you must consider how the toolchain, editors, and shells decode and encode text, which may involve the build system’s code page. It’s shockingly difficult to script! Binutils
dlltool
cannot process these names and cannot be used at all. With bleeding edge w64devkit I could reliably construct the DLLs and import library like so, even in a script (Windows 10 and later only):That produces two DLLs and one import library,
detect.lib
, with the desired module name octets. A straightforward MSVCcl
invocation also works so long as it’s not from a batch file. It will quite correctly warn about the strange name situation, which I like. My test program,main.c
:I link
detect.lib
when I build it:I designed
peports
to print non-ASCII octets unambiguously (\xXX
), and it’s the only tool I know that does so:The module name has the
C3 B5 …
prefix octets. When I run it under my system code page, CP-1252:If I add a UTF-8 manifest, even just a “side-by-side” manifest, it loads the other library despite an identical import table:
Again, without the manifest, if I switched my system code page to UTF-8 then
UTF-8
would still be the result.I can’t think of much practical use for this trick outside of malware. In a real program it would be simpler to inspect code page, and there’s no benefit to avoiding such a check if it’s needed. Malware could use it to trick inspection tools and scanners that decode module names differently than the dynamic linker. Such tools often incorrectly assume UTF-8, which is what motivated this article.
via null program
October 12, 2024 at 03:16PM