microsoft / Detours

Detours is a software package for monitoring and instrumenting API calls on Windows. It is distributed in source code form.
MIT License

Using Detours if either the .exe or the .dll is within a directory with unicode characters? #283

Open sp00n opened 1 year ago

sp00n commented 1 year ago

I've run into an issue when trying to detour a function call in an .exe that sits in a directory whose path contains Unicode characters.

As already described in the closed issue #159, the DetourCreateProcessWithDllEx and DetourCreateProcessWithDlls functions only take LPCSTR arguments, so there is no Unicode support. Is there any way to pass the path to the .exe or .dll if the path is something like E:\test\Bölükbaşı? It works for "simpler" directories with non-ASCII characters such as E:\test\Chäröcter Teßt or E:\test\Bjørn, but it fails for more complex ones like the first example (and also for e.g. Chinese and probably Arabic characters).

This seems like a pretty hefty downside of the library if it is limited to (maybe extended) ANSI characters, whereas Windows has long been designed around Unicode / UTF-16 file system support.

lostmsu commented 10 months ago

I found a workaround for this issue: you can use GetShortPathName function to generate a DOS-compatible name, and that name will work.

sp00n commented 10 months ago

> I found a workaround for this issue: you can use GetShortPathName function to generate a DOS-compatible name, and that name will work.

If I remember correctly, I did try that during my tests, and while it worked for some directories, some still threw an error.

lostmsu commented 10 months ago

@sp00n if I had to guess, I'd say it would not work for relative paths, but it should have worked for absolute ones.

LeoDavidson commented 10 months ago

Short names are not always turned on in the filesystem with Windows these days.

They can be useful as one possible workaround, but they won't always be available. It depends how the filesystem was formatted. Maintaining the short names, and testing for two names instead of one each time, slows down filesystem operations slightly, which I think is the main reason they aren't on all the time. That and not much software requires them anymore.

The default for Windows seems to have changed back and forth a few times in recent years, in my experience, but maybe there's more to it (e.g. Home vs Pro versions could be a factor). I don't know the exact rules for when they're on by default, as they seem to change, but people can also override that and turn them on or off if they wish. So you cannot depend on 8.3 names existing these days.

lostmsu commented 10 months ago

@LeoDavidson AFAIK they are on by default on NTFS, and AFAIK you cannot boot from ReFS. I can't imagine even 0.1% of all Windows users turning them off, so it should be a fine workaround. And those 0.1% of users can turn them back on if they really need my software.

But this is still just a workaround. A proper fix should be implemented eventually one way or another.

LeoDavidson commented 10 months ago

They are not always on by default these days, and also not always on for every drive in a system either. That's the problem.

This SuperUser post has an answer saying Windows 8 and Server 2012 turned short names off by default, which fits with my memory of when it changed:

https://superuser.com/questions/1505174/how-comes-that-short-filenames-8-3-are-created-in-one-partition-and-not-in-ano

It's possible Microsoft reversed the decision again in a later OS version, but I'm not sure. You can't depend on them being there in any case.

sp00n commented 10 months ago

Ah, I think it was disabled by default on my NVMe drives, which is why I quickly abandoned it.

Edit: My HKLM\System\CurrentControlSet\Control\FileSystem\NtfsDisable8dot3NameCreation setting is 2, which according to the SuperUser link above "Sets 8dot3 name creation on a per volume basis". Running fsutil 8dot3name query DRIVELETTER: showed that only some of my drives/partitions have it enabled, for no reason I can think of. The SATA SSD boot drive has it enabled, as does one of my three SATA HDDs; the other two HDDs, the NVMe drive, and two attached USB drives have it disabled. 🤷‍♂️
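
For reference, the four values that fsutil 8dot3name documents for NtfsDisable8dot3NameCreation can be written down as a small lookup. This is only an illustrative C++ sketch: the enum and function names are made up for this example, and it does not read the registry itself.

```cpp
// Hypothetical helper; these names are illustrative, not a Windows API.
enum class ShortNamePolicy {
    EnabledOnAllVolumes,   // value 0
    DisabledOnAllVolumes,  // value 1
    PerVolume,             // value 2: each volume carries its own flag
    DisabledExceptSystem,  // value 3: only the system volume creates short names
    Unknown                // anything else
};

// Map the raw registry DWORD to its documented meaning.
ShortNamePolicy classify_8dot3_setting(unsigned long value) {
    switch (value) {
        case 0: return ShortNamePolicy::EnabledOnAllVolumes;
        case 1: return ShortNamePolicy::DisabledOnAllVolumes;
        case 2: return ShortNamePolicy::PerVolume;
        case 3: return ShortNamePolicy::DisabledExceptSystem;
        default: return ShortNamePolicy::Unknown;
    }
}

// True when the setting defers to a per-volume flag, so the global value
// alone says nothing about a particular drive.
bool defers_to_volume_flag(ShortNamePolicy policy) {
    return policy == ShortNamePolicy::PerVolume;
}
```

With the value 2 reported above, defers_to_volume_flag returns true, which matches the observation that each drive has to be checked individually.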

lostmsu commented 10 months ago

@sp00n AFAIK off on ReFS

sp00n commented 10 months ago

> @sp00n AFAIK off on ReFS


All my partitions are in NTFS though.

awakecoding commented 7 months ago

I think I'm hitting the same issue with https://github.com/Devolutions/MsRdpEx. After fixing all of the ANSI function calls in my code, I noticed that even when calling DetourCreateProcessWithDllExW I was stuck with an ANSI DLL path that clearly doesn't get converted from UTF-8 to UTF-16 internally. This limits my ability to launch my program from a path that isn't ANSI-compatible. Has anyone worked on patching Detours to handle UTF-16 DLL paths? I haven't yet been able to isolate the specific function in the code that would use the path without internal conversions.

awakecoding commented 7 months ago

It looks like the string should be encoded using the system code page, because I just switched my system to the UTF-8 "code page" 65001, and I could successfully load my DLL using the UTF-8 file path. Just run the following from an elevated shell:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v ACP /t REG_SZ /d 65001 /f

No reboot is required, it'll work right away. Why this isn't the default even on Windows 11 23H2 is beyond me. As for fixing it inside Detours without touching system settings, I guess it may be possible to detect the current system code page and automatically convert from UTF-8 to the current one, but I fear many conversions would be lossy.
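
To make the lossy-conversion concern concrete, here is a minimal portable C++ sketch that converts a UTF-8 path to a single-byte encoding and refuses when anything would be lost. It makes a big assumption: the target code page is Windows-1252, and only the part of it that coincides with Latin-1 (ASCII plus U+00A0..U+00FF) is handled, so it is a conservative stand-in and not what Detours or Windows actually does. The function names are made up for this example. On Windows itself, the equivalent check would presumably go through WideCharToMultiByte and its lpUsedDefaultChar output.

```cpp
#include <cstddef>
#include <string>

// Decode one UTF-8 sequence starting at s[i] into cp and advance i past it.
// Returns false on a bad lead byte, a truncated sequence, or a bad
// continuation byte. Minimal decoder: overlong forms are not rejected.
bool decode_utf8(const std::string& s, std::size_t& i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;
    if (b0 < 0x80)                { extra = 0; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return false;
    if (i + extra >= s.size()) return false;  // sequence runs off the end
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return true;
}

// Convert UTF-8 to single-byte text. Succeeds only if every code point is
// ASCII or in U+00A0..U+00FF, which map to the same byte values in
// Windows-1252. Returns false (out is left partial) if anything is lost.
bool utf8_to_latin1_subset(const std::string& utf8, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = 0;
        if (!decode_utf8(utf8, i, cp)) return false;
        const bool representable = cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF);
        if (!representable) return false;
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return true;
}
```

Under that Windows-1252 assumption, E:\test\Bjørn and E:\test\Chäröcter Teßt convert cleanly, while E:\test\Bölükbaşı is rejected because ş (U+015F) and ı (U+0131) have no slot in that code page. That would be consistent with the behaviour reported at the top of this issue.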

Maybe someone knows if it is possible to force the code page of the newly spawned process to UTF-8?

weltkante commented 5 months ago

> Why this isn't the default even on Windows 11 23H2 is beyond me.

Probably because it would break a lot of things. The system code page determines how non-Unicode data is interpreted, in particular files, and changing it can make a lot of files unreadable or appear corrupted if they weren't saved in a Unicode format.

It's a change best made by a user when they set up a new Windows machine, before applications start saving data in local files. Making this change as an application developer for anything other than the application you develop is dangerous and risks corrupting user data.

> Maybe someone knows if it is possible to force the code page of the newly spawned process to UTF-8?

Depending on how much control you have over the thing you launch, you can try specifying it in the manifest: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

But note that if the application works with non-Unicode files (or other data sources), the compatibility issues above may hit it and cause non-UTF-8 data to be read as UTF-8, corrupting the data from the application's point of view.