ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Prefer depending on NtDll rather than kernel32 or other higher level DLLs #1840

Open daurnimator opened 5 years ago

daurnimator commented 5 years ago

Proposal Clarification


"windows" as most people think of it is a stack of

You don't have to use all elements of this stack, and there are situations where you want to avoid or are not able to use kernel32.dll and/or user32.dll.

The Zig standard library should not assume that kernel32.dll and user32.dll are unconditionally available.

andrewrk commented 5 years ago

there are situations where you want to avoid or are not able to use kernel32.dll and/or user32.dll.

What situations are these? Does anyone know?

This is an important proposal if we have some use cases to guide it.

suirad commented 5 years ago

Some background for anyone not familiar: ntdll.dll is the only constant that is required for a windows userland process, as it's what does the syscalls. kernel32.dll mostly just wraps ntdll and provides what is colloquially known as Winapi. user32 is essentially the kernel32 of Windows gui stuff. The windows userland stack ends up looking like: winlibc (aka fopen()) -> kernel32 (aka CreateFileA()) -> ntdll (aka NtCreateFile()).


A couple examples of when some of these may not be available/wanted:


I would also like to propose having a way to tell what version of windows is being used. Something like builtin.windows_xp / builtin.windows_vista because each changes how some winapi are expressed.

An example of how this would be immediately useful is in creating std.mutex where windows xp and above has CriticalSections available to use, windows vista+ enables use of SRWLock and windows 8+ has WaitOnAddress available, which is the closest thing to a futex windows has. This would result in the mutex itself resolving which of these would be best to use at compile time rather than at runtime.
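To illustrate the version-based selection described above, here is a minimal Python sketch (not Zig, and not what std.mutex actually does). It models the choice as a pure function of the minimum targeted Windows version; in Zig the same decision could be made at compile time. `WindowsVersion` and `pick_mutex_primitive` are hypothetical names introduced for illustration; the numeric values follow the `_WIN32_WINNT` convention.

```python
from enum import IntEnum


class WindowsVersion(IntEnum):
    """Minimum targeted Windows version (values follow _WIN32_WINNT)."""
    XP = 0x0501
    VISTA = 0x0600
    WIN7 = 0x0601
    WIN8 = 0x0602


def pick_mutex_primitive(min_version: WindowsVersion) -> str:
    """Return the name of the most capable primitive available on every
    Windows version at or above min_version (hypothetical helper)."""
    if min_version >= WindowsVersion.WIN8:
        return "WaitOnAddress"      # closest thing to a futex
    if min_version >= WindowsVersion.VISTA:
        return "SRWLock"
    return "CriticalSection"        # available on XP and above
```

Because the input is the minimum supported version rather than the running version, the result is fixed at build time and no runtime probing is needed.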

jayschwa commented 5 years ago

Windows XP

Microsoft stopped maintaining XP nearly 5 years ago. What is Zig's policy on supporting old operating systems?

andrewrk commented 5 years ago

What is Zig's policy on supporting old operating systems?

I don't think it's relevant to the language. But the standard library does not support Windows XP, since it's not supported by Microsoft.

However anyone is free of course to create a third party package for interfacing with Windows XP. And note that much of the standard library has no OS dependencies, and thanks to lazy analysis, you can therefore use the parts of the standard library that have no OS dependencies even when targeting an unsupported OS.

PavelVozenilek commented 5 years ago

There's one very specific use case, software hardened against tampering, where people "include" these standard DLL's (their code sections) inside the executable.

However, this (and driver development) is something very rare, and there's always present danger of feature creep, under-documentation, untested/untestable functionality.

andrew-boyarshin commented 5 years ago

Proposal

OS

libc

Compatibility

libc might impose additional dependencies.

andrewrk commented 4 years ago

I would also like to propose having a way to tell what version of windows is being used. Something like builtin.windows_xp / builtin.windows_vista because each changes how some winapi are expressed.

This is #1907.


I'm accepting this proposal, and to provide further clarification of what such an acceptance looks like, it means:

When the standard library needs to interact with the Windows kernel, if there is ntdll API which provides the necessary components to perform the task correctly, then that is the preferred API to use. Generally, if another DLL such as kernel32 is wrapping ntdll functionality, Zig std lib should prefer NtDll directly.

Once we shave down the non-NtDll dependencies in the std lib, let's see what we have left and re-evaluate from there.

One strategy that can be used to find this information out is ProcMon. ProcMon can reveal when a higher level DLL is calling into a lower level one.

daurnimator commented 4 years ago

Generally, if another DLL such as kernel32 is wrapping ntdll functionality, Zig std lib should prefer NtDll directly.

To me this is like saying that on linux, we should prefer the raw syscall over libc functionality: should we be doing that too?

Also, I don't think all ntdll functions work correctly under wine; so that might make things harder for e.g. developers on linux to test.

andrewrk commented 4 years ago

To me this is like saying that on linux, we should prefer the raw syscall over libc functionality: should we be doing that too?

In some places this is true, for example with futexes. But generally, Zig users have a choice of which layer to target, by choosing whether to link libc. When linking libc, it makes sense to target the libc layer, for a few reasons:

However, not targeting the libc layer is also a primary use case, when not linking libc.

Also, I don't think all ntdll functions work correctly under wine; so that might make things harder for e.g. developers on linux to test.

The Zig project has already managed to get one bug fix upstreamed into Wine. Let's start with the optimistic attitude of improving the open source communities around us, before we give up and make compromises.

Now, if there are reasons to target kernel32 or other higher level DLLs, let's hear those reasons out. If they are compelling enough, then it probably makes sense to add an additional target configuration option, which is available to observe in std.builtin, and the std lib can decide which layer to prefer. But I'm not yet convinced this is desirable.

adontz commented 4 years ago

I am sorry, if I'll sound a bit harsh, but I have found a few false statements in this thread.

Linking to ntdll instead of kernel32 (as I see, now it is a weird mix) provides no benefits. ntdll is available in kernel mode and user mode, but exports a different set of functions for each mode. For kernel mode, i.e. from a driver, one will call ZwCreateFile and not NtCreateFile, so exporting user mode ntdll functions for the purposes of driver development is simply a mistake. Such a std will not be more useful, or useful in more scenarios, or anything like that. It's still the user-mode-only regular application case. Not a kernel driver, not an early boot service, nothing like that.

kernel32 is always available to user mode application, it is loaded into process address space even if not linked at all. There are no reasons to avoid using kernel32, because it is always available. kernel32 is not a simple wrapper, function signatures are quite different and calls do not always map one to one. Those who link directly to ntdll will have to reimplement all kinds of argument conversion logic, or deal with all kinds of weird behavior.

Also, while Microsoft is champion of backwards compatibility (no irony, they are great), reading documentation gives us pretty straightforward directions

https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile

Note Before using this function, please read Calling Internal APIs.

https://docs.microsoft.com/en-us/windows/win32/devnotes/calling-internal-apis

The functions and structures in Winternl.h are internal to the operating system and subject to change from one release of Windows to the next, and possibly even between service packs for each release. To maintain the compatibility of your application, you should use the equivalent public functions instead. Further information is available in the header file, Winternl.h, and the documentation for each function.

I do not know how it can be more clear that Nt-prefixed functions should not be used by regular user mode applications. Even if signatures match, parameter meaning or set of valid values may change. Drivers will use different (Zw-prefixed) functions anyway. So there are literally no reasons to avoid kernel32.

Also, paths are not compatible and it's not about maximum allowed length or some fancy prefix. "\Device\Harddisk0" is a valid NT path, but not a valid Win32 path with or without any prefix. There are many obvious and even more subtle differences between NT and Win32 behavior, so user mode std depending on NT calls is a really weird choice.

I can't remember a single language standard library (C, rust, D, .Net) to use these functions, and I've dissected many of them.

andrew-boyarshin commented 4 years ago

@adontz

  1. Kernel mode can also call Nt* versions. Moreover, it is preferred when data comes from user mode and cannot be trusted.
  2. NTDLL is not used in kernel mode, because NTOSKRNL provides most of NTDLL's API. Technically there is a common part (RTL) linked into both, a differing part (for a function with an Nt* name, NTDLL usually does the syscall, while NTOSKRNL actually implements the function), plus a vast number of unique APIs in both.
  3. KERNEL32 and KERNELBASE are not present in Native Executable image scenario, although it's hard to deny that scenario is rare.
  4. About DOS and NT paths: it depends on the function really. Some NTDLL functions expect DOS paths and convert them to NT paths internally before doing syscall, some expect already converted NT path.

Honestly, the proposal I've written might be good, but I've already decided against using Zig for low-level Windows development, so I don't really care if Zig stays user-mode only.

upd:

  1. .NET depended on a lot of internal functions around the 3.0-3.5 timeframe. Later they migrated to stable APIs, or, where applicable, just started to bring the code from Windows into their own and adapting to their needs.

daurnimator commented 4 years ago

kernel32 is always available to user mode application, it is loaded into process address space even if not linked at all

Ah, appears they added that in XP for the windows subsystem. Most of the books on windows internals are from the Win 2000 era :( Still, writing native subsystem programs is useful and interesting. midipix has a few examples.

kernel32 is not a simple wrapper, function signatures are quite different and calls do not always map one to one

That's sort of the issue: lots of kernel32 wrappers don't expose the full functionality of the NT API. From the basic stuff like missing flags, to different concepts of paths (nt paths are not null terminated, but length delimited resulting in end users needing tools like RegDelNull) to just plain missing functions (e.g. did you know nt supports fork... but kernel32 doesn't).

Also, while Microsoft is champion of backwards compatibility (no irony, they are great), reading documentation gives us pretty straightforward directions

https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile

Note Before using this function, please read Calling Internal APIs.

https://docs.microsoft.com/en-us/windows/win32/devnotes/calling-internal-apis

The functions and structures in Winternl.h are internal to the operating system and subject to change from one release of Windows to the next, and possibly even between service packs for each release. To maintain the compatibility of your application, you should use the equivalent public functions instead. Further information is available in the header file, Winternl.h, and the documentation for each function.

I do not know how it can be more clear that Nt-prefixed functions should not be used by regular user mode applications. Even if signatures match, parameter meaning or set of valid values may change.

There have only been 2 notable function changes I know of since windows 2000, and they have coincided with major releases: despite Microsoft's statements, they are quite stable. What is unstable is the actual syscall numbers, which change even between monthly windows updates.

andrewrk commented 4 years ago

@adontz, I appreciate your input.

Linking to ntdll instead of kernel32 (as I see, now it is a weird mix) provides no benefits.

There is one very real benefit: the APIs are more powerful. For example, NtCreateFile has the ability to open a sub-directory path based on an open directory handle. The kernel32 APIs do not expose this. It has meaningful consequences in terms of avoiding file system race conditions.

When I first started working on Windows implementations of cross platform abstractions, I took the Microsoft documentation very seriously. But then I found out that pretty much every standard library uses SystemFunction036 from advapi32.dll for getting cryptographically secure random bytes, while Microsoft's docs say to use CryptGenRandom. The docs for CryptGenRandom now say

Important This API is deprecated. New and existing software should start using Cryptography Next Generation APIs. Microsoft may remove this API in future releases.

What I learned is that it's more important to consider ABIs rather than APIs.

Microsoft won't break RtlGenRandom and they won't break NtCreateFile. If they do, they have a lot more programs to worry about breaking than Zig binaries.

The bottom line here is that NtDll is lower-level ABIs to the kernel, and kernel32 is higher level code that wraps NtDll. This is clear when you look at calls in ProcMon. And frankly, Zig's open-source code that wraps NtDll is more robust and reliable than Microsoft's cruft inside kernel32.dll. I don't remember the specifics, but off the top of my head, there are functions in kernel32 that wrap NtDll functions that allocate heap memory, whereas the zig code that wraps NtDll functions does not. That can be the difference between deterministic latency or nondeterministic latency.

At this point I'm convinced that NtDll is an entirely appropriate abstraction layer for windows applications to target. My mind is open to counter-examples, but calls to NtDll functions have been in the zig std lib for quite some time now, and I'm not aware of a single issue it has caused. On the other hand, it has allowed us to unify the std.fs API across POSIX and Windows, since with NtCreateFile they both support operating on an open directory handle.
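For readers unfamiliar with the POSIX side of that unification, here is a minimal Python sketch of opening a file relative to an already-open directory handle (the `openat` pattern, which Python exposes through the `dir_fd` parameter of `os.open`). It assumes a POSIX system such as Linux; `read_relative` and `demo` are illustrative names, not Zig std APIs, and this does not show the NtCreateFile call itself.

```python
import os
import tempfile


def read_relative(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to an open directory descriptor and read it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Hold a directory handle, rename the directory, then read through it."""
    with tempfile.TemporaryDirectory() as parent:
        old = os.path.join(parent, "old")
        os.mkdir(old)
        with open(os.path.join(old, "data.txt"), "wb") as f:
            f.write(b"hello")

        dir_fd = os.open(old, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # The path "old" stops existing here, but the handle still
            # refers to the same directory.
            os.rename(old, os.path.join(parent, "new"))
            return read_relative(dir_fd, "data.txt")
        finally:
            os.close(dir_fd)
```

The lookup goes through the handle instead of re-resolving a full path, so a concurrent rename of a parent component does not redirect the open to a different file.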

@andrew-boyarshin,

I've already decided against using Zig for low-level Windows development

Is there anything the Zig project can learn from your decision?

andrew-boyarshin commented 4 years ago

@andrewrk note that the following is purely about Windows kernel drivers, ReactOS kernel and NTDLL. User-mode Zig is cool; I play with it (but have yet to use it in any real project).

  1. I'm not sure Zig would allow me to develop kernel driver code (standard library)
  2. I'm not sure Zig would allow me to throw and handle SEH exceptions properly (ExceptionCode, user buffer probing, exception filters)
  3. I'm not sure about calling conventions being compatible, but that might well be false. I've never tried Clang/LLVM for low-level code, and Zig is 2 layers where mistakes can exist. Clang+LLVM is 1.5 layer (Clang is very coupled with LLVM, so it's difficult to count them separately). I just feel MSVC is a safer (as in error-proof) bet (single layer monolith used by Windows team themselves).
  4. I'm not sure I can rely on Zig features being safe (as in secure) and fast (as fast as they can theoretically be when implemented in C using kernel APIs to the fullest). I'm sure writing C-like Zig will be ok, but when it comes to coroutines and complete kernel asynchrony (can they survive preemption? how will they play with IRQL?)...

I feel like most of these issues can be eliminated by providing a good sample of e.g. FS/registry filter, and showcase what works and what does not.

upd:

  1. Status codes. Zig has an awesome concept of static errors and obligatory explicit handling. Unfortunately it's impossible to use that since even MS Docs don't specify all possible NTSTATUS codes, not to mention non-documented functions. So, Zig offers no advantage in this respect.

daurnimator commented 4 years ago

I feel like most of these issues can be eliminated by providing a good sample of e.g. FS/registry filter, and showcase what works and what does not.

Is this something you'd like to take on @andrew-boyarshin ?

andrew-boyarshin commented 4 years ago

@daurnimator not in the foreseeable future. Certainly not in the next 5 months.

adontz commented 4 years ago

There is one very real benefit: the APIs are more powerful. For example, NtCreateFile has the ability to open a sub-directory path based on an open directory handle. The kernel32 APIs do not expose this. It has meaningful consequences in terms of avoiding file system race conditions.

Do you assume that directory cannot be deleted and/or moved while there is an open handle?

When I first started working on Windows implementations of cross platform abstractions, I took the Microsoft documentation very seriously. But then I found out that pretty much every standard library uses SystemFunction036 from advapi32.dll for getting cryptographically secure random bytes, while Microsoft's docs say to use CryptGenRandom.

I see no mention of cryptographic security of this function. But I do not expect some std.random to be cryptographically secure. I think reasonable expectations are that std.random should have nice distribution and be fast, but not crypto secure.

On the other hand, it has allowed us to unify the std.fs API across POSIX and Windows, since with NtCreateFile they both support operating on an open directory handle.

This is a strong argument. Really, I like it.

However, converting paths between namespaces is not an easy task. You may convert "C:\Windows\System32" to some "\??\C:\Windows\System32" and get away with that, but what about the opposite direction? "\SystemRoot\System32\" may be "C:\Windows\system32", "D:\Windows\system32" and so forth up to "Z:\Windows\system32". Windows installations are automated in enterprise environments, and you can install on any large enough, properly formatted partition of any disk and assign any unused drive letter during setup, so you cannot just assume C:

https://blogs.msdn.microsoft.com/jeremykuhne/2016/05/02/dos-to-nt-a-paths-journey/ https://stackoverflow.com/questions/4445108/how-can-i-convert-a-native-nt-pathname-into-a-win32-path-name

As far as I know there is no official way of converting NT path to Win32 path.
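The easy forward direction mentioned above can be sketched in a few lines of Python. This is a deliberate simplification that handles only drive-absolute paths; the real conversion also deals with relative paths, UNC paths, device paths, `.` and `..` components, and other normalization, none of which is attempted here. `dos_to_nt_path` is a hypothetical helper name.

```python
import re

# Matches only drive-absolute paths such as "C:\..." or "d:/...".
_DRIVE_ABSOLUTE = re.compile(r"^[A-Za-z]:[\\/]")


def dos_to_nt_path(path: str) -> str:
    """Prefix a drive-absolute Win32 path with the NT "\\??\\" prefix.

    Raises ValueError for anything that is not drive-absolute, since
    those cases need the full conversion rules.
    """
    if not _DRIVE_ABSOLUTE.match(path):
        raise ValueError("only drive-absolute paths are handled: %r" % path)
    # The NT namespace does not treat "/" as a separator.
    return "\\??\\" + path.replace("/", "\\")
```

For example, `dos_to_nt_path(r"C:\Windows\System32")` yields `\??\C:\Windows\System32`. The sketch says nothing about the reverse direction, which is the hard part raised in the comment.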

daurnimator commented 4 years ago

As far as I know there is no official way of converting NT path to Win32 path.

Why would you need to do so? Note that some NT paths are impossible to represent as a win32 path.

@adontz you might find https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html?m=1 interesting.

adontz commented 4 years ago

@daurnimator I know of RtlDosPathNameToRelativeNtPathName_U, it was never documented.

Why? To convert paths back. As far as I remember NtQueryInformationProcess returns NT path, but GetProcessImageFileNameW does not.

Note that some NT paths are impossible to represent as a win32 path.

Exactly my point. I don't know what your experience is, but I'd rather stay away from these vague path conversion rules.

I think our conversation smoothly shifted from Win32 vs NT to documented vs undocumented. There are a few attributes I want to focus on, if you'll let me.

  1. Documented / Undocumented.
  2. Supported / Deprecated (may affect security, compatibility).
  3. Available in all OS versions / Available in some (maybe even not latest) OS versions.

I personally prefer documented, non-deprecated APIs, with safe fallback logic for APIs not available on some OS versions. For instance, I can hardly imagine any reasonable fallback for TaskDialog or HTTP Server API.

Windows 7 extended support just expired (January this year). I think Windows 8.1 is a safe bet.

daurnimator commented 4 years ago

Why? To convert paths back.

When would you do that? For display purposes?

adontz commented 4 years ago

Why? To convert paths back. When would you do that? For display purposes?

  1. Yes, display purposes are important. If you cannot find a file or have any other file related problem, logging or displaying \Device\ paths will make all users and IT support freak out.

  2. Not only display purposes. NtQueryInformationProcess returns path like this

"\Device\HarddiskVolume5\Users\Adontz\Projects\VS2019\NtPathTest1\Debug\NtPathTest1.exe"

Now, imagine I want to load a vendored dynamic library, or at least try to. It may be any closed source binary I want to use, or even a precompiled binary distribution of open source. To avoid all kinds of errors I would like to give LoadLibraryW an absolute path. Of course it returns NULL and I get error 126 "The specified module could not be found.", because LoadLibraryW is Win32 and has no idea where "\Device\HarddiskVolume5\" is.
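One unofficial way to attempt the reverse mapping is to build a table from NT device names to drive letters (on Windows such a table could be gathered by calling QueryDosDeviceW for each drive letter) and then do a prefix match. The Python sketch below shows only the matching step under that assumption; `nt_device_path_to_dos` is a hypothetical helper, and the `C:` mapping in the usage note is made up.

```python
from typing import Dict, Optional


def nt_device_path_to_dos(nt_path: str,
                          device_map: Dict[str, str]) -> Optional[str]:
    """Translate an NT device path to a drive-letter path.

    device_map maps NT device names (no trailing backslash) to drive
    letters such as "C:". Returns None when no device matches, which is
    expected for NT paths that have no Win32 representation.
    """
    # Try longer device names first so the most specific entry wins.
    for device in sorted(device_map, key=len, reverse=True):
        # Require a separator after the device name so that
        # "HarddiskVolume5" does not match "HarddiskVolume50".
        if nt_path == device or nt_path.startswith(device + "\\"):
            return device_map[device] + nt_path[len(device):]
    return None
```

With a map of `{r"\Device\HarddiskVolume5": "C:"}`, the path from the comment above would become `C:\Users\Adontz\Projects\VS2019\NtPathTest1\Debug\NtPathTest1.exe`. A volume with no drive letter simply has no entry, so the function returns `None` instead of guessing.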

AlexKotik commented 4 years ago

I can see one big downside to using ntdll-based APIs instead of kernel32-based ones: users who compile stdlib code with ntdll APIs will get all kinds of antivirus false positives, since using ntdll APIs is not common for userland programs but quite common for various kinds of malware.

GavinRay97 commented 3 years ago

Maybe this isn't the best place to ask this, but this seems close enough.

This happens when trying to link and call a 70-line Zig staticlib that exports a single method from C:

$ clang-cl test.c && ./test.exe
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtClose referenced in function std.os.windows.CloseHandle
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtCreateFile referenced in function std.fmt.formatText.88
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtWaitForKeyedEvent referenced in function std.fmt.formatText.125
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtCreateKeyedEvent referenced in function std.math.absCast.146
test.exe : fatal error LNK1120: 4 unresolved externals

The only way that this doesn't happen is if the library is built with -Drelease-small or -Drelease-fast. Building with -Drelease-safe still causes the issue, but this time the referenced functions are different:

$ clang-cl test.c && ./test.exe
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtCreateKeyedEvent referenced in function std.array_list.ArrayListAligned(u8,null).appendSlice   
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtWaitForKeyedEvent referenced in function std.array_list.ArrayListAligned(u8,null).appendSlice  
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtClose referenced in function std.mem.Allocator.free.28
rpp_parser.lib(rpp_parser.obj) : error LNK2019: unresolved external symbol NtCreateFile referenced in function std.unicode.utf8Decode
test.exe : fatal error LNK1120: 4 unresolved externals
clang-cl: error: linker command failed with exit code 1120 (use -v to see invocation)

Now, I have read a good chunk of this thread and I now understand that the inclusion of ntdll is nearly required on Windows, but that leaves me with a couple of questions.

If you search the Discord history, you will find similar questions/confusion. Here another user in 2020 is confused why ntdll is being linked for a 20 line itoa() implementation that imports nothing:

Also please bear with me if these are pants-on-head dumb questions, I really don't know much about low level development.

daurnimator commented 3 years ago

@GavinRay97 on windows, all programs get ntdll loaded into them automatically when they start: it's impossible to opt out of it, even for "static" applications/libraries. Windows has no stable syscall interface; you instead call all syscalls via their ntdll wrappers.

GavinRay97 commented 3 years ago

Windows has no stable syscall interface, you instead call all syscalls via their ntdll wrappers.

Good lord. Alright haha, I never should have left Ubuntu. Thank you for the answer =)

So does that mean that static libs on Windows aren't really a "thing"?


EDIT: (For what it's worth, my .lib seems to be working fine from C in -Drelease-fast and -Drelease-small mode, without ntdll)

MarcTCruz commented 1 year ago

However, converting paths between namespaces is not an easy task. You may convert "C:\Windows\System32" to some "\??\C:\Windows\System32" and get away with that, but what about the opposite direction? "\SystemRoot\System32\" may be "C:\Windows\system32", "D:\Windows\system32" and so forth up to "Z:\Windows\system32". Windows installations are automated in enterprise environments, and you can install on any large enough, properly formatted partition of any disk and assign any unused drive letter during setup, so you cannot just assume C:

It's possible to get absolute path from process, on windows, with GetModuleFileName, so, when the root is missing as in your example, process root is taken instead.

matu3ba commented 1 year ago

Ntdll compatibility list taken from https://www.geoffchappell.com/studies/windows/win32/ntdll/api/index.htm "The very large table on this page lists all the functions and variables—there are well over two and a half thousand—that appear in the export directory of any known i386 (x86), amd64 (x64) or wow64 build of NTDLL.DLL from Windows NT up to and including the original Windows 10."

matu3ba commented 1 year ago

Counter-argument for relying on it for accurate runtime process or thread info from https://learn.microsoft.com/en-us/answers/questions/1183193/totalprocessortime-differs-between-windows-10-and also mentioned in https://github.com/ziglang/zig/issues/5191#issuecomment-1661030019 and https://github.com/ziglang/zig/pull/16638#discussion_r1280944719:

"Microsoft broke backwards compatibility and GetSystemInfo, GetThreadTimes, NtQuerySystemInformation and quite a few other functions are busted and broken on Windows 11 when you have more than 1 processor group.

Starting with Windows 11, process and thread affinities span all processors in the system, across all processor groups by default. Each processor group has completely different processor configuration and statistics... GetSystemInfo, GetThreadTimes and NtQuerySystemInformation only return values specific to the current thread's processor group affinity and they don't query multiple processor groups.

The Windows 11 kernel scheduler will execute your process/thread on different processor groups while you're calling those functions and this causes those functions to return completely different set of values that cannot be compared. Every program using those functions currently shows incorrect information exactly like what you've mentioned in your post.

You either need to call SetThreadGroupAffinity and restrict the thread to a specific group so those functions don't switch group and return incorrect values (and ignore statistics for other processors) or update your code and query the statistics for all processor groups by calling NtQuerySystemInformationEx."

However, standard things like filesystem and always loaded things should be fine.

Jarred-Sumner commented 1 year ago

I'm about one week in to porting Bun to work on Windows and so far using std.os is pretty rough

andrewrk commented 1 year ago

Please either submit bug reports, or don't. The above comment is just noise.

squeek502 commented 1 year ago

Invalid file path? That's a segfault due to the use of unreachable

This is https://github.com/ziglang/zig/issues/15607

std.os.access crashes with a file name that doesn't exist

Can't reproduce

const std = @import("std");

pub fn main() !void {
    _ = try std.os.access("something_missing", 0);
}

> zig run access.zig
error: FileNotFound

handful of methods have incorrect type definitions for the W versions

PRs / added tests welcome (this is something I plan on tackling as well though)

the buffer passed to NtCreateFile is not supposed to be exactly sized since reparse points exist

A failing example test case would be helpful here

Can't use stat(), so you have to use the more specific functions in std.fs

Related: https://github.com/ziglang/zig/issues/16738

zig uses the mingw Abi instead of msvc by default which causes missing symbol errors that were very difficult to narrow down

https://github.com/ziglang/zig/issues/6565

expikr commented 1 year ago

Interesting info on the origins of kernel32.dll, gdi32.dll, and user32.dll:

https://devblogs.microsoft.com/oldnewthing/20230926-00/?p=108824

Other useful resources:

https://learn.microsoft.com/en-us/windows/win32/devnotes/calling-internal-apis

http://undocumented.ntinternals.net/

ChrisDenton commented 11 months ago

When the standard library needs to interact with the Windows kernel, if there is ntdll API which provides the necessary components to perform the task correctly, then that is the preferred API to use. Generally, if another DLL such as kernel32 is wrapping ntdll functionality, Zig std lib should prefer NtDll directly.

While I don't entirely agree with this, I think this is defensible. If a kernel32, etc function is simply a direct wrapper for an ntdll function, use the ntdll function.

However, I do have a concern that some are taking the "prefer ntdll" directive in a more dogmatic direction; i.e. "use only ntdll". I do not think there's anything to be gained by this and much to lose (e.g. random number generation, process creation, etc). While recreating Windows user space from syscall wrappers is, of course, technically possible with enough reverse engineering, the scale of the task is much larger than people here seem to be recognizing (path weirdness is frankly one of the simplest issues). And even if you do reverse engineer Win32, you risk fragility and missing features.

In summary I have two related concerns about the statement "use only ntdll":

  1. The scale of the task is being vastly underestimated.
  2. Being dogmatic about this lacks sufficient motivation.

ellacrity commented 10 months ago

This choice makes very little to no sense to me. Just because it is technically possible to use ntdll.dll directly does not mean that it's a good idea. I think that the issue may have been opened with good intentions but it is clear that there is a general lack of understanding of Windows.

NTDLL is a lower-level library that exposes the "Native API" or NT API and is meant for situations where you cannot use Win32 for one reason or another. I do not understand why you would use this as the default. This is completely backwards. The much more common scenario is that you do have access to Win32, unless you are writing driver code. In that case, you would use NTDLL.

Using NTDLL as the default is, frankly, a strange decision and one that I strongly recommend against for your own sanity. Most of NTDLL's API is not even documented.

This issue will almost certainly result in more bugs, less stability and significant technical debt. You are in for a world of pain.

matu3ba commented 10 months ago

This choice makes very little to no sense to me. Just because it is technically possible to use ntdll.dll directly does not mean that it's a good idea. I think that the issue may have been opened with good intentions but it is clear that there is a general lack of understanding of Windows.

It would help if you provided technical arguments or concrete use cases for the Zig compiler and/or package manager which cannot be satisfied by other means. From my point of view, if Zig decides that either the compiler or the package manager should work in all reasonable use cases after Windows updates, then using only ntdll would not be a good choice, as Microsoft may decide to break or change ntdll call semantics for proprietary/money/security etc reasons.

Zig may also decide to be upfront about breakage and have tools to manage/test impact on end users + developers (basically testing supported api coverage automatically etc). However, I cannot estimate the complexity and impact of changing how breakage is handled here.

Win32 is an ambiguous term: it spans many different DLLs and has a needlessly large attack surface with a lot of old, rotten code, which from my point of view would be better sandboxed by default. See also https://github.com/mtth-bfft/win32k-mitigation or this very interesting talk on the default-not-good kernel security of Windows https://raw.githubusercontent.com/tyranid/infosec-presentations/master/Nullcon/2019/The%20Windows%20Sandbox%20Paradox%20(Flashback).pdf for more technical motivation.

UPDATE: I think win32k.sys and ntdll up to Win10 (x86 and x64) have not been mentioned so far: https://github.com/j00ru/windows-syscalls. Windows 11 22H2 also got a way to disable fsctl calls.

drew-gpf commented 9 months ago

An example where this is undesirable: https://github.com/ziglang/zig/issues/11894

Actually implementing a workaround for this is incredibly annoying and tedious, and it is sure to have compatibility issues compared to just using CreateFile. Also, what if MS decides to implement some other kind of filesystem redirection in kernel32? WOW64 redirection is allowed to change across Windows versions as well, e.g. %windir%\system32\driverstore... is not redirected on Win 7+ but is on Vista and earlier.
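To make the version dependence concrete, here is a rough sketch in portable C of the kind of policy a kernel32-free implementation would have to carry around itself. It is a toy model, not a Windows API: the names `win_generation`, `has_dir_prefix` and `wow64_would_redirect` are made up for illustration, `%windir%` is assumed to be `C:\Windows`, and the only exemption modeled is the driverstore one mentioned above. The real redirector has more rules than this and, as noted, is allowed to change.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the Windows generation the model pretends to run on. */
typedef enum { WIN_VISTA_OR_EARLIER, WIN_7_OR_LATER } win_generation;

/* Case-insensitive prefix match that must end on a path-component boundary,
 * so "System32x" does not count as being inside "System32". */
static bool has_dir_prefix(const char *path, const char *prefix) {
    size_t i = 0;
    for (; prefix[i] != '\0'; i++) {
        if (path[i] == '\0') {
            return false; /* path is shorter than the prefix */
        }
        if (tolower((unsigned char)path[i]) != tolower((unsigned char)prefix[i])) {
            return false;
        }
    }
    return path[i] == '\0' || path[i] == '\\';
}

/* Simplified model: would a 32-bit process opening `path` through Win32 on
 * 64-bit Windows be redirected away from System32?
 * Assumes %windir% is C:\Windows and models only the driverstore exemption. */
bool wow64_would_redirect(const char *path, win_generation gen) {
    static const char system32[] = "C:\\Windows\\System32";
    static const char driverstore[] = "C:\\Windows\\System32\\DriverStore";

    assert(path != NULL);

    if (!has_dir_prefix(path, system32)) {
        return false; /* outside System32: not affected in this model */
    }
    if (has_dir_prefix(path, driverstore)) {
        /* Exempt on Win 7+, redirected on Vista and earlier. */
        return gen == WIN_VISTA_OR_EARLIER;
    }
    return true;
}
```

With this model, `wow64_would_redirect("C:\\Windows\\System32\\DriverStore\\x.inf", gen)` gives different answers for the two generations, which is exactly the kind of detail CreateFile handles for you and a hand-rolled NT path does not.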

expikr commented 4 months ago

Should page_allocator on Windows perhaps use the various Nt***VirtualMemory functions instead of kernel32.Virtual*** ones currently being used?

ChanceNCounter commented 2 months ago

This appears to have been in progress for six years. Every few months somebody posts it to Reddit, tweets it, etc., and it gets a lot of laughs, which appears to have led to a couple of testy exchanges over the years.

My question is, given the number of times that this issue has made this project a laughing stock, has anyone on the project actually tried reaching out to a Microsoft contact for feedback on what the rest of the internet clearly thinks is an unfathomably bizarre choice?

alexrp commented 2 months ago

Responding to this point specifically:

This appears to have been in progress for six years. Every few months somebody posts it to Reddit, tweets it, etc., and it gets a lot of laughs, which appears to have led to a couple of testy exchanges over the years.

It is easy, especially on the internet, to settle for the status quo and laugh at any attempt to change it. One of the defining features of Zig - not just as a project, but also as a community - is the willingness to challenge conventional wisdom to push the state of computing forward. That doesn't always pan out, but it seems to me that it works more often than not. For an easy example, consider Zig's glibc cross-compilation support: Just 5-10 years ago, the mere idea of this would have gotten you laughed out of a room - it's a mess, it's just clearly too impractical. Yet here we are. People were similarly skeptical about the viability of Zig rolling its own compiler backends and moving towards LLVM independence, yet we have multiple backends being actively worked on, with the x86-64 backend in particular being close to usability.

Which is why, IMO, this:

what the rest of the internet clearly thinks is an unfathomably bizarre choice?

really doesn't carry much weight. Technical arguments are more than welcome, but what "the internet" thinks often turns out to be misguided or wrong.

ChanceNCounter commented 2 months ago

I am surprised you regard the aforementioned impressive technical undertakings as equivalent to the decision to target a lower level of Windows than other systems languages think is reasonable or wise.

dongle-the-gadget commented 2 months ago

challenge conventional wisdom to push the state of computing forward

Trying to change the state of affairs of things you don’t control (which is not the case for all examples you have given, since you have direct control over the Zig toolchain, but not Windows) is not a wise idea.

I heavily doubt that Microsoft would lean down on providing their internal APIs to other consumers. Instead it’s simpler for them to simply block execution of popular Zig applications if the NT functions change in a breaking manner.

ChanceNCounter commented 2 months ago

Well, you can continue to doubt what Microsoft would suggest, or you could heed my original suggestion (rather than defensively responding to everything else I wrote) and ask Microsoft. Six years, nobody seems to have thought of that. I tried to make that clear as politely as I was prepared to interact with this project. Good luck to you all.

alexrp commented 2 months ago

Trying to change the state of affairs of things you don’t control (which is not the case for all examples you have given, since you have direct control over the Zig toolchain, but not Windows) is not a wise idea.

I heavily doubt that Microsoft would lean down on providing their internal APIs to other consumers. Instead it’s simpler for them to simply block execution of popular Zig applications if the NT functions change in a breaking manner.

I don't think anyone here has suggested that we can apply pressure on Microsoft to do anything.

What has been suggested is that we can rely on functionality exported from ntdll.dll that, in practice, is unlikely to ever change for the lifetime of Windows. It's not just for fun that projects like Wine and ReactOS try to faithfully replicate the ntdll.dll API surface and functionality; enough real applications rely on these details for it to matter.

Microsoft may prefer not to acknowledge this officially, but they absolutely know it, and they know that backwards compatibility is the killer feature that has kept Windows on top as essentially the default consumer OS. It's also worth noting that Microsoft themselves have slowly but surely started documenting some of the more popular ntdll.dll functions because, again, they do know what goes on in practice.

Also, there's literally no evidence to suggest that the Zig project wouldn't change course if this approach turns out to truly be unworkable.


I am surprised you regard the aforementioned impressive technical undertakings as equivalent to the decision to target a lower level of Windows than other systems languages think is reasonable or wise.

I don't. The point I was making is that it's not generally useful to measure the difficulty of an engineering challenge based on the internet hive mind's opinion. The internet is full of doomers who are wrong more often than they're right - especially so in tech communities.

(Also, a lot of what we're doing with glibc cross-compilation is relying on implementation details of glibc; it's actually not too dissimilar. But again, not actually the point.)

Well, you can continue to doubt what Microsoft would suggest, or you could heed my original suggestion and ask Microsoft. [...] Six years, nobody seems to have thought of that.

As someone who worked at Microsoft, I can tell you that, even if you work there, it can be nearly impossible to get through to the relevant people unless you specifically know who they are. So I think it would be helpful to suggest the right point of contact here; gesturing in the general direction of a huge multinational corporation doesn't arm us with any information that we didn't already have.

(rather than defensively responding to everything else I wrote)

I tried to make that clear as politely as I was prepared to interact with this project.

I would encourage you to put yourself in the shoes of a maintainer, contributor, or even just ordinary community member of a large OSS project, then read your original comment back to yourself, and see if you truly believe that it comes off as polite. I can personally think of multiple ways to rewrite that comment in a considerably more constructive (and non-inflammatory) tone. I think my response was actually quite level-headed and, dare I say, polite.

And to be clear, I'm just one contributor of many; I don't represent the project. That's why I made it clear that I was only responding to that point.

teo-tsirpanis commented 1 month ago

While I believe it's highly unlikely that Microsoft will break even its undocumented ntdll APIs, I think this is the wrong question to ask.

It's not bad to use NT APIs to do things not possible with Win32 APIs, but defaulting to NT APIs and avoiding as many Win32 APIs as possible should require a very strong technical justification, which IMHO I am not seeing after reading the whole discussion; on the contrary, there are reported problems like the lack of WOW64 redirection.

If Zig plans to target the underlying NT system, I believe it should be as a new platform distinct from Windows.

jdpatdiscord commented 1 week ago

The maintainers of ReactOS do not want any software project to target ReactOS; they advise against it. If you find a program that works on Windows but not on ReactOS, you should file a bug with the ReactOS Jira, no matter how esoteric the bug (e.g. something works on XP but not 10, or on 10 but not 11). ReactOS wishes to replicate this Microsoft behavior with their program compatibility mode.

So, pretend the ReactOS project doesn't exist when implementing things.

fithisux commented 3 days ago

Midipix uses ntdll in their ntapi layer.