ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Eliminate the POSIX API layer #6600

Status: Open. LemonBoy opened this issue 3 years ago.

LemonBoy commented 3 years ago

The os namespace (to be renamed posix) is tackling the problem of providing a cross-platform abstraction over the operating system facilities from what is, in my opinion, the wrong angle. In this essay I will briefly explain why.

The aim here is to have a single level of abstraction that works well enough for both direct consumers (users of the std.os namespace) and indirect consumers (e.g. users of the std.fs abstractions built on std.os) without leaking too many details about the underlying implementation. The current approach tries to clump everything under the posix name, a name that carries heavy baggage of dos and don'ts that other platforms (mainly Windows) may not agree with.

A few examples:

The point is that we should aim at breaking free from the posix rules and write our own; a full-blown posix compatibility layer is not something that belongs in the stdlib. I see Rust took this very same approach (I'm looking at the filesystem-related part): their API surface is small and comprises all the common bits required by the users (external ones and ones working on the stdlib). A small note lets the user know what platform-specific method is used, but that's it.

The gist of this proposal is:

Thanks for watching.

kprotty commented 3 years ago

A note on the pread/pwrite bit: WriteFileGather on Windows looks like it only supports page-aligned userspace buffers and requires null-terminating its "iovec equivalent" structure. General-purpose vectored IO exists on Windows for sockets via WSASend & friends, but the generic "HANDLE" equivalent doesn't seem to be as flexible.

Another point on vectored IO is that having the function take in a generic iovec (which for Windows sockets could be WSABUF) would spare the os function from allocating its own in order to represent [][]u8. The layout of slices is currently undefined, and in any case does not match the layout of the socket iovec types, so casting to and from them is probably not correct.
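To make the layout point concrete, here is a minimal sketch, assuming a caller-provided scratch array. The names PosixIoVec, WsaBuf and toPosixIoVecs are hypothetical and not part of std; the two structs only mirror the field order of POSIX struct iovec and Winsock WSABUF.

```zig
const std = @import("std");

// Mirrors the field order of POSIX `struct iovec`: pointer first, then length.
const PosixIoVec = extern struct {
    base: [*]u8,
    len: usize,
};

// Mirrors the field order of Winsock's WSABUF: a 32-bit length first, then the pointer.
// Shown only to contrast the two layouts; it is not used below.
const WsaBuf = extern struct {
    len: u32,
    buf: [*]u8,
};

// Hypothetical helper: converts slices into iovec entries using storage
// supplied by the caller, so the OS layer itself never allocates.
// Returns the initialized prefix of `out`.
fn toPosixIoVecs(bufs: []const []u8, out: []PosixIoVec) []PosixIoVec {
    const n = @min(bufs.len, out.len);
    for (bufs[0..n], 0..) |b, i| {
        out[i] = .{ .base = b.ptr, .len = b.len };
    }
    return out[0..n];
}
```

Since neither extern struct is guaranteed to match the in-memory layout of a slice, the conversion is done field by field instead of with a pointer cast.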

I would like to propose taking the idea of restructuring std.os even further and getting rid of it entirely (or at least hiding it from the user), similar to what the Rust stdlib does. Replace the exposed functions by having them in their own higher-level structs that they act upon (File/Dir, Socket, Pipe, Process, Thread, Random, IoPoll, Memory, etc.) and do not require everything to go through a posix-based API, as that can restrict or outright prohibit certain functionality on non-posix systems like Windows:

Make a std.os.posix which acts like std.os.linux, or partially like the current std.os, but only for posix systems, while Windows sticks to using std.os.windows. Then design the APIs of the higher-level structs based on the greatest common functionality of all OSes the stdlib supports instead of through the lens of posix only.

ikskuh commented 3 years ago

@kprotty i fully support this! Nothing more to add

katesuyu commented 3 years ago

I completely agree with this issue. Even Rust isn't a great example of how to do things, although we do fairly well at avoiding the most glaring flaw of Rust's OS abstractions at the moment: not supporting widechar APIs for Windows. I'm not sure we do this everywhere but I've seen Z and W functions enough to be fairly satisfied.

Also, relying on the existence of a given compatibility layer on every platform, even if you restrict it to things you can currently shim over with reasonable precision, has proven to be a huge mistake for stabilized language ecosystems. For example, the introduction of WASI as a target has made the Rust standard library shim over a huge amount of details in a quite inefficient manner, because their APIs and API consumers are hard-dependent on absolute paths and/or being able to access files and directory paths without an existing directory handle. Zig mostly works with directory and file handles already, and supports iterating over WASI preopens instead of pretending paths like /abc or ./abc are even remotely coherent on WASI.

In essence, although plenty of things can be reasonably supported in a cross-platform manner, you cannot commit yourself to such a specific standard like POSIX and expect it to work flawlessly everywhere and fit every use case. std.os should be native rather than trying to stretch itself over unnatural targets, and features elsewhere in the standard library that depend on OS-specific details should not promise features that are impossible to implement correctly for every target. They should be exposed, just not with the proposition that they will work anywhere. A perfect example of this is std.fs.cwd(), which makes the correct choice by emitting a compile error on WASI instead of attempting to shim a concept that is wholly inapplicable to WASI.

jayschwa commented 3 years ago

I think the standard library has too much if wasi, else if windows, else posix logic sprinkled throughout it. I can't tell if this proposal would help or hurt that.

My half-baked hot take: a namespace like std.fs shouldn't know what an operating system is. Instead, it should provide a conservative interface for file-like objects that other namespaces (OS or otherwise) can satisfy. io.Reader and io.Writer are good (albeit simpler) examples of what I mean.

This could be layered. If I'm writing a package that allows Zig to target "Jayschwa OS", I could choose to satisfy the std.fs interface directly. Or instead, I could choose to satisfy the lower-level POSIX interface (how BYOS kind of works now) and get its support for std.fs (and probably other interfaces) for free.
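The "conservative interface that other namespaces can satisfy" idea can be sketched with plain duck typing. This is only an illustration under stated assumptions: MemFile and readAll are hypothetical names, and the only requirement placed on a file-like object here is a read method.

```zig
const std = @import("std");

// Hypothetical in-memory "file" supplied by some OS (or non-OS) package.
const MemFile = struct {
    data: []const u8,
    pos: usize = 0,

    // Copies as many bytes as fit into `dest`; returns 0 at end of data.
    pub fn read(self: *MemFile, dest: []u8) usize {
        const remaining = self.data[self.pos..];
        const n = @min(remaining.len, dest.len);
        @memcpy(dest[0..n], remaining[0..n]);
        self.pos += n;
        return n;
    }
};

// Hypothetical generic consumer: knows nothing about operating systems,
// it only needs `file.read(buf)` to exist.
fn readAll(file: anytype, out: []u8) usize {
    var total: usize = 0;
    while (total < out.len) {
        const n = file.read(out[total..]);
        if (n == 0) break;
        total += n;
    }
    return total;
}
```

Any other type with a compatible read method, whether backed by a syscall or by a custom OS layer, could be passed to readAll unchanged.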

katesuyu commented 3 years ago

@jayschwa I agree that std.fs should allow overriding the basic file-like objects directly, however I disagree that the switches are bad. Referring to specific concrete APIs, as appropriate, massively improves the readability and navigability of the codebase. Deferring to a duck-typed platform abstraction layer is the real issue. Switches avoid forcing readers of the code to sift through a heap of files to find how each platform implements the same (usually simple) function, and avoid code duplication when multiple platforms are handled by the same switch arm.

Sticking the implementation of everything in separate platform-specific modules is what makes the Rust standard library (and many Rust crates that copy the standard library's organization) horrific to navigate, and the more layers of this you have, the more you end up like our present day std.os (which has already fallen for this trap: see const system and all the conditional usingnamespace declarations).

BYOS should be split up into more discretely implementable interfaces, but there should be no internal abstraction layer: std.fs should keep its switch statements, but allow its structures and functions to be overridden on a case-by-case basis. And specific APIs should not have implementations on platforms where this is impractical!

Rocknest commented 3 years ago

I like the idea of a zig-style posix layer that unifies error codes, checks if arguments are correct, and takes care of version-prefixed/extended/safe APIs. This is a must-have until zig's own higher-level APIs provide all possible interactions with the OS. However, POSIX as a fit-any-OS API is obsolete: existing "posix-compatible" OSes continue to diverge from it, and some posix abstractions prevent efficient software, so most new OSes are not compatible with posix.

I think std.os.posix should exist mostly as it is now, but without any fallbacks, so if I want to use a posix API I must first check that it exists: std.os.posix.has("name"). Maybe we should have a common place for the fallbacks that std uses, named std.os.@"$internal$" (or some other not trivially accessible name); for example, copy_file_range would be moved there.
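A check like has("name") can be expressed with the @hasDecl builtin. The following is a minimal sketch, assuming a stand-in namespace: posix_like, its getpid placeholder and has itself are hypothetical and do not exist in the standard library.

```zig
const std = @import("std");

// Hypothetical stand-in for a per-OS namespace of thin wrappers.
const posix_like = struct {
    pub fn getpid() u32 {
        return 1234; // placeholder value, not a real syscall
    }

    // True when this namespace declares `name`.
    pub fn has(comptime name: []const u8) bool {
        return @hasDecl(@This(), name);
    }
};

pub fn main() void {
    // The check is resolved at compile time, so only one branch is analyzed.
    if (comptime posix_like.has("copy_file_range")) {
        std.debug.print("native copy_file_range available\n", .{});
    } else {
        std.debug.print("copy_file_range missing, caller picks a fallback\n", .{});
    }
}
```

Because the answer is known at compile time, code guarded by the check never references a declaration that the target lacks.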

squeek502 commented 3 years ago

I've run into confusion regarding this as well, e.g. with realpath (https://github.com/ziglang/zig/issues/4658#issuecomment-673437235):

(note that fs.realpath is just an alias for os.realpath so the following applies to os.realpath)

Right now, the status quo is:

  • On Linux, std.fs.realpath resolves symlinks before '..'. This matches the system behavior as far as I can tell (cat link/../file will output the contents of linked/../file and fail if it doesn't exist), although there seem to be some edge cases (from my testing, if link is a symlink to linked, then cd link/../dir takes you to ./dir if it exists [ignoring the symlink], otherwise it will take you to linked/../dir [resolving the symlink before ..]).

  • On Windows, std.fs.realpath resolves .. before symlinks. This matches the system behavior as far as I can tell (cd link\..\dir will never take you to linked\..\dir; if .\dir does not exist, it will fail with "The system cannot find the path specified.". Same deal with type link\..\file). Please correct me if I'm wrong on this, but I'm not even sure if there exists a function in the Windows API that resolves symlinks before '..'. daurnimator has mentioned that Zig might need something like RtlDosPathNameToRelativeNtPathName_U_WithStatus but from using the test code at the bottom of this article, that function does not resolve symlinks before .. either. If there is precedent for Linux-like symlink resolution on Windows, it would be helpful to get that information added to #4658

It's unclear to me what behavior is 'correct,' and currently it feels like both std.fs and std.os are not very clear about how/if they should handle platform-specific behavior.

qgcarver commented 3 years ago

If we go through with this proposal, are we making a decision on #1840 ?

Writing platform code for Windows is pretty shaky right now, and whether or not we rename it to posix is related. #5037 #4426

Edit: to clarify, #1840 'prefers' ntdll, but #4426 is on shakier ground, no proposal has been accepted.

Rocknest commented 3 years ago

@squeek502 in my opinion all platform-dependent behaviour in std.fs and other high level apis is a bug. For example in the case of realpath, on windows it should be emulated with a few more syscalls, however if some behaviour is unemulatable then it should return error.Unsupported.

ron-wolf commented 3 years ago

I arrived here via the Zig English Telegram group. Here’s a related essay (found on Lobste.rs) about Go’s somewhat similar issues when compared with Rust: “Early Impressions of Go from a Rust Programmer”

fivemoreminix commented 3 years ago

@squeek502 in my opinion all platform-dependent behaviour in std.fs and other high level apis is a bug. For example in the case of realpath, on windows it should be emulated with a few more syscalls, however if some behaviour is unemulatable then it should return error.Unsupported.

IMO only common behavior which is known to function fine on most operating systems should be exposed in a high-level interface. There shouldn't be a need for an error.Unsupported when the high-level wrapper over OS functions implements interfaces to common OS functions like File.open(self, URI, mode), File.close(self) etc. OS equivalence should be chosen at compile time, to provide compile-time errors for impossible instructions.

When a user (of the OS library) requires an OS-specific API, they can get it from that OS-specific library. This fits the Zig zen of communicating intent precisely and preferring compile errors over runtime crashes.
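Choosing the implementation at compile time, with a build failure for targets that have none, could look like the sketch below. pathSeparator is a hypothetical helper chosen for brevity, and the list of targets is an assumption made for illustration.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical high-level helper. The switch operand is known at compile
// time, so only the matching prong is analyzed; a target without an
// implementation fails the build instead of failing at run time.
pub fn pathSeparator() u8 {
    return switch (builtin.os.tag) {
        .windows => '\\',
        .linux, .macos, .freebsd, .netbsd, .openbsd => '/',
        else => @compileError("pathSeparator: no implementation for this target"),
    };
}
```

No error.Unsupported appears in the signature: a caller on an unsupported target finds out when compiling, not when running.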

andrewrk commented 3 years ago

I generally support this. That said, std.os currently has 5,657 lines of code, and it's useful logic that needs to exist somewhere. So I'm not sure what it would look like to slap an "accept" label on this and then start implementing it. But I would be in favor of and supportive of self-contained patches that start moving the standard library towards @LemonBoy's vision, provided that we have clear upgrade paths for the existing use cases we support.

@LemonBoy Here are some questions I have about your vision. And note these are not arguments against it; my intent is to help move your project forward.

Also just to clarify - nothing in the std lib is required to go through std.os as a "lower level API". That was only ever done when it made sense in terms of code organization. It just happens to be a convenient place to put a bunch of abstractions.

Can we come up with some clear bullet points which act as guidelines for creating patches towards the goal of this issue? Maybe an example patch that incrementally moves the std lib towards this?

(edit: also let's wait until 0.7.0 is tagged before starting to work on this, seems like a lot of breaking changes)

LemonBoy commented 3 years ago

What does it look like when I want to write zig code that calls execve and I don't care about Windows or other operating systems that don't support it? Currently that looks like calling std.os.execve. In C, it looks like #include <unistd.h> and calling execve. Same question for: pipe, waitpid, kevent, and a few more.

The idea is for std.os.<os name> to contain all the syscall/libc wrappers with no sugar coating: if a functionality is not available there's no fallback implementation, and if the host doesn't support a given syscall, return error.Unsupported. This will be the cornerstone for all the platform-specific code.

If an os.posix is still wanted it will only re-export a few fns from the std.os.<os name> namespace. Again, no cross-platform abstractions here, it should simply act as a convenience layer for the advanced user.

What about the current pattern of std.time, std.fs, std.process, std.net which are the high level operating system abstractions? How will these be affected by this?

The high-level abstractions build upon std.os.<os name> primitives and will bend over backwards to perform a given operation across all the different platforms. Whether to put those abstractions in std.<module> (I love go's approach, they have eg. time_unix and time_windows to keep the different implementations separate and avoid the ir.cpp effect :P) or in the newly-vacated std.os is an open question.
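The split between a thin layer with no fallbacks and a high-level layer that owns them can be sketched as follows. Everything here is hypothetical: os_layer, copyFile and the kernel_has_it flag merely simulate a host that does or does not implement a call.

```zig
const std = @import("std");

// Hypothetical low-level layer in the spirit of `std.os.<os name>`:
// it reports what the host says and never falls back.
const os_layer = struct {
    // Stand-in for a syscall wrapper. `kernel_has_it` simulates whether the
    // running kernel implements the call.
    pub fn copyFileRange(kernel_has_it: bool, len: usize) error{Unsupported}!usize {
        if (!kernel_has_it) return error.Unsupported;
        return len;
    }

    // Stand-in for a plain read/write copy loop.
    pub fn readWriteLoop(len: usize) usize {
        return len;
    }
};

const Strategy = enum { native, read_write_loop };
const CopyResult = struct { copied: usize, strategy: Strategy };

// Hypothetical high-level layer: the fallback decision lives here.
fn copyFile(kernel_has_it: bool, len: usize) CopyResult {
    if (os_layer.copyFileRange(kernel_has_it, len)) |n| {
        return .{ .copied = n, .strategy = .native };
    } else |err| switch (err) {
        error.Unsupported => return .{
            .copied = os_layer.readWriteLoop(len),
            .strategy = .read_write_loop,
        },
    }
}
```

A user who wants the raw behaviour calls the low-level function and handles error.Unsupported themselves; a user who just wants the file copied calls the high-level one.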

How does your vision handle the (underrated IMO) feature of calling libc functions when libc is linked but not otherwise?

As stated in the first point std.os.<os name> will either syscall or call the libc equivalent (and convert the error codes). The transparent libc/syscall switch comes with its own set of problems such as potential ABI incompatibilities (eg. the stat structure used by the kernel is likely to be different from the one used by the libc, especially after all the Y2038 changes) and subtle differences in behaviour (eg. #1337).

Can you incorporate the Bring-Your-Own-Operating-System-Layer idea into this vision?

Well this requires a bit of thought, the easiest solution would be to let high-level abstractions mentioned above add an extra arm in the platform-switching code. Eg. in std.fs (or std.os) we may want something like this:

// openFile is fn (dir_handle: OSHandle, path: []const u8, options: OpenFileOptions) OSHandle
const openFile = switch (builtin.os.tag) {
    // ... plus the remaining Unix-like targets
    .linux, .openbsd, .netbsd => openFileUnix,
    .windows => openFileWindows,
    .freestanding => rootOs.openFile,
    // Assumption: targets without an implementation fail at compile time.
    else => @compileError("openFile: unsupported target"),
};

But what if the OS you're targeting has no openat semantics? Well you're screwed, unless you restrict the dir_handle parameter to be the cwd... but what if the OS has no concept of cwd?

CC @IridescentRose as the PSP libc suffers from the lack of openat semantics.

You can see that trying to accommodate every single use case means that we're painting ourselves into a corner and will have to target an unknown minimal set of features when designing the interface APIs.

This idea proposed by @jayschwa is interesting on paper as it moves part of the complexity out of the stdlib and back into the BYOS implementer's court.

This could be layered. If I'm writing a package that allows Zig to target "Jayschwa OS", I could choose to satisfy the std.fs interface directly. Or instead, I could choose to satisfy the lower-level POSIX interface (how BYOS kind of works now) and get its support for std.fs (and probably other interfaces) for free.

Can we come up with some clear bullet points which act as guidelines for creating patches towards the goal of this issue? Maybe an example patch that incrementally moves the std lib towards this?

Well the first step would be defining the set of high-level abstractions to build and define their behaviour and how to implement it on Linux/BSD/Windows/Wasi. For example let's focus on a stat replacement:

const FileInfo = struct {
    size: u64,
    kind: enum {
        Directory,
        Regular,
        Pipe,
        // ...
    },
    mod_time: SomeCoolY38KTimestamp,
    access_time: SomeCoolY38KTimestamp,
    // permissions (as bools? as bitmap + accessors?)
    // file mode
    // a tagged union could be added to hold all the other platform-specific infos
};

fn getFileInfoByHandle(handle: OSHandle) FileInfoError!FileInfo {
    // implement with statx on Linux (or stat64 if not available)
    // implement with stat on Darwin/BSD
    // implement with GetFileInformationByHandleEx on Windows
    // implement with path_filestat_get on WASI
}

And voilà, we got a nice ergonomic (let's make extensive use of enums and getters) cross-platform abstraction that's even 2038-compliant!
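For readers who want something that compiles, here is a data-only variant of the idea, under stated assumptions: Timestamp stores nanoseconds in an i128, and the Extra union with its unix and windows payloads is a hypothetical shape for the platform-specific information. No OS call is made.

```zig
const std = @import("std");

const ns_per_s = 1_000_000_000;

// Hypothetical timestamp: signed nanoseconds since the Unix epoch.
// An i128 has ample room beyond the 32-bit seconds limit reached in 2038.
const Timestamp = struct {
    ns: i128,

    pub fn fromSeconds(secs: i64) Timestamp {
        return .{ .ns = @as(i128, secs) * ns_per_s };
    }

    pub fn seconds(self: Timestamp) i64 {
        return @intCast(@divFloor(self.ns, ns_per_s));
    }
};

const FileInfo = struct {
    size: u64,
    kind: Kind,
    mod_time: Timestamp,
    extra: Extra,

    pub const Kind = enum { Directory, Regular, Pipe };

    // Platform-specific details ride along in a tagged union (assumed shape).
    pub const Extra = union(enum) {
        none,
        unix: struct { mode: u32 },
        windows: struct { attributes: u32 },
    };

    pub fn isDir(self: FileInfo) bool {
        return self.kind == .Directory;
    }
};
```

Callers that only care about the common fields never touch extra, while platform-aware callers can switch on it.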

andrewrk commented 3 years ago

OK thanks for clarifying the vision! I can see how this would work going forward.

rhencke commented 3 years ago

@LemonBoy You may find it interesting to look at how SQLite does it, too. I believe it's similar in spirit to your proposal. (But it's also hot-pluggable with support for custom implementations and multiple implementations at once which is pretty neat)

heidezomp commented 3 years ago

@LemonBoy Do you already have an idea of how the high-level abstraction will deal with OS-specific errors? Specifically, should there be error values that will only be returned on specific OSes (like the current NetworkSubsystemFailed error that is only returned on Windows), or should the high-level abstraction abstract away the OS-specific errors as well?

LemonBoy commented 3 years ago

You may find it interesting to look at how SQLite does it, too. I believe it's similar in spirit to your proposal. (But it's also hot-pluggable with support for custom implementations and multiple implementations at once which is pretty neat)

Yep, the idea is to build a similar (but wider) cross-platform abstraction in the stdlib so that not every app/library has to invent their own.

Do you already have an idea of how the high-level abstraction will deal with OS-specific errors?

If error unions had values we could just add an OSError: <error-type> and let the caller handle the weird cases. No idea about errors yet: grouping them loses part of the information they carry; on the other hand, having enormous error sets is not a pleasant experience for the caller, who has to pick what to handle and what to rethrow.
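Since Zig errors carry no payload, one possible workaround is an optional out-parameter that keeps the raw OS code next to a small error set. This is a sketch of that workaround, not a decision recorded in this thread: OpenError, Diagnostics and mapOpenError are hypothetical, and the numeric codes are the Linux errno values ENOENT (2) and EACCES (13).

```zig
const std = @import("std");

// Small, portable error set the caller can switch on exhaustively.
const OpenError = error{ NotFound, AccessDenied, Unexpected };

// Optional out-parameter carrying the detail that the error set drops.
const Diagnostics = struct {
    raw_code: ?u32 = null,
};

// Hypothetical mapping step: groups raw codes into the small set while
// recording the original code for callers that asked for it.
fn mapOpenError(raw_code: u32, diag: ?*Diagnostics) OpenError {
    if (diag) |d| d.raw_code = raw_code;
    return switch (raw_code) {
        2 => error.NotFound, // ENOENT on Linux
        13 => error.AccessDenied, // EACCES on Linux
        else => error.Unexpected,
    };
}
```

Callers that do not care pass null and see only the three portable errors; callers that need the weird cases read raw_code after a failure.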

ghost commented 3 years ago

If I'm writing a new OS, I'm not even going to think about bugging Zig proper to upstream support for it until it's well and truly done (not that we would or should even consider it), so until then I'll be dependent on BYOOS for development. If BYOOS is optimised for POSIX, that's going to push me to make it at least POSIX-like. In my eyes, that's discouraging experimentation, and entrenching half-century-old ideas.

I think the interface exposed by os, in std as well as the RSF, should be a bit higher-level, so as to abstract over as-yet-unforeseen system designs. That is, it provides a basic "standard library-esque" interface to spawning threads, creating processes, reading and writing files etc., rather than attempting to exactly provide all system or library calls for the platform. That is, rather than the standard library calling into os for low-level operations within its own high-level logic, os itself would provide the basic logic, and the standard library would provide extra patterns, functionality or ergonomics. (Note it would still be possible to provide a POSIX layer to depend on if such a thing were desired, by including a posix member in os; such a member would of course be included in the standard library's own os.) This way, entirely new classes of systems could be integrated with no legacy and no language contortion.

Apologies if this has already been said, it's 4am and my focus isn't great at the best of times.

enkore commented 2 years ago

Is this the correct place to complain about os.argv? (It's defined as {} on Windows which means code will compile but almost always panic as argv.len is generally considered to be at least 1; std.process.argsAlloc gives you [][:0]u8, while argv is [][*:0]u8, so the two interfaces aren't even compatible types, so if you wanna work on Windows, you're just always going to go to argsAlloc, probably after noticing stuff doesn't work on Windows.)
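The type mismatch is easy to reproduce in isolation. In this sketch, firstByte is a hypothetical function that takes the slice type; a POSIX-style argv element is a sentinel-terminated many-item pointer with no length, so it has to go through std.mem.span first.

```zig
const std = @import("std");

// Hypothetical consumer written against the slice type that
// std.process.argsAlloc hands out.
fn firstByte(arg: [:0]const u8) ?u8 {
    return if (arg.len > 0) arg[0] else null;
}

// Element type of a POSIX-style argv: a pointer with a 0 sentinel but no length.
fn firstByteOfRaw(raw: [*:0]const u8) ?u8 {
    // Passing `raw` to firstByte directly would not compile.
    // std.mem.span scans for the sentinel and yields a [:0]const u8 slice.
    return firstByte(std.mem.span(raw));
}
```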

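I'd find it much cleaner if either

* os.argv simply were not defined on Windows, giving you compile errors so you know something is broken

* os.argv being populated on Windows. This of course requires some small allocations in the startup code, just like below-main libc code has to do on Windows.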

ominitay commented 2 years ago

Is this the correct place to complain about os.argv? (It's defined as {} on Windows which means code will compile but almost always panic as argv.len is generally considered to be at least 1; std.process.argsAlloc gives you [][:0]u8, while argv is [][*:0]u8, so the two interfaces aren't even compatible types, so if you wanna work on Windows, you're just always going to go to argsAlloc, probably after noticing stuff doesn't work on Windows.)

I'd find it much cleaner if either

* os.argv simply were not defined on Windows, giving you compile errors so you know something is broken

* os.argv being populated on Windows. This of course requires some small allocations in the startup code, just like below-main libc code has to do on Windows.

Someone should make a PR for this. The first option is the sensible way to go imo: allocations shouldn't be made without the user doing so. I'm happy to make a quick PR for this if wanted.

enkore commented 2 years ago

I'm not convinced the "no implicit allocations" rule applies, because this would just be another explicit allocation made by start.zig for the user.

ominitay commented 2 years ago

Another? I don't think that start.zig performs any heap allocation at present. And I don't believe that we should change that. It also guarantees a memory leak if a program uses std.os.exit, which I don't think is acceptable.

enkore commented 2 years ago

There's TLS on Linux, which does it through mmap. start.zig can also init+start the event loop and I haven't checked, but would be very surprised if that doesn't allocate.

It also guarantees a memory leak if a program uses std.os.exit, which I don't think is acceptable.

I don't quite follow - just to clarify, we're talking about splitting the Windows process command line and putting that into os.argv, so this only affects the Windows .exe target, where the OS will get rid of any remains on process exit regardless.

InKryption commented 2 years ago

I think the question of whether to populate argv on Windows could go either way; you can always opt-out of any argv population by exporting the entry point directly. Or maybe it could be opt-in, with a global flag like pub const populate_argv_on_windows = true;, or pub const always_populate_argv = true; Edit: or, for my suggestion, you could go the other direction and make it opt-out to the tune of pub const dont_populate_argv_on_windows = true;/pub const native_argv_only = true;
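An opt-in flag of this kind is typically detected with @hasDecl on the root source file. The sketch below takes the root as a type parameter so it can be exercised in isolation; wantsArgvPopulated, OptedIn and Silent are hypothetical, and real startup code would presumably pass @import("root").

```zig
const std = @import("std");

// Hypothetical check that startup code could run against the root source file.
// The condition is known at compile time, so the field access is only
// analyzed when the declaration exists.
fn wantsArgvPopulated(comptime Root: type) bool {
    if (@hasDecl(Root, "populate_argv_on_windows")) {
        return Root.populate_argv_on_windows;
    } else {
        return false;
    }
}

// Two stand-ins for a program's root file.
const OptedIn = struct {
    pub const populate_argv_on_windows = true;
};
const Silent = struct {};
```

Flipping the default to opt-out would only change the value returned in the else branch.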

ominitay commented 2 years ago

There's TLS on Linux, which does it through mmap.

Didn't spot that. TLS avoids using mmap if it can fit things in a buffer, but is also necessary to do here, so I consider that to be reasonable.

start.zig can also init+start the event loop and I haven't checked, but would be very surprised if that doesn't allocate.

The event loop is optional though, and has to be opted into by the user, so I consider that explicit at least.

Perhaps there should be an option to populate argv, but I think that's an unnecessary abstraction, when a user could (and should, imo) just set up an allocator (on the heap or stack), and use std.process.args instead.

ominitay commented 2 years ago

Additionally, it makes things simpler for library authors etc than saying maybe argv will compile error on Windows, or maybe it won't.

ominitay commented 2 years ago

maybe it could be opt-in

This would defeat the point of such a change.

enkore commented 2 years ago

Short-term I think #10734 should be merged, just to have an immediate/obvious fix for the situation as it is now. The discussion might already merit a separate ticket because I see three points of contention:

  1. Should start.zig provide os.argv on Windows?
    • No hidden allocations -> technically not true for start.zig (if one wants to apply the rule to it), but as @ominitay pointed out TLS shouldn't normally allocate and async is opt-in. As @InKryption pointed out, a flag could be added to make os.argv optional (a big hammer for a small problem imho).
  2. Should os.argv and std.process.args use the same types?
    • os.argv has to work with POSIX's type, which is zero-terminated strings, not slices.
    • std.process.args uses zero-terminated slices, the preferred Zig type for this kind of thing.
  3. Should use of os.argv be discouraged / removed?
    • As we have seen, os.argv isn't an abstraction across supported platforms, not even those that have command lines. Maybe it shouldn't be in the os namespace?
    • std.process.args is clearly the better interface for general use: works everywhere, uses slices.

InKryption commented 2 years ago

I think, put like that, I would be in favor of the third option: std.os.argv should be removed.

ominitay commented 2 years ago

I wholly agree with the third option, following the Zig Zen: Only one obvious way to do things.

blblack commented 6 months ago

As someone with a background in writing systems-level software (as in network daemons and such) in the traditional *nix/POSIX/C world, and who loves the idea of Zig-the-language as a C-the-language replacement for both porting old and developing new systems-level software, I'd like to offer an opinionated take on the current state of affairs, the reasonable stuff I've seen outlined above, and a few opinions of my own. Maybe this can at least restart the debate process towards an implementable outcome everyone can aim towards.

Keep in mind I'm relatively-new to Zig itself and still finding my way around. If I've straight up misunderstood something, please let me know!

Current state of affairs in master

Opinions and what I think might be a reasonable capture or at least interpretation of current consensus, maybe?

blblack commented 6 months ago

Further bits (which perhaps belong in a separate proposal?)

Assuming the above seems reasonable, I do have some other thoughts beyond that which might merit either further debate here or perhaps a separate proposal:

blblack commented 6 months ago

Note very-related work ongoing in #19354