ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
32.18k stars 2.35k forks source link

Should EINVAL always be unreachable? #6925

Open jorangreef opened 3 years ago

jorangreef commented 3 years ago

The std lib handles EINVAL as unreachable for os.openatZ().

However, there are some legitimate runtime use cases that could trigger EINVAL, e.g. trying to open a file descriptor in O_DIRECT on Linux, but on a file system that does not support O_DIRECT.

This then leads to undefined behavior rather than a safe error.

It also means you would have to reimplement os.openatZ() if you wanted to write something to detect fs support for O_DIRECT.

There are also other scenarios that could trigger EINVAL, e.g. the system does not support synchronized I/O for this file, or the O_XATTR flag was supplied and the underlying file system does not support extended file attributes.

These are all fairly common failure modes for storage/backup/sync applications that need to probe file system characteristics.

Should EINVAL always be unreachable? Could the std lib start to move away from unreachable for this and other syscalls?

jorangreef commented 3 years ago

Related: https://github.com/ziglang/zig/issues/6389

andrewrk commented 3 years ago

It should be handled on a case-by-case basis. For some syscalls, EINVAL should be unreachable because it can only mean something like an invalid pointer address being passed to the kernel, or invalid flags.

However, as you pointed out, some kernel APIs unfortunately have decided to dual-purpose this error code to mean other things, in which case we must to handle the ambiguity by mapping it to an error code. We had to make this change with EBADF recently for some syscalls, for example.

I do want to keep the error sets clean, however, so I am opposed to a blanket modification of all EINVAL code sites without cause.

jorangreef commented 3 years ago

For some syscalls, EINVAL should be unreachable because it can only mean something like an invalid pointer address being passed to the kernel, or invalid flags.

Ah I see... so returning an error would be less safe in that particular case compared to unreachable (or panic) because the program state is likely corrupted. If the intention is to crash the program and fail fast, would there be a reason not to use @panic?

JOT85 commented 3 years ago

@jorangreef @andrewrk I recently I came across this issue. I was writing a script to do a few things, including setting my screen brightness. I'm on an AMD system so I'm writing to /sys/class/backlight/amdgpu_bl0/brightness. If you write an invalid number, or a number that's out of range, it gives error EINVAL, however in std/os.zig, pub fn write, it marks EINVAL as unreachable so I can't handle this error nicely.

The man page describes EINVAL as:

fd is attached to an object which is unsuitable for writing; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned.`

So I think this is mainly the fault of my amdgpu driver misbehaving, but I feel like this kind of error, which is handed to the program, should be returned.

I do want to keep the error sets clean, however, so I am opposed to a blanket modification of all EINVAL code sites without cause.

Maybe adding EINVAL to the error sets isn't the right solution. I think the errors currently marked as unreachable should return unexpected since that error result is unexpected (it shouldn't happen) but not necessarily unreachable (because it can occur, either with a driver like amdgpu, or, as #6389 mentions, changing kernel behaviour).

unreachable isn't a great way to behave upon something that can, and does, actually happen.

andrewrk commented 3 years ago

If the intention is to crash the program and fail fast, would there be a reason not to use @panic?

Yes. @panic is what unreachable does in safe build modes. In unsafe build modes, it is undefined behavior. unreachable is semantically correct, and the application developer has the knob to decide how to exploit the semantics: allow the compiler to optimize unreachable, or turn them all into panicks. In a future improvement to zig, the application developer will have this knob at a finer grained level, applying different release modes to different packages.

However what we can do is decide that the code path is not, in fact, unreachable, and that leads us to...

I think the errors currently marked as unreachable should return unexpected since that error result is unexpected (it shouldn't happen) but not necessarily unreachable (because it can occur, either with a driver like amdgpu, or, as #6389 mentions, changing kernel behaviour).

I would support a change to make EINVAL map to error.Unexpected for the functions where we already have Unexpected in the error set. Note that what we are doing here is saying, this code path is not unreachable, but it is unexpected.

We may want to consider doing this carefully - in a debug build currently, hitting that EINVAL => unreachable line is a great debugging experience, and returning error.Unexpected would be slightly less ideal.

jorangreef commented 3 years ago

We may want to consider doing this carefully - in a debug build currently, hitting that EINVAL => unreachable line is a great debugging experience, and returning error.Unexpected would be slightly less ideal.

Agreed, I would just add that even in debug builds, there are situations where an error should be thrown for EINVAL, and where this would be the right thing to do e.g. where you need to test a filesystem for O_DIRECT support. Receiving and being able to handle EINVAL in this context is perfectly normal and expected as we would expect some Linux filesystems not to support O_DIRECT e.g. in some virtualized environments. There's no way to do this without having access to EINVAL as an error, and not being able to do this in debug builds would force the developer to resort to safe or fast optimizations, or writing their own syscall wrappers.

marler8997 commented 3 years ago

@jorangreef also note that it's pretty easy to call a system call directly in Zig. If you're doing a platform-specific test like testing for O_DIRECT, it's easy to just call std.os.linux.open directly.

jorangreef commented 3 years ago

Thanks @marler8997 , yes we're using os.system.openat already but it would be great to use more of the std lib. O_DIRECT is more than platform-specific, it's also file-system specific (even on Linux).

notcancername commented 1 year ago

Often, it is necessary to accept a file descriptor and perform some operation on it. Crashing when that file descriptor is invalid leaves a program with no way to prevent the UB without reimplementing or pasting standard library code.

The similarity this bears to memory allocation failure is remarkable. The main argument for handling EBADF as unreachable is that it's "always a race condition". Not only is this obviously not true as Andrew himself demonstrates here, but it is the very same argument as "memory allocation failure never happens because overcommit exists".

Handling memory allocation failure by crashing limits the reusability of the code because there are use cases in which the user needs to handle it. Handling EBADF by crashing limits the reusability of the code because there are use cases in which the user needs to handle it.

marler8997 commented 1 year ago

Handling EBADF by crashing limits the reusability of the code because there are use cases in which the user needs to handle it.

Saying it "limits the reusability" is too strong IMO, it just puts an extra requirement on the caller to verify the file descriptor is valid.

Handling "memory allocation failure" isn't analogous to "requiring valid file descriptors". The equivalent in terms of file desciptors would be handling "file descriptor creation failure", which you can and should always do.

The equivalent of "requiring valid file descriptors" in terms of memory allocation would be "requiring valid pointers". This is such a common occurrence that Zig incorporates it into its type system, via optional and non-optional pointers. The majority of Zig functions require valid pointers (not optional). This doesn't mean the code can't be reused, it just puts a requirement on the caller to verify the pointer is valid before calling it, this is the same requirement Zig's std library puts on file descriptors.

In fact on systems that use pointers for handles (like Windows), you can use "optional vs non-optional" on the handles themselves to declare whether they are required to be "valid". The zigwin32 library leverages this which enforces handle validity at compile time instead of runtime, pretty neat :)

P.S. Actually it's more accurate to say that this enables libraries to accept invalid handles on Windows and have that protected by the type system. The different with std is that by convention it pretty much requires all file descriptors to be valid.

notcancername commented 1 year ago

Saying it "limits the reusability" is too strong IMO, it just puts an extra requirement on the caller to verify the file descriptor is valid.

I'm sorry, but "verifying the file descriptor is valid" is system-dependent and extra code that is not necessary, violating Zig's goal of optimality (for instance: ioctl syscall on Linux, GetHandleInformation on Windows, I'm sure other flavors exist). The underlying APIs elect to make invalid file descriptors a valid error condition. Removing this (supported!) feature from the std.os APIs, which, correct me if I'm wrong, are intended to be multiplatform abstractions and not entirely new APIs.

Handling "memory allocation failure" isn't analogous to "requiring valid file descriptors". The equivalent in terms of file desciptors would be handling "file descriptor creation failure", which you can and should always do.

You misunderstand. File descriptors are not always acquired by the same code that uses them. Often, a file descriptor even originates from the user directly, as is the case with test/[.

The equivalent of "requiring valid file descriptors" in terms of memory allocation would be "requiring valid pointers". This is such a common occurrence that Zig incorporates it into its type system, via optional and non-optional pointers. The majority of Zig functions require valid pointers (not optional). This doesn't mean the code can't be reused, it just puts a requirement on the caller to verify the pointer is valid before calling it, this is the same requirement Zig's std library puts on file descriptors.

This is a terrible analogy. Unlike with file descriptors on many systems, it is categorically impossible to check whether a pointer is valid. The only case in which it is possible is with null pointers. It is analogous to checking whether a file descriptor is -1.

marler8997 commented 1 year ago

You misunderstand. File descriptors are not always acquired by the same code that uses them. Often, a file descriptor even originates from the user directly, as is the case with test/[.

Ah this does bring up an example I hadn't thought of. I think the requirement for valid file descriptors is fine within the context of a single program where there's an established API, but file descriptors can cross process boundaries. Triggering "unreachable" because a process received an invalid file descriptor from another process doesn't seem like a good approach :)

Here are solutions that come to mind

  1. std could be modified to gracefully return errors for invalid file descriptors
  2. the application could try to verify foreign file descriptors are valid before passing them to std
  3. depending on what the application is doing with the file descriptor, it could call a lower level method that does not trigger unreachable on an invalid file descriptor (only required for the first syscall)

Each solution has tradeoffs. I don't particularly like the first solution (what I think is being proposed here) since I think the use cases that require this are much more rare compared to code that is able to verify a file descriptor is valid and wouldn't need to handle this error (i.e. code that would be doing error.BadFileDescriptor => unreachable). My argument is that solutions 2 and 3 are acceptable and require no changes to the std. Solution 2 requires an extra syscall (like you mentioned), but you still have the option of solution 3 which can be done with no extra code, you're just circumventing Zig's std abstrations.

notcancername commented 1 year ago

With option 3, you unfortunately run into the whole reusability problem.

My .02 is just that EBADF should be made handleable in std.os. It doesn't make sense to do that for std.fs, but std.os is supposed to abstract existing APIs, and it does not do a good job of that if it cannot meet the use cases of the APIs it it supposed to replace by causing UB.

LordMZTE commented 11 months ago

I had another use-case today where I'd need to handle EBADF: I'm writing a HTTP server which I want to be able to cleanly shut down when it receives a SIGINT. To do this, I call std.http.Server.deinit in my signal handler. This causes the (blocking) std.http.Server.accept in the main loop to throw an EBADF. I'm not aware of another way to handle this.

nektro commented 11 months ago

@LordMZTE getting EBADF means you're likely calling Server.deinit twice. I have a PR open for the server use case in active reveiw https://github.com/ziglang/zig/pull/16166

marler8997 commented 11 months ago

I had another use-case today where I'd need to handle EBADF: I'm writing a HTTP server which I want to be able to cleanly shut down when it receives a SIGINT

If I'm remembering correctly, you can accomplish this by calling shutdown on the listen sock when you receive SIGINT but make sure not to call close until after the accept call completes.

P.S. Calling close while you are calling accept might be considered a misuse of the posix socket API. If so, it's interesting to note that the std library's choice to make this unhandlable will have resulted in forcing you to fix this misuse rather than adding a "band-aid" solution where you catch/swallow an EBADF error.

LordMZTE commented 11 months ago

Yep! It looks like the problem was indeed calling deinit twice. shutdown worked just fine for me. Sorry for derailing the issue!

daurnimator commented 10 months ago

situations where an error should be thrown for EINVAL, and where this would be the right thing to do e.g. where you need to test a filesystem for O_DIRECT support.

Are these situations enumerable? e.g. can we do a series of checks such as if (flags & O_DIRECT != 0) return error.Invalid else unreachable?