ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.95k stars 2.55k forks source link

`std.fs` is missing an "open any" API #16738

Open squeek502 opened 1 year ago

squeek502 commented 1 year ago

Right now, the status quo is:

This means that there is currently no cross-platform std.fs.Dir API that allows opening any path (regardless of its type). Having an API that allows opening any path on all platforms is possible, though, and would be useful (see https://ziggit.dev/t/how-to-check-if-path-points-to-a-file-or-a-directory/).


There are a few ways that this could be addressed:

  1. Add filter: enum { file_only, any } to File.OpenFlags and let the caller of Dir.openFile choose which behavior they want
  2. Add a separate Dir.openAny function that can open any type, and make Dir.openFile always return error.IsDir for directory paths on all platforms
  3. Make openFile open any type on all platforms and therefore make that the 'cross-platform' behavior (i.e. don't even have a function that returns error.IsDir when opening a directory as read-only) [this is the option I prefer the least, since it means that you'd have to do a stat call afterwards if you wanted the error.IsDir behavior (which wouldn't be necessary on windows if OpenFile was just called with .file_only)]

Note that all of these options would effectively close https://github.com/ziglang/zig/issues/5732 (since in option 1 & 2, a stat call would be done on non-Windows in the file-only version to get the matching error.IsDir behavior)

rohlem commented 1 year ago

To me the category "file" intuitively doesn't contain "directory", therefore option 2 of having a separate function openAny sounds the least surprising.

Looking into std.fs.File there is an enum Kind with more exotic entries like block_device, whiteout, door, event_port - openAny would presumably permit all of these. Do you have any insight on whether most applications would want to agnostically handle these the same, or would a method openFileOrDir be more useful for the common use case? (Although I'm not sure what operations besides renaming are even equally supported for files and directories either.)

squeek502 commented 1 year ago

Do you have any insight on whether most applications would want to agnostically handle these the same, or would a method openFileOrDir be more useful for the common use case

The primary use case for openAny over openFileOrDir would be if you wanted to do:

var something = try std.fs.cwd().openAny("something", .{});
defer something.close();

const stat = try something.stat();
switch (stat.kind) {
    .file => {},
    .named_pipe => {},
    // ...
}

(a real world example of this sort of thing here)

Plus, AFAIK openFileOrDir would require an open + stat call while openAny would be able to just be able to use an open call. In other words, openFileOrDir would just remove some possible use-cases of openAny for questionable benefit.

andrewrk commented 1 year ago

I like (2) and I think that it could potentially even be called open rather than openAny. What would it return though?

As a related side note, I have decided to revert #12060.

squeek502 commented 1 year ago

What would it return though?

Good question. My initial thought would be File, since most of the File APIs would be useful regardless of the kind of the fd, but it may not be a perfect match.

This issue is also related: https://github.com/ziglang/zig/issues/12377 where some File APIs are useful for/missing for Dir

It's possible we may want to go for some sort of design where:

- There is a type like `Handle` or `Fd` or something like that, and this is where most of the useful-for-any-kind APIs would be moved into - Both `Dir` and `File` would store that new `Handle` type instead of a `os.fd_t`, and `Dir` would effectively guarantee `kind == .directory` and `File` would effectively guarantee `kind == .file` if they are `stat`ed Usage would then look something like: ```zig var dir = try std.fs.cwd().openDir("blah", .{}); defer dir.close(); // `stat` is a wrapper around `dir.handle.stat()`, or maybe you'd have to call `dir.handle.stat()` directly const stat = try dir.stat(); ``` and then `openAny` would just return a `Handle`. ```zig var handle = try std.fs.cwd().openAny("something", .{}); defer handle.close(); const other_stat = try handle.stat(); // maybe some way to go from Handle to File/Dir once the type is known? ``` --- Note that `Dir.stat` is currently a wrapper around `File.stat`: https://github.com/ziglang/zig/blob/4c97919e8e032a91e45d44ee438c9f00d16f52a4/lib/std/fs.zig#L2637-L2643

EDIT: See the comment after this one; I no longer think the design detailed in this comment makes sense with the 'non-dir' wrinkle: https://github.com/ziglang/zig/issues/16738#issuecomment-1685208296

squeek502 commented 1 year ago

So it actually turns out that OpenFlags.Filter.file_only is misnamed. When specified, the flag FILE_NON_DIRECTORY_FILE is used in NtCreateFile:

FILE_NON_DIRECTORY_FILE The file being opened must not be a directory file or this call fails. The file object being opened can represent a data file, a logical, virtual, or physical device, or a volume.

In more practical terms, this means that .file_only will successfully open named pipes (and as a further wrinkle for stat there's nothing in FileAllInformation that distinguishes it as a named pipe, would need to separately query FilePipeInformation or FileFsDeviceInformation to distinguish that I think)

As an example:

> test.exe \\.\pipe\mypipe
opening NT-prefixed path '\??\pipe\mypipe' with filter: .file_only
attributes: 0x80
kind=fs.file.File.Kind.file

0x80 is FILE_ATTRIBUTE_NORMAL: A file that does not have other attributes set. This attribute is valid only when used alone.

Relevant test code (with debug prints in the std library functions):

    var output_file = try std.fs.cwd().openFile(output_path, .{ .mode = .read_write });
    defer output_file.close();

    const stat = try output_file.stat();

So openFile is already a misleading name and should likely be changed (openNonDir would be the most accurate I guess). This might make option 2 less attractive?

jibal commented 1 year ago

To me the category "file" intuitively doesn't contain "directory"

And to me, who worked on UNIX kernels and did other systems programming on Posix systems for many years, intuitively (A) "file" is any object in the file system, something represented by a path, that has an inode and you can perform a staton it, that is a named component of a directory (which, BTW, is why open{File,Dir} are somewhat confusingly in Dir).

But I also have the more common intuition of (B) a file as a random-access collection of bytes stored on disk, as opposed to a directory, symlink, socket, block or character device, etc. ... the sort of thing that on Posix systems is sometimes referred to as a "plain file".

Which indicates that the term "file" is ambiguous. And it's really not possible to avoid the ambiguity when we have

std.fs.File.Kind.File, where the first Filerepresents (A) and the second represents (B), and std.fs.Dir.OpenError.FileNotFound, where Filerepresents (A), or the first proposal with std.fs.Dir.openFile(path, File.OpenFlags{.filter: file_only}), where the two instances of File refer to (A) and file_only refers to (B).

Note that std.fs.Dir.openFile already returns a std.fs.File, which has members like Kind, stat, chmod, etc. that apply to any kind of "file" (A), which answers Andrew's question of what openAny (or of course option 1) should return, and indicates that Dir.openFile already encompasses any kind of "file" (A).

Given the existence of numerous instances of "File" referring to any kind of fs object, I think option 1 is preferable to OpenAny. But I would change OpenFlags.filter: enum {file_only, any} to perhaps bool directory_allowed, as that is what the Windows implementation offers, per @squeek502's last comment. If users want finer control on the kind of disk object they are willing to accept, they can use stat.

But wait ... suppose you open a directory using Dir.openFile ... there needs to be a File.toDir() that returns a Dir so that you can perform directory operations on it. Is that even doable for all systems? It is on Windows since you can do directory operations on the handle returned by NtCreateFile and it is on Linux which has fdopendir. I'm not familiar with WASI ... perhaps it can save the path and reopen it as a directory.

BTW, Dir contains chmod and chdir functions that create a File and then calls File.chmod or File.chdir. I think there ought to be a Dir.toFile() function that returns a File so that other operations such as stat , accessed, etc. can be performed. If any of the File functions break when given a directory (or other non "file" (B)) then that's a problem with this proposal generally and appropriate tests should be added.

Another possibility is to merge Dir into File ... after all, File already has numerous functions (read, write, seek, etc.) that don't apply to every Kind, so why not also have directory functions there? In which case I would suggest static functions for opening/creating absolute paths, and {open,create,makeDir}Relative with a dir: File parameter for paths relative to an open directory. I don't think you should have to mess with cwd() when it's not necessary, and the {open,create}File functions living in Dir has a somewhat high surprise factor.

In any case, I think the openAny idea uncovers a bunch of issues related to the fs.{File,Dir} dichotomy that warrant a rethink.

rohlem commented 1 year ago

Naming idea: std.fs.Entry for "file system entry" as in @jibal's (A): Anything identified / exposed via the file system. (The most explicit name for type (B) aka collection of bytes that come to my mind is Blob. Though maybe DataFile is more intuitive / discoverable.)

jibal commented 1 year ago

@rohlem I think fs.Entry is reasonable, as long as the fs. is retained. But most type (B) are text files, so I don't think Blob or DataFile are appropriate, as those usually indicate things that are not human-consumable. fs.Entry, fs.Entry.Kind.File, fs.Entry.open or fs.Dir.openEntry, and std.fs.Dir.OpenError.EntryNotFound would go pretty far to eliminate the ambiguity, but it's a pretty ambitious change.

squeek502 commented 1 year ago

@jibal see https://github.com/ziglang/zig/issues/16736 for my thoughts on Dir and relative/absolute path stuff.

pjz commented 1 year ago

However it's split up on the inside, maybe a fs.Entry.open(.{ .kind = .{ .file, .dir } }) would be a good top-level wrapper/affordance?

nektro commented 11 months ago

for this use case my go-to has been to stat() it and then check .kind and use openFile/openDir respectively