ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.77k stars 2.54k forks source link

std.fs: `Iterator.next` doesn't handle `ENOENT`, but it can happen in valid scenarios #12211

Closed squeek502 closed 2 years ago

squeek502 commented 2 years ago

Zig Version

master

Steps to Reproduce

getdents can return ENOENT if the directory referred to by the fd is deleted before the next getdents call (but after the fd is opened). Currently this causes an unexpected error to be returned from IterableDir.Iterator.next.

Relevant code for the Linux implementation of Iterator.next: https://github.com/ziglang/zig/blob/a2ab9e36faded9755ecc1fe809c49140120c3c61/lib/std/fs.zig#L604-L612

Contrived test case (note that I've run into this in a non-contrived use case, though):

const std = @import("std");

test "iterator ENOENT" {
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    var iterable_subdir = try tmp.dir.makeOpenPathIterable("subdir", .{});

    var iterator = iterable_subdir.iterate();

    // Contrived reproduction, but this can happen outside of the program, in another thread, etc
    try tmp.dir.deleteTree("subdir");

    while (try iterator.next()) |_| {}
}

Expected Behavior

ENOENT to be handled in some way by Iterator.next (unsure in what way exactly, but it shouldn't be unreachable or an unexpected error).

Note: I've only tested this on Linux, unsure how this manifests on other platforms.

Actual Behavior

Test [1/1] test "iterator ENOENT"... unexpected errno: 2
/home/ryan/Programming/zig/zig/lib/std/debug.zig:560:19: 0x20ee18 in std.debug.writeCurrentStackTrace (test)
    while (it.next()) |return_address| {
                  ^
/home/ryan/Programming/zig/zig/lib/std/debug.zig:158:31: 0x20bbba in std.debug.dumpCurrentStackTrace (test)
        writeCurrentStackTrace(stderr, debug_info, detectTTYConfig(), start_addr) catch |err| {
                              ^
/home/ryan/Programming/zig/zig/lib/std/os.zig:5462:40: 0x20d7e2 in std.os.unexpectedErrno (test)
        std.debug.dumpCurrentStackTrace(null);
                                       ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:611:68: 0x20abbb in Iterator.next (test)
                            else => |err| return os.unexpectedErrno(err),
                                                                   ^
/home/ryan/Programming/zig/tmp/walkerdelete.zig:14:29: 0x2093bd in test "iterator ENOENT" (test)
    while (try iterator.next()) |_| {}
                            ^
/home/ryan/Programming/zig/zig/lib/test_runner.zig:79:28: 0x22ee1c in (root).main (test)
        } else test_fn.func();
                           ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:570:22: 0x22830c in std.start.callMain (test)
            root.main();
                     ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:514:12: 0x20defe in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:424:17: 0x20b286 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:338:5: 0x20b092 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
Test [1/1] test "iterator ENOENT"... FAIL (Unexpected)
/home/ryan/Programming/zig/zig/lib/std/os.zig:2429:19: 0x228074 in std.os.unlinkatZ (test)
        .ISDIR => return error.IsDir,
                  ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1750:25: 0x20fdf3 in std.fs.Dir.deleteFileZ (test)
            else => |e| return e,
                        ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1731:13: 0x20d397 in std.fs.Dir.deleteFile (test)
            return self.deleteFileZ(&sub_path_c);
            ^
/home/ryan/Programming/zig/zig/lib/std/os.zig:5464:5: 0x20d7f1 in std.os.unexpectedErrno (test)
    return error.Unexpected;
    ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:611:43: 0x20abd2 in Iterator.next (test)
                            else => |err| return os.unexpectedErrno(err),
                                          ^
/home/ryan/Programming/zig/tmp/walkerdelete.zig:14:12: 0x2093ee in test "iterator ENOENT" (test)
    while (try iterator.next()) |_| {}
           ^
0 passed; 0 skipped; 1 failed.
squeek502 commented 2 years ago

After a bit of investigation, I think a reasonable change here might be to just add

.NOENT => return null,

to the .linux implementation of Iterator.next.

Some reasoning:

iter.next() catch |err| switch (err) {
    error.DirNotFound => null,
    else => |e| return e,
}

which is exactly what returning null from within Iterator.next would do.

ghost commented 2 years ago

I think your reasoning for returning null makes sense and should go in a PR.

marler8997 commented 2 years ago

If we return a DirNotFound error, then the application can always turn that error into a null, but if we return null, any application that would care whether the directory was removed can't do the reverse.

I can imagine applications that might care about this distinction. For example, say an application is reading config files from a directory. If that directory gets removed while it's iterating, rather than continuing on like it's finished with all the files, it may want to stop and assert an error. (I would want this behavior in my zigwin32 project that reads JSON files to generate the zig bindings).

It would also be problematic for programs that do want to handle this case differently to detect it afterwards. This would mean you'd always have to check whether the directory still exists after you're done iterating over it (an extra check for the "happy path") and this check will always be a race condition since the directory could have been removed and recreated. The only correct solution they would have is to use the lower-level APIs to iterate over the directory.

Actually now that I've thought about it, I would think that in most cases you'd probably want to assert an error if a directory is removed while you're iterating over it. In zig for example, if you're iterating over the files in a cache directory to copy, then you wouldn't want to continue like nothing happened if the source directory was suddenly removed.

squeek502 commented 2 years ago

@marler8997 I would normally agree with you, and had similar thoughts initially, but AFAIK the APIs in std.fs.Dir are meant to behave similarly on different platforms and AFAIK, detecting this condition can only happen on Linux/WASI. On non-Linux UNIX platforms, ENOENT is not a possible error, and getdents just returns 0 if the directory being iterated is deleted (only tested on FreeBSD, other platforms might behave differently).

So, if you wrote some code that handles DirNotFound in some particular way, you'd only get that behavior on Linux, and on non-Linux platforms you'd just get a silent end-of-iteration.

I figured that making Linux behave like non-Linux was the better route than trying to get non-Linux to behave like Linux (which would involve something like an exists check after any 0 return of getdents).

marler8997 commented 2 years ago

Some systems will have extra error codes that others do not. When this is the case, I think it's better to return the error when possible rather than ignore it by default. I believe this is what the rest of the std library does and I think it's the right choice in general.

Maybe you missed it, but I addressed the solution you suggested where you check whether the directory still exists afterwards. This solution is a race condition since the directory can be deleted/re-created. After I had thought on it, it seemed like most programs (or at least many) would want to handle the case where the directory is deleted as a non-happy path. With this API, all cases that want the "correct behavior" are forced to duplicate their directory traversal code to use the lower-level API for any systems that can detect this error and fallback to the higher level API otherwise.

ominitay commented 2 years ago

I agree with marler: returning null is a potentially misleading result, and an error is more appropriate here.

squeek502 commented 2 years ago

I can see pros/cons to both approaches, and this general topic seems to be something that remains unresolved in terms of how the std.fs API should be designed. Some more relevant issues/discussions on the same theme:

I personally still lean towards returning null in this instance, though:

One potential compromise might be to split next into next and nextLinux as is done for the .macos, .ios, .freebsd, ... switch case of Iterator, where nextLinux would return DirNotFound, and then next converts it to null. That way, if you're on Linux, you could call nextLinux directly if you want to handle the error, but otherwise you could just call next to get consistent cross-platform behavior.