ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.67k stars 2.47k forks source link

unhandled NTSTATUS: DELETE_PENDING in std.os.windows.DeleteFile #6452

Open andrewrk opened 3 years ago

andrewrk commented 3 years ago

cc @kubkon

c:\msys64\home\andy\dev\zig\build-llvm10>bin\zig.exe build test
Test [1/3] test "self-hosted"... tests [1/20] referencing decls which appear later in the file [1/1] update [1/3] write.tests [1/20] referencing decls which appear later in the file [2/1] error.Unexpected NTSTATUS=0xc0000056
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\debug.zig:410:53: 0x7ff71cc81dc0 in std.debug.writeCurrentStackTraceWindows (test.obj)
    const n = windows.ntdll.RtlCaptureStackBackTrace(0, addr_buf.len, @ptrCast(**c_void, &addr_buf), null);
                                                    ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\debug.zig:395:45: 0x7ff71cbb675f in std.debug.writeCurrentStackTrace (test.obj)
        return writeCurrentStackTraceWindows(out_stream, debug_info, tty_config, start_addr);
                                            ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\debug.zig:116:31: 0x7ff71cb25008 in std.debug.dumpCurrentStackTrace (test.obj)
        writeCurrentStackTrace(stderr, debug_info, detectTTYConfig(), start_addr) catch |err| {
                              ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\os\windows.zig:1572:40: 0x7ff71cbc19ee in std.os.windows.unexpectedStatus (test.obj)
        std.debug.dumpCurrentStackTrace(null);
                                       ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\os\windows.zig:828:40: 0x7ff71cdb79fa in std.os.windows.DeleteFile (test.obj)
        else => return unexpectedStatus(rc),
                                       ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\os.zig:1850:30: 0x7ff71cd2528c in std.os.unlinkatW (test.obj)
    return windows.DeleteFile(sub_path_w, .{ .dir = dirfd, .remove_dir = remove_dir });
                             ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\fs.zig:1319:21: 0x7ff71cc8b268 in std.fs.Dir::std.fs.Dir.deleteFileW (test.obj)
        os.unlinkatW(self.fd, sub_path_w, 0) catch |err| switch (err) {
                    ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\fs.zig:1284:36: 0x7ff71cbbfc06 in std.fs.Dir::std.fs.Dir.deleteFile (test.obj)
            return self.deleteFileW(sub_path_w.span());
                                   ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\fs.zig:1622:39: 0x7ff71cb33652 in std.fs.Dir::std.fs.Dir.deleteTree (test.obj)
                    if (dir.deleteFile(entry.name)) {
                                      ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\testing.zig:265:35: 0x7ff71cb2aa61 in std.testing.TmpDir::std.testing.TmpDir.cleanup (test.obj)
        self.parent_dir.deleteTree(&self.sub_path) catch {};
                                  ^
C:\msys64\home\andy\dev\zig\src-self-hosted\test.zig:432:26: 0x7ff71cb28fb4 in TestContext::TestContext.runOneCase (test.obj)
        defer tmp.cleanup();
                         ^
C:\msys64\home\andy\dev\zig\src-self-hosted\test.zig:419:32: 0x7ff71cb247e2 in TestContext::TestContext.run (test.obj)
            try self.runOneCase(std.testing.allocator, &prg_node, case);
                               ^
C:\msys64\home\andy\dev\zig\src-self-hosted\test.zig:21:16: 0x7ff71cb23c37 in test "self-hosted" (test.obj)
    try ctx.run();
               ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\special\test_runner.zig:61:28: 0x7ff71cc6e830 in std.special.main (test.obj)
        } else test_fn.func();
                           ^
C:\msys64\home\andy\dev\zig\build-llvm10\lib\zig\std\start.zig:154:65: 0x7ff71cb24abe in std.start.WinMainCRTStartup (test.obj)
    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
                                                                ^

I'm not sure what the minimal repro is. This happened when I pulled latest master and ran the tests on Windows.

--- a/lib/std/os/windows.zig
+++ b/lib/std/os/windows.zig
@@ -821,6 +821,7 @@ pub fn DeleteFile(sub_path_w: []const u16, options: DeleteFileOptions) DeleteFil
     switch (rc) {
         .SUCCESS => CloseHandle(tmp_handle),
         .OBJECT_NAME_INVALID => unreachable,
+        .DELETE_PENDING => ???,
         .OBJECT_NAME_NOT_FOUND => return error.FileNotFound,
         .INVALID_PARAMETER => unreachable,
         .FILE_IS_A_DIRECTORY => return error.IsDir,

So the question is what do we do here?

LemonBoy commented 3 years ago

The orange site suggests that the file you were trying to delete was already queued for deletion, but something else (indexer, AV, the very same Zig process maybe) was still keeping an open handle to it.

This means the delete operation is extremely platform-dependent and, on Windows at least, it may succeed and leave the file behind. Windows 10 has recently introduced a flag (FILE_DISPOSITION_POSIX_SEMANTICS) to fill this gap (and it seems that's now the default on recent builds) while other versions are SOL. You can always rename the file to something unique and then mark it for deletion, but that opens another can of worms :(

In #6397 @kubkon suggested to follow the ReactOS (and hopefully the Windows libc) behaviour by using FILE_DISPOSITION_INFORMATION rather than using the FILE_DELETE_ON_CLOSE flag. IMO that's better as it handles the directory-not-empty case for free and it's complementary to the use of FILE_DISPOSITION_INFORMATION_EX on newer platforms.

kubkon commented 3 years ago

I'll be happy to take a look but first I really want to have the first working prototype of MachO linker done (without debug info, etc., but something that will not get immediately killed by macOS due to missing dyld info, etc.). Alternatively, if anyone is eager to have a look I'll be happy to mentor!

EDIT: Oh, I forgot to mention that I agree with @LemonBoy in that we should definitely investigate the approach taken by ReactOS, and see if it helps out. In the meantime @andrewrk, if this bug will slow you down, we can revert the offending commits.

kubkon commented 3 years ago

BTW, the CI failure on Win with a cryptic name of LLVM: Out of memory is unrelated, right?

andrewrk commented 3 years ago

No problem @kubkon - what do you suggest we do to get master branch passing in the meantime? We'll need that to be green so that I can merge #6250 today, and then your macho code will go on top of that

BTW, the CI failure on Win with a cryptic name of LLVM: Out of memory is unrelated, right?

Where are you seeing this?

kubkon commented 3 years ago

No problem @kubkon - what do you suggest we do to get master branch passing in the meantime? We'll need that to be green so that I can merge #6250 today, and then your macho code will go on top of that

I suggest we revert the offending commit for the time being -> #6462.

BTW, the CI failure on Win with a cryptic name of LLVM: Out of memory is unrelated, right?

Where are you seeing this?

Oh, in our CI. For instance see CI#1 or CI#2. Both jobs are from today. I'm wondering if we have perhaps reached the Azure mem limit on Windows and hence the LLVM OOM.

andrewrk commented 3 years ago

That is entirely possible you can see the rss has been steadily growing of the std lib tests. I think it might be time to start moving some std lib API to the std lib ~graveyard~ orphanage.

squeek502 commented 3 years ago

See also this from https://github.com/ziglang/zig/issues/5537#issuecomment-639182047:

From https://go.microsoft.com/fwlink/?LinkId=140636 section 4.3.1 (page 33):

The delete on close flag can be specified when creating or opening a link or stream. While the handle remains open, the handle is in the delete-on-close state. This means that additional handles can be opened to the file or stream provided they specify FILE_SHARE_DELETE access. After receiving the cleanup IRP for a handle that specified DELETE_ON_CLOSE, the handle’s corresponding link or stream enters the delete-pending state. While in this state any attempt to open the link or stream will fail with STATUS_DELETE_PENDING.

Note that if the DELETE_ON_CLOSE flag is specified on a directory with child files or directories, the create operation will succeed, but the delete on close flag will be silently ignored when processing the cleanup IRP and the directory will not be deleted.

Seems like FileDispositionInformation might be better for this use case? See the linked PDF above.

The relevant FileDispositionInformation info from section 4.3.2 (page 33):

4.3.2 Set Delete-on-closeusing FileDispositionInformationInformation Class (IRP_MJ_SET_INFORMATION)

The FileDispositionInformation information class can be used to mark a link or stream for deletion and transition it to the delete-pending state. In this state attempts to open the link or stream will fail with STATUS_DELETE_PENDING.

This operation can fail for any of the following reasons:

  • The handle wasopened by ID –STATUS_INVALID_PARAMTER
  • The file is an internal file system metadata file –STATUS_CANNOT_DELETE
  • The file is marked read-only –STATUS_CANNOT_DELETE
  • The volume is marked read-only _-STATUS_MEDIA_WRITE_PROTECTED or STATUS_ACCESS_DENIED (on CDFS & UDF file systems when the file system is read-only)
  • The handle is to a directory that still has files or directories in it –STATUS_DIRECTORY_NOT_EMTPY
moosichu commented 2 years ago

I recently ran into this issue wheren I leaked file handles, resulting in a delete pending error when trying to create a new file in the same location. The delete then did go through when the program closed, with it all happening quick enough for me not to be sure what is going on.

Getting error.Unexpected NTSTATUS=0xc0000056 is user unfriendly, considering the common case for this kind of error will be some kind of file handle leakage, is there an benefit in indicating that to the user? (maybe only in debug builds?).

Leaking file handles generally seems to currently be a problem that can silently exist for a while (especially on some platforms) before you suffer the consequences for it in an unexpected place. Would be it be worth creating an issue for this problem specifically and trying to find a solution for making the default debugging experience for those kinds of issues a lot more friendly?

moosichu commented 2 years ago

Also for reference, here is a repro case for this specific problem on Windows:

const std = @import("std");
const fs = std.fs;

pub fn main() anyerror!void {
    var test_src_dir = try fs.cwd().openDir("", .{});
    defer test_src_dir.close();
    {
        const test_file_handle = try test_src_dir.createFile("test.txt", .{});
        test_file_handle.close();
    }
    {
        const test_file_handle = try test_src_dir.openFile("test.txt", .{ .mode = .read_only });
        _ = test_file_handle; // leak file handle, do not close!
        try test_src_dir.deleteFile("test.txt"); // this will queue a pending deletion
    }
    {
        // Returns error.Unexpected NTSTATUS=0xc0000056, as file handle leaked above is keeping old test.txt alive.
        const test_file_handle = try test_src_dir.openFile("test.txt", .{ .mode = .read_only });
        test_file_handle.close();
    }
}
kubkon commented 2 years ago

I recently ran into this issue wheren I leaked file handles, resulting in a delete pending error when trying to create a new file in the same location. The delete then did go through when the program closed, with it all happening quick enough for me not to be sure what is going on.

Getting error.Unexpected NTSTATUS=0xc0000056 is user unfriendly, considering the common case for this kind of error will be some kind of file handle leakage, is there an benefit in indicating that to the user? (maybe only in debug builds?).

Leaking file handles generally seems to currently be a problem that can silently exist for a while (especially on some platforms) before you suffer the consequences for it in an unexpected place. Would be it be worth creating an issue for this problem specifically and trying to find a solution for making the default debugging experience for those kinds of issues a lot more friendly?

I think we should figure out how to avoid this problem altogether rather than figuring out how to better flag that to the user. Is this something you'd like to look into perhaps?

moosichu commented 2 years ago

What do we want to happen when the user deletes a file without closing the file handle to it elsewhere? On Linux this issue was just silently ignored, do we want that? Becuase there seems to be a genuine user error there.

Do we want the system to immediately error on trying to delete that file? Even on Linux? I'm not sure I have the answers to those questions - but a consistent experience is probably desirable across platforms.

I would be happy to look into implementing any solution, but I'm not really sure what an appropriate solution would look like.

EDIT: Actually in the short term I'm going to focus on other problems, because I do want to see zar through to completion ASAP, and I have limited time as is. But I will keep an eye on this issue and might swing back around to it later.

kubkon commented 2 years ago

For Windows, I'd look into Wine and ReactOS to see how they flag it and handle it. I have very little experience with actually using this API, and my first question would be: how do you normally handle this on Windows? What is the best practice wrt to file deletion on Windows? And so on. Another thing to consider, if the API is customisable enough, why not expose it in such a way so that the user of libstd can call it either with delete deferred or force an immediate delete (if at all possible).

moosichu commented 2 years ago

I have very little experience with actually using this API, and my first question would be: how do you normally handle this on Windows?

Same - and I think the investigations required to get this right are probably beyond my bandwidth right now. There's also multiple aspects to this problem. Because on one hand - what do you expect as a consumer/user of std.os (also #6600 is related)? Is consistency of behaviour across platforms the priority here? If so what do we want to do (consistently)? Or is the preference to behave consitently within a platform's ecosystem - is it OK for the file APIs of std.os to differ between platforms for that?

I think it comes to this example: https://github.com/ziglang/zig/issues/6452#issuecomment-1089290379 Currently on Linux this runs without any errors, on Windows it runs with an "Unexpected" error. What do we expect/want this to do on each platform?

kubkon commented 2 years ago

For libstd I'd very much in favour of the default acting the same across platforms but providing enough lower-level functionality to make it fully customisable. Higher-level helpers would do too.

moosichu commented 2 years ago

InKryption mentioned on Discord seeing some discussion about adding some debug state info for catching this stuff as well in a PR somewhere, so just mentioning that here at least to remind myself to try and find it and link it here.

squeek502 commented 1 year ago

Haven't tried the reproduction in https://github.com/ziglang/zig/issues/6452#issuecomment-1089290379 to see how it behaves, but it's worth noting that the behavior here has changed after https://github.com/ziglang/zig/pull/15316:

matu3ba commented 1 year ago

Should be fixed in PR #15501. Can be closed once Windows 10 SP1 is mandatory/WIndows 10 support is dropped with known shortcoming for non-ntfs systems.