Proposal: Add a way for a test to expect a panic

mdsteele commented 6 years ago

When I add safety checks to code (e.g. a panic that fires if the API user violates their end of the API contract), I want to be able to test those safety checks (e.g. by intentionally violating contract in the test, and checking that we get the expected panic). However, there doesn't seem to be any way to catch panics in Zig, even with a custom panic handler (which has to return noreturn, and thus cannot swallow the panic, or otherwise not halt/hang the program, as far as I can tell?), so this kind of testing doesn't seem to be possible. For comparison, Rust tests support this via the #[should_panic(expected = "substr")] annotation.

Some possible proposals:

1) Add a general mechanism for catching panics. Go has this in the form of recover, and Rust has catch_unwind. Zig could have a builtin to do a similar job. This builtin would be usable anywhere, but explicitly anti-recommended for ordinary error handling.

2) Provide a test-block-specific way to expect panics. Zig could have a builtin (e.g. @expectPanic(comptime expected_message_substr: []const u8) that would apply to the scope of the test block, and would cause the test to succeed if it panics with a matching message, or fail if it panics with the wrong message or doesn't panic at all.

3) Don't add this feature at all, and live with the fact that safety checks have to be tested at a higher level (e.g. by running each expect-panic-test as a separate subprocess and expecting that subprocess to crash, e.g. how {#code_begin|test_err|expected message#} in langref.html.in appears to work). Perhaps some mechanism could be added to std.build to make this a bit easier.

My personal preference would be for (2), since that makes the tests very easy to write. If we want (1) as a more general mechanism, then perhaps we could still have (2), or something like it (possibly in userspace), as a convenience. If we don't want to support catching panics, then (3) could work, but I worry that that would make writing safety tests painful enough that library authors would be discouraged from writing them.

andrewrk commented 6 years ago

This proposal would have the added benefit of allowing zig to self-host its runtime safety tests: https://github.com/ziglang/zig/blob/master/test/runtime_safety.zig

PavelVozenilek commented 6 years ago

For tests which call @panic or are rightfully expected to crash, add test annotation (as proposed in https://github.com/ziglang/zig/issues/1010#issuecomment-389230050, #567 and #513). Test runner would then fork, invoke just the crashing test and check the exit code. I'd say this is variant of the proposal # 3.

Do not add any save-me-from-the-crash feature available to the user, that's the road to hell.

There's use case when you do handle very, very rare situation, but add an assert there, just to make sure that this rare situation doesn't occur by some mistake in your regular code. For this special variant of assert should be invented, as proposed in https://github.com/ziglang/zig/issues/1304#issuecomment-408897007. I use not very helpful name assert2in C.

Example:

if (something-almost-impossible-happened) {
   assert2(false); // assert2 does nothing in stress tests, it fires when running ordinary code
   return 0;
}

bronze1man commented 6 years ago

save-me-from-the-crash feature available to the user

Please provide only one way to return and cache error. Golang's recover feature just make programer process their error with panic and recover which is less code than if xxx return xxx.But you will need to define function with two version, one return error, another panic if error happen. And two version of one functions is the begin of hell. But I think zig's error handle is easy enough which can just only use return error.

Test runner would then fork, invoke just the crashing test and check the exit code.

Crash process and catch by the parent process should be a good idea.

binary132 commented 6 years ago

In my experience panic / recover is not really abused in Go much.

isaachier commented 6 years ago

@binary132, look at how the Go JSON parser cleverly uses recover to handle invalid input errors.

Edit: Speak of the devil... an article on this topic: https://eli.thegreenplace.net/2018/on-the-uses-and-misuses-of-panics-in-go.

thejoshwolfe commented 6 years ago

isn't this possible in userspace using setjmp (or equivalent) to effectively create a restore point and then longjmp out of your panic handler? Pretty sure this would never fly at comptime though.

isaachier commented 6 years ago

This relates more to the LLVM IR and how it implements panics. There is definitely a mechanism in the IR to throw and recover from exceptions, but Zig might not want to expose this in the language to avoid abuse.

Update: More details here: https://llvm.org/docs/ExceptionHandling.html.

kristoff-it commented 5 years ago

I think it's fairly common practice in Go to use recover to get a last chance to log remotely the final error message in a program before crashing.

Without a way to do so, any logging instrumentation added to an executable will not be able to record everything and in some environments it can be a problem if you can't do remote logging properly. An alternative could be to use an external process to do the remote logging (so that it can also read the Zig executable's final message), but it's a nuisance, especially when deploying to a container.

If there are no bad technical implications, I would recommend leaving the option of using some kind of recover to the user also at run-time, not just for tests, even if the odd json parser library might abuse it.

andrewrk commented 5 years ago

Hi Loris, this is not documented yet (See #1517) but you can override the default panic function in the root source file. In the panic function you define what happens when an assertion fails. The default behavior is here: https://github.com/ziglang/zig/blob/56e07622c692f70eb10836b86c5fda02c53e2394/std/special/panic.zig

If you specify a panic handler in the root source file, like this:

const builtin = @import("builtin");
pub fn panic(msg: []const u8, error_return_trace: ?*builtin.StackTrace) noreturn {
    // your implementation here
}

Here you can do this logging instrumentation and then call the default panic handler, or do some other strategy.

kristoff-it commented 5 years ago

Oh I see, so, if understand correctly, it can already be done but only at top level, where recover can instead be called at any level.

Could there be a legitimate reason for a library to want to hide panics? I honestly don't know. Food for thought, at least for me.

Thank you for the clarification :)

thejoshwolfe commented 5 years ago

It would be rare for a library to handle panics or to expect panics to be handled. such a library would probably advertise itself as a library specifically for managing application state, such as a logging library that "owns" stdout, stderr, and the panic handler. in such a scenario, the library would document that any application that wishes to take advantage of the panic handling code should define a top-level panic handler that calls into the library's implementation.

The reason there's no library-local support for handling panic is that crash-related situations have app-wide impact, and managing that is the responsibility of the app.

Manuzor commented 5 years ago

What about zig DLLs loaded from C or something else? It's not uncommon in game code for example to have an executable that loads other subsystems (e.g. renderer) or the core game itself from DLLs. If the executable itself was written in C and one of those DLLs was written in zig, what would happen if a zig panic occurs?

I'm thinking of a scenario where parts of a larger existing system would be rewritten one by one because porting the whole thing might not be feasible. Or maybe source code is not available for porting.

andrewrk commented 5 years ago

If the executable itself was written in C and one of those DLLs was written in zig, what would happen if a zig panic occurs?

Zig's panic system is set up to cross object file boundaries in this way, but I don't think this use case is solved yet. The first thing that I would like to try is to have the panic handler be a function exported with weak linkage, which means that the C code could choose to override the panic function by declaring a function with the same symbol name.

Another potential thing would be to do #1439 and #2189 and then have std/special/panic.zig support something like pub panic = null; in the root source file. This would make it a linker error if the C code in this use case forgot to declare the panic handler.

I think that addresses static linking. To address DLLs, I'll have to experiment a little bit. I'm not sure how external function references work in DLLs when they are provided by the loading module.

iacore commented 2 years ago

I feel like just forking the process and let one process exit (panic) is the solution.

Another janky solution is to use overlayfs and replace the panic function with new code temporarily. I hope Zig has something like mock in Python so that replacing a struct be easier.

matu3ba commented 2 years ago

I feel like just forking the process and let one process exit (panic) is the solution.

Yes. The tricky parts are

separating panicking code paths from non-panicking ones
detect if the test binary invoked itself
make an api to optionally limit time for waiting on the child process

Case 1. is unfortunately really annoying:

var tmp1: u8 = 1;
testing.expectPanic(panickingFunctionCall(tmp1), "panic_message1");
tmp2 = 2;
testing.expectPanic(panickingFunctionCall(tmp2), "panic_message2");
tmp3 = 3;
testing.expectPanic(panickingFunctionCall(tmp3), "panic_message3");

Now the test binary calls itself as fork, but without adjustment you would be at tmp1.

One could do separate program paths, but that sounds complex:

var tmp1: u8;
switch (arg1) {
case 1: {
    testing.expectPanic(panickingFunctionCall(1), "panic_message1");
},
case 2: {
    testing.expectPanic(panickingFunctionCall(2), "panic_message2");
},
...

Another (simpler?) option is to enumerate panic calls an only execute the non-panicking code with each instantiation.

    comptime var counter = 0;
    var runtime_counter: u32;
    if(argv1 == "") {
        runtime_counter = 0; // no recursive invocation
    } else {
        runtime_counter = stringToInt(argv1);
    }
    testing.expectPanic(panickingFunctionCall(1), "panic_message1", runtime_counter, counter);
    testing.expectPanic(panickingFunctionCall(2), "panic_message2", runtime_counter, counter);
}

The implementation expectPanic would then look something like this:

pub fn (fn ..., message: u8[], runtime_counter: u32, counter: comptime_int) {
    const runtime_instance = counter; // non-comptime variable derived by comptime instance
    if (counter == runtime_counter) {
        // the runtime calls 
    }
    counter += 1;
}

Requiring panics to be tested with a simple struct + init + deinit should at least remove the if-else block. Though I am unsure, if comptime var can be modified in the function call.

The simplest solution is to have the user convention "1 panic per test block" and use test filters, but this prevents any tracking of what calls have been done to a function generating the panic cases.

iacore commented 2 years ago

Rust's solution is to spawn one process for each test by default.

After #4170 is available, the api can be expectPanic(fn () { @panic(...); }).

matu3ba commented 2 years ago

@locriacyber Yes: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

As I understand it (and its not explained in the docs) Rust relies heavily on test filters. You are forced to annotate the attribute around the test block and you can not use stuff like expectPanic as function inside the test block due to my aforement mentioned problems. Or how is it possible in Rust to have a complicated function that calls code that panic or not panics? I see this as significant drawback of Rust, as you cant scale expected panics for better test coverage (if you want).
I dont understand how the looks of the function affect here the comptime scope. If Zig wants to support multiple expectPanic inside a test block for scaling, then we need to keep track which path should be called by which invocation arguments.

So we have 3 options to implement that: Either the binary 1. execs itself recursively, we 2. create many binaries/a comptime prong to distinguish called panic paths or we 3. create/provide/abuse the package manager/a test runner binary and limit ourselves artificially to 1 panic test per test scope.

Checking what an empty test leaves us with

Usage: ./test path/to/zig
thread 354316 panic: Wrong number of command line arguments
/home/misterspoon/dev/git/zig/zig/lib/std/special/test_runner.zig:20:9: 0x236a40 in std.special.processArgs (test)
        @panic("Wrong number of command line arguments");
...

I have no idea, why we "Usage" tells us to provide a path to the zig compiler.

matu3ba commented 2 years ago

So I do shared memory and wait for the thread to write it in the custom panic handler.

@SpexGuy That would be my default but @fengb has a good point about process cleanup.

@fengb There might be weird leaks

SpexGuy: Ok what if we have a std.testing.panicDefer(fn () void) and std.testing.panicDeferArg(T: type, fn (T) void, arg: T) which enqueues a defer to be run by a test which is expected to panic. If a test which is not expected to panic panics, the process exits. Then we can use either setjmp/siglongjmp or threads to handle panics.

SpecsGuy ideas: Panic crashes the whole process. We keep panicDefer in case the test needs to delete files or something like that, which would not be handled by process cleanup, but normal use doesn't need it. The test runner accepts extra arguments. These include --start-at - tells it what test to run first --subprocess - annotates the output with machine readable tags so that it can be parsed --spawn - (naming could be improved) runs as a parent process. Spawns itself with --subprocess and --start-at, reads, parses, and forwards the output. If the child process panics, it notes the test that was running when the panic occurred, and spawns a new subprocess with --start-at for the test after the one that paniced.

We could also have other nice arguments like --list - list the names and indices of all the tests --filter - specify a test filter at runtime --test - specify a single test by name for debugging

But this is also why the machine readable output is needed, the parent process need to track the current progress so it can restart at the right point. Which also means installing a custom panic handler

Mouvedia commented 2 years ago

--test - specify a single test by name for debugging

~~@matu3ba Please separate this one into its own issue.~~ I would expect that to already be possible. ~~It should be a fuzzy search. i.e. you fed it a string and it runs all the tests that match it~~

matu3ba commented 2 years ago

@Mouvedia --test-filter [text] as flag is already part of zig test. Its merely a potential renaming.

matu3ba commented 2 years ago

The implementation requires:

~~using stdout and stderr of child processes.~~
~~reusing the test runner workaround (pipes on linux, socket on windows),~~
~~anonymous pipes working on windows the docs are unclear or could be wrong => this bears better performance and is the standard solution for parent and child processes~~
~~sockets are working on linux, see lib/std/x/net/ => only AF_INET and AF_INET6 sockets are portable (worse performance)~~

TODOs:

[x] implement anonymous pipes for windows in C https://gist.github.com/matu3ba/f348ef37a67112e39611d9baa741dedc
[x] test subprocesses in Zig https://gist.github.com/matu3ba/dbcead4e15291cb57abba78994dbefba https://gist.github.com/matu3ba/0add09568327bad64e7181f9faadca40 (which use pipes)
[x] test anonymous pipes (basically read and write some stuff and not calling wait + writing to a file for the subprocess to get the pipe handle/id)
[x] ~~(for later) portable pipe abstraction => check if windows pipes can be anonymous and async with instead of using the slower and less secure named ones.~~ not possible
[x] document that "Anonymous pipes are built upon Named pipes."https://stackoverflow.com/a/51448441
[x] document "non-blocking io on Windows anonymous pipes is faked via calling the synchronous method on another thread." https://stackoverflow.com/a/40250530/9306292
[ ] user-modifiable "zig build test" logic to compare output awaiting #11214
[ ] provide a debugging flag to print information to file
[ ] poc test runner to test panics (next)

Followup TODOs:

[x] clean up test runner interface with a lib instead of hacky shell scripts
[x] provide tmpDir paths and other standard paths
[ ] parallelized execution of test processes by user input to build.zig (opt-in, opt-out?)
[ ] (if no workaround) fix stage1 to accept custom paths to test runners instead of hard-coding them: very annoying behavior https://github.com/ziglang/zig/issues/6621#issuecomment-1065756671
[ ] improve logic and flags based on something that checks, if "pipe_available" and "can_fork" and have something nice for embedded

matu3ba commented 2 years ago

~~This is waiting for #11138 and more specifically TestArgs to test child_process settings and simplify IPC setup and testing for gradual deployment.~~ waiting for upstreaming #11341

Long-term goal is to have a test runner API, which enables test runner performance optimizations. This will require refactoring out the build functions from src/test.zig to 1. create persistent binaries + files from strings, 2. automatically derive and store their paths for logically related tests to reuse, 3. define test block dependencies similar or potentially reusing build.zig with the paths from 2. see also https://github.com/ziglang/zig/issues/11080#issuecomment-1062218391

matu3ba commented 2 years ago

TODO list for poc

assumption: user ensures that test(s) (blocks) are independent from another, inside test(s) (blocks) there can be dependencies

[ ] have a way to list+enumerate all tests inclusive their respective test blocks
[ ] figure out how to create another pipe between processes for the control plane (stdin,stdout and stderr are for data only! no complex parsing shennanigans or signaling etc unless there is significant performance gain)
[ ] --verbose or --debug to let each process write with timestamp to file => figure out what the Kernel provides (multithreading works, but multiprocesses?)
[ ] custom panic handler + cleanup routine (try not to deal with stage1 having weird behavior https://github.com/ziglang/zig/issues/6621#issuecomment-1065756671)

matu3ba commented 2 years ago

// principle:
// 2 binaries: compiler and test_runner
// 1. compiler compiles main() in this file as test_runner
// 2. compiler spawns test_runner as subprocess
// 3. test_runner (control) spawns itself as subprocess (worker)
// 4. on panic or finishing, message is written to pipe for control to read
//    (worker has a custom panic handler to write panic message)
// communication via IPCs stdin, stdout, stderr, testout (exit code of each test block)

// compiler
//    | --spawns--> control
//    |                |       --spawn at testfn0-->                worker
//    |                |      <--testout: testfn_exit_status0 0--      |
//    |                |      <--testout: testfn_exit_status1 0--      |
//    |                |                 ...                           |
//    |                |      <--testout: testfn_exit_statusx msglen-- |
//    |                |      <--stdout: panic_msg--                   |
//    | <--(unexpected panic_msg) stderr: panic_msg+context(for CI)--  |
//    |                |       --spawn at testfnx+1-->                 |
//    |                |                                               |

// limitation: can test only 1 panic per test block (testfn)
// unclear: are 2 binaries needed or can 1 binary have 2 panic handlers?
// unclear: should there be a dedicated API or other checks to ensure stuff gets cleaned up?

next: fix/extend child_process.zig to allow arbitrary count of pipes.

matu3ba commented 2 years ago

Without non-fully portable pipe hacking (see #11393 for justification), this becomes complex: https://github.com/matu3ba/zig/blob/test_panics/lib/std/special/test_runner.zig and definitely slower than not spawning a subprocess + 1 tcp connection with custom parsing stuff or 2 tcp connections.

Probably I'll write udp next and fix some shennanigans in the tcp code.

Thus, this change should go into a separate test_runner.

UPDATE: tcp keep_open with 1 runner server would be necessary. Otherwise tcp connections are only reusable after a few seconds after shutdown, which destroys user experience. Having exactly 1 test runner server as very long-live process sounds very annoying and introducing unnecessary global state on a system even more so.

matu3ba commented 2 years ago

After way too much experimenting (https://github.com/matu3ba/zig/tree/test_panics3) my result is that only a declarative approach is not painful to use for creating additional pipes. Either in form of porting posix_spawn or hand-rolling the changes (function pointers etc).

Not using additional pipes would mean that user output could break the test runner.

I think there is no way around either

polluting stdin by original pipe with closing in child process user code or
force the user to provide stubs ie as function pointers to write some global variables for the file descriptor for usage (or worse mmapped regions to parse them etc). or
comptime-do 2 to minimize global state.

matu3ba commented 2 years ago

This is pending on a decision how or what to enable the user doing in the spawn functions of child_process.zig.

To keep it simple for now until we ported+adjusted posix_spawn, we should follow posix_spawn, which rules out doing below mentioned actions inside the child_process. Note, that all possible actions in the child process setup code are phrased "action".

To my knowledge there are 2 options such that the execved childprocess knows the additional open pipes:

The pipe to be used is written (in some encoded form) into stdin and parsed within the child process code (that means test runner or whatever runner we provide)
- additional parsing + naming complexity
- reliable
The pipe to be used is written into environment variables and the child process reads from those
- low complexity, because environment variables are key value pairs
- can be unreliable, if user specifies identical environment variables to parent process
- making reliable requires sanitizing environment variables

Besides having 3 possibilities (enabling either one or both), there is the question of what good defaults for encoding of stdin + parsing and naming of environment variables would be.

pending https://github.com/ziglang/zig/pull/11701 to have stdout and stderr never cluttering test output.

iacore commented 1 year ago

Here's a POC of detecting if child was killed by SIGABRT.

need -lc
assume that @panic will cause SIGABRT
Linux only
The main process will not crash.

const std = @import("std");

pub fn main() !void {
    std.log.info("test panic: {}", .{panicked(panic)});
    std.log.info("test no_panic: {}", .{panicked(no_panic)});
}

fn panic() void {
    @panic("Hello");
}

fn no_panic() void {
    std.log.info("I won't panic", .{});
}

const c = @cImport({
    @cInclude("unistd.h");
    @cInclude("sys/wait.h");
});

/// detect if a function will panic
fn panicked(comptime test_fn: fn() void) bool {
    const pid = c.fork();
    if (pid != 0) {
        // parent
        var wstatus: c_int = undefined;
        const err = c.waitpid(pid, &wstatus, c.WUNTRACED | c.WCONTINUED);
        if (err < 0) unreachable;
        // std.testing.expectEqual(w.pid, pid) catch unreachable;
        if (c.WIFSIGNALED(@intCast(c_int, wstatus))) {
            const signal = c.WTERMSIG(@intCast(c_int, wstatus));
            std.log.info("received SIGNAL {}", .{signal});
            return signal == c.SIGABRT;
        }
        return false;
    } else {
        // child
        test_fn();
        c._exit(0);
    }
}

matu3ba commented 1 year ago

Hey @locriacyber, this was 1. blocked by #13639 and with your solution you cant print error messages from the test itself, which can make it annoying to diagnose error source if the stack trace is broken for some reason.

Also, continuing on error with the next tests requires a custom message to encode which tests failed and where to continue, so simple signaling wont work (unless we are okay with this restriction).
Another problem is that a user will likely want to have their own logging with timestamps not cluttered in a more complex setup (the current approach replaces stdout or stderr of the test binary for the status report), so this requires either to encode the status messages into weirdly escaped strings and accept random failures [code for the happy path] if such sequence is used in the message, let the user do the encoding and decoding [aweful user experience] or user a dedicated pipe (socket is slow and prevents multiple test runners acting independently on the system).

Problem 1 blocked #12754, which solves use case 3 (requires custom encoding and decoding what the pipe id is in the test runner).

Problem 2 remains open and I think this requires some compiler additions to enumerate all test cases and create a jump table for going there upon cli input number (continuing in the next test case), but hopefully it works fast enough to do with comptime. I tried to played around a bit with this and filed https://github.com/ziglang/zig/issues/10920, where the idea was to comptime enumerate the test functions with inline for and an comptime generated enum array to allow filtering etc.

iacore commented 1 year ago

with your solution you cant print error messages from the test itself The child process in my solution inherit stdout & stderr from parent.

I don't think this should be a builtin, since the actual solution is a hack.

matu3ba commented 1 year ago

@locriacyber I dont understand what you mean with "I don't think this should be a builtin, since the actual solution is a hack."

I'll give you some brief reasoning on the usability:

Take as synthetic example compiler_rt addv, which is addo but panicking on overflow: https://github.com/ziglang/zig/blob/5b9d0a446af4e74c9f915a34b39de83d9b2335f9/lib/compiler_rt/addosi4_test.zig#L37 It will be annoyingly tedious to slap every overflow into its own panic test block.

So in some or the other way we want a jump table for the test binary to jump to the next test, if we want to recover from panics or we will be limited to only 1 panic possible to be tested per test block.

iacore commented 1 year ago

I think this should be a regular function, as a builtin would imply that the function is more primitive and has direct analogy in assembly. This is something I observed in past Zig's decisions.

Current implementation of panic uses unreachable or std.os.abort (-> SIGABRT). Not sure if it's possible to recover from libc abort on some platforms.

matu3ba commented 1 year ago

regular function

I was thinking on a partial comptime evaluated function, which generates per test block a switch case at comptime. The thing, which needs to be fixed is to get 1. index and 2. name of test function at comptime as @Snektron reported here https://discord.com/channels/605571803288698900/785499283368706060/1045703743087321188.

Current implementation of panic

The process will still be terminated, but a message will be send in the panic handler (not sure if signal handler or panic handler is better) on pipe to test supervision process. The supervision process then spawns the test process with cli info to jump to the next test after being spawned (unless max test size reached).

matu3ba commented 1 year ago

(sorry, already late. Will fix this up tomorrow.) I am working on this. This might take a while and will be at first be limited to 1 panic per test block, which will be required the last test item (because the compiler offers no locations to jump right after the expectPanic):

const std =@import("std");
test "test1" {
    std.testing.expectPani(@panic("testme"), "testme");
    std.testing.expectPani(@panic("first impl will not be able to test this"), "first impl will not be able to test this");
    std.testing.expectPani(@panic("neither this"), "neither this");
    std.testing.expectPani(@panic("or this"), "or this");
}
test "test2" {
    std.testing.expectPani(@panic("This is a space waste"), "This is a space waste");
}

I would be in favor of something like this, but I'd prefer first having something runnable before the big? bikeshed. Mostly, since it doesnt look to me like it would destroy comptime semantics with parallelization. However, its also making things more complex while still not enable comptime-panic testing.

const std =@import("std");
test "test3" {
    // Each test block has an implicit variable `builtin.testblk`, which is comptime usable to count stuff.
    // std.testing.expectPanic would then become
    // 
    std.testing.expectPanic(@panic("testme"), "testme");
    std.testing.expectPanic(@panic("first impl will not be able to test this"), "first impl will not be able to test this");
}

Now within lib/std/testing.zig:

pub fn expectPanic(comptime testblk: *builtin.testblk, expected: []const u8, current) error.ResultNotEqual!void {
    if (expected != current) return error.ResultNotEqual;
    testblk.count += 1;
}

and within lib/panic_tester.zig:

std.debug.print("builtin.test_functions[0].count: {d}\n", .{ builtin.test_functions[0] }); // prints 4

matu3ba commented 1 year ago

After having thought about this abit more, I dont think adding more than 1 panicking control routine is viable without "hidden control flow" by stateful 1. setup + 2. teardown method and more compiler in-builds to get the necessary jump points. Since dynamic and user-defined panic tests remain sufficiently flexible, I think its okay and I'll finish up the PR.

DoumanAsh commented 1 year ago

I know most people are not inclined to panic recovery, but is there consensus on this? I would like to know two things about panics in Zig:

Can you provide way to recover from it in order to resume operation of program in some failing state?
Can you provide way to eliminate all panics from code without resorting to aborts or UB which is not desirable?

matu3ba commented 1 year ago

I know most people are not inclined to panic recovery, but is there consensus on this?

Panics are used in Kernels in non-recoverable situations, see https://en.wikipedia.org/wiki/Kernel_panic. So the only safe thing is to crash or go into "a known safe mode". Now what the "safe mode" or state would be for Zig and how to cleanly free all resources (in all cases/in supported cases) without making the language = compiler unnecessary complex is the question, you could think about.

Can you provide way to recover from it in order to resume operation of program in some failing state?

What options do we have? We can either make 1. panic return or 2. panic longjump to a predefined "safe source location" and "do some magic behind the programmers back".

Option 1: Now we have an implicit defined error and leaking of panic semantics into calling code, at worst only in test mode. Very uncool.

Option 2: 2.1 defer and errdefer will make this hard, because they are executed at the end of current scope, so recovering would require to change defer semantics, possibly implicit, only for tests/some mode. In turn, this can change test behavior, for example by leaking memory where no leaks are in the code.

2.2 If we would want to ignore memory leaks tested in follow-up locations, but continue execution after panic check we can now either: 2.2.1 restrict panic tests to assume things are stateless (boiling down to only math on the stack, because things like file descriptors could be tested by the user) 2.2.2 fork in the test to let 1 process use the panic path and the other to continue with the non-panic one 2.2.3 restrict panics as must be run at the end of tests and silently ignore follow-up tests (checking by compiler would leak panic semantics into semanti analysis). Best I can think of without doing additional stuff in Sema is have convention to to init() defer deinit() with defer being the stuff that panics in the end and init containing the fork/process spawn.

Can you provide way to eliminate all panics from code without resorting to aborts or UB which is not desirable?

This would eliminate the very definition of reaching panics in the first place, which is to specify the detection of unrecoverable state. panics dont provide a way to specify where the safe state has been, as this could also be coded/used by the programmer in the error system.

matu3ba commented 1 year ago

Option 2.2.2 would mean having potentially unbounded process spawns and memory consumption at the same time, because the user might mess up the test code. (very bad) Option 2.2.3 is the thing I've implemented, but so far not with init(), defer deinit() but handling of panics in the child process of the test runner. (no build system protocol integration yet)

EDIT: Numbering messed up, but the gist should be understandable. If not, let me know.

DoumanAsh commented 1 year ago

I'm sorry, but I'm not familiar with internals of zig itself. I think you should consider options by first formally defining requirements.

I'm asking specifically because panic is not always acceptable behavior and for Zig to be usable generally it must provide means to recover fully from panic or have ability to eliminate abort side effect (even if you provide hook, you cannot resume program normally unless you have OS support)

First of all, Zig has issue lacking destructors which prevents ability to automatically clean resources so automatic recovery would impose requirement on destructors as otherwise you have a lot of nasty behavior due to essential destructors not being present (Seems like to be tackled by #782 with some hacky way and unclear how it would behave during stack unwinding as it is abnormal control flow)

This mean that effectively, simplest option - panic recovery - is undesirable, unless done in conjunction with defer to be called during stack unwinding

This would eliminate the very definition of reaching panics in the first place, which is to specify the detection of unrecoverable state

I should elaborate that my question was in context of code outside of your control which can be:

Zig standard library
Third party dependency

Manual control over panic is only possible within own code. Hence effective ability to detect 'bad' code that relies on panics should be available so that user could avoid unexpected abort. This way need for panic recovery could be relieved at least partially. In Rust you can achieve it somewhat with some linking hacks, but these are pretty messed up

If Zig is not going to provide proper tools to restore program state after panic, then it would be good to at least have ability to mis-compile if panic is present in code

P.S. of course there is hard way to not use standard library or third party code in general 😄 P.S.S. I'm not interested in imposing any views, but I'm interested to know final decision as it is important factor in language usability

matu3ba commented 1 year ago

it must provide means to recover fully from panic or have ability to eliminate abort side effect

From the top of my mind, I cant think of any, which is not better solved with respawning a process or copy-pasting stack memory + providing shared memory relevant data in a new process image, relevant file descriptors etc (throwing away the rest via letting the Kernel clean up the memory + state).
Without Kernel, you must handle all panics in the first place or you can not recover (unless very specialized device).

Andrew said, that he had some ideas (for pointer to stack memory, not sure if thats https://gist.github.com/PeyTy/7983dd85026cc061980a6a966bc1afc2), but the overall issue with https://github.com/ziglang/zig/issues/782 is that it is unfeasible to compute all possible paths to ensure proper cleanup happens (and its a huge compilation time cost) and then you are stuck with either affine memory ownership semantics (Rust) or linear ones + all the edge cases not handled by such system. Rust for example can introduce memory leaks.

unless done in conjunction with defer to be called during stack unwinding

That does not fix the core of the problem: Code, which panics may not have the cleanup handled in any way, because the user could assume the process crashes and the Kernel does the cleanup.

detect 'bad' code that relies on panics should be available so that user could avoid unexpected abort. If Zig is not going to provide proper tools to restore program state after panic, then it would be good to at least have ability to mis-compile if panic is present in code

That sounds very much like static analysis and review. 2. I do agree, that the zig docs could have an explanation for which use cases assertions, checks or panics should be used and to emphasize explaining the structure for use cases crystal clear in the README for potential users (per component). 3. We'll get there eventually.

DoumanAsh commented 1 year ago

Rust for example can introduce memory leaks.

As users it should be our responsibility to ensure that destructor code doesn't panic which would screw up stack unwinding even further. Another important thing is that like in C++ it might be the best have way to detect if stack unwinding is ongoing, but overall I would just recommend not to panic in stack unwinding, or actually all panics should result in aborts if there is already stack unwinding

That does not fix the core of the problem: Code, which panics may not have the cleanup handled in any way, because the user could assume the process crashes and the Kernel does the cleanup.

Do you mean that user didn't set up defer? Because if defer runs always, regardless of how function exits (normally or due to stack unwinding from panic) it would then let user ensure it runs, but it is up to user to use destructor

That sounds very much like static analysis and review.

Not necessary to be stack analysis, but even simplest way to disallow linking panic handler would be nice We do not live in ideal world, so if your zig code is entry point you might want to have as much control as possible

jamii commented 1 year ago

ability to mis-compile if panic is present in code

I was hoping that the below might do the trick, but it seems that panic is always referenced. Presumably because it's referenced directly from the compiler rather than by normal mechanisms?

const std = @import("std");

pub fn panic(msg: []const u8, error_return_trace: ?*std.builtin.StackTrace) noreturn {
    _ = msg;
    _ = error_return_trace;
    @compileError("panic");
}

pub fn main() void {}

DoumanAsh commented 1 year ago

Yes, it should not work, compiler should still be able to compile code even if it is never used as it needs to verify validity of your code

The way I did it in Rust is by introducing destructor function that links invalid item and it would be eliminated during optimization if panic never happened. This unfortunately is very bad way to do it so I don't think it is good idea to even try to use it

jamii commented 1 year ago

Huh, yeah, that works (EDIT only in release-fast - it still fails to compile in release-safe):

const std = @import("std");

extern fn extern_panic() void;

pub fn panic(msg: []const u8, error_return_trace: ?*std.builtin.StackTrace) noreturn {
    _ = msg;
    _ = error_return_trace;
    extern_panic();
    std.os.exit(0);
}

pub fn main() void {
    //@panic("oh no");
}

should still be able to compile code even if it is never used

@compileError is only an error if it is reachable. Usually a function that contains compileError but is never called is fine. But panic must be special - maybe because it's referenced directly from the compiler, or from some builtin code that is always emitted.

ziglang / zig

Proposal: Add a way for a test to expect a panic #1356