Test runner tends to swallow latest logs before exiting

laytan commented 3 days ago

    Odin:    dev-2024-06:9d5bf063e
    OS:      macOS Sonoma 14.5 (build: 23F79, kernel: 23.5.0)
    CPU:     Apple M1
    RAM:     8192 MiB
    Backend: LLVM 18.1.4

The new test runner tends to swallow latest logs/messages before exiting, for example a panic that stops the test might not actually show up in the output at all.

For example with this test:

@test
test_panic :: proc(t: ^testing.T) {
    log.panicf("HMMMM")
}

Just running it a couple times and I get different results each time:

$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 21:51:43] [client.odin:214:test_panic()] HMMMM
[FATAL] --- [2024-06-28 21:51:43] [client.odin:214:test_panic()] panic: log.panicf

$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 21:51:45] [client.odin:214:test_panic()] HMMMM
[FATAL] --- [2024-06-28 21:51:45] [client.odin:214:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total

$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 21:51:47] [client.odin:214:test_panic()] HMMMM

I first thought this might be because it panics as soon as it starts, but even putting a time.sleep() before the panic shows some of the same issues.

laytan commented 3 days ago

I am pretty sure this is because the logs are all buffered for a single thread to output (iirc the implementation), and before these buffers are written the panic has exited the program.

I noticed that the panics aren't actually caught by the signal handler, which if it did, would probably fix this?

laytan commented 3 days ago

panic raises a SIGTRAP (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.

$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] HMMMM
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total

Double BUT, on my actual test case (not this small reproduction) it still doesn't actually show the panic before stopping and never exiting.

I noticed this is actually a general problem with the signal handler, if I cause a segfault it also just stops the test and never actually exits the runner.

Kelimion commented 3 days ago

panic raises a SIGKILL (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.
$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] HMMMM
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total
Double BUT, on my actual test case (not this small reproduction) it still doesn't actually show the panic before stopping and never exiting

Couldn't the test runner just override the assert proc in the context it passes to the tests so that the panic doesn't actually panic? That would also allow testing.assert_did_panic(t) to test if the code above it properly raised a panic, which is not something we can currently do.

laytan commented 3 days ago

panic raises a SIGKILL (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.
$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] HMMMM
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total
Double BUT, on my actual test case (not this small reproduction) it still doesn't actually show the panic before stopping and never exiting
Couldn't the test runner just override the assert proc in the context it passes to the tests so that the panic doesn't actually panic? That would also allow testing.assert_did_panic(t) to test if the code above it properly raised a panic, which is not something we can currently do.

The assertion handler proc is divergent so you can't (easily) do that. You could use libc.setjmp etc. to hack it though (make the handler proc jump out to some other handler). But I am unsure if that is a good idea.

laytan commented 3 days ago

I think the current way with the signal handler is working, but there is something up with the actual handler that's been made that causes it to not actually stop the runner. Of course a testing.assert_did_panic would be nice though, I agree.

Kelimion commented 3 days ago

panic raises a SIGKILL (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.
$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] HMMMM
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total
Double BUT, on my actual test case (not this small reproduction) it still doesn't actually show the panic before stopping and never exiting
Couldn't the test runner just override the assert proc in the context it passes to the tests so that the panic doesn't actually panic? That would also allow testing.assert_did_panic(t) to test if the code above it properly raised a panic, which is not something we can currently do.
The assertion handler proc is divergent so you can't (easily) do that. You could use libc.setjmp etc. to hack it though (make the handler proc jump out to some other handler). But I am unsure if that is a good idea.

Right.... B*lls. Of course we could also invert it and say t.expect_panic = true so that when a panic is raised, the test runner knows whether that's a failure, or the absence of it should be considered one.

laytan commented 3 days ago

panic raises a SIGKILL (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.
$ odin test tests -define:ODIN_TEST_NAMES=test_panic -define:ODIN_TEST_LOG_LEVEL=warning
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] HMMMM
[FATAL] --- [2024-06-28 22:05:39] [client.odin:215:test_panic()] panic: log.panicf
tests_client  [|                       ]    0/   1 :: test_panic
1 thread                                    0/   1 :: total
Double BUT, on my actual test case (not this small reproduction) it still doesn't actually show the panic before stopping and never exiting
Couldn't the test runner just override the assert proc in the context it passes to the tests so that the panic doesn't actually panic? That would also allow testing.assert_did_panic(t) to test if the code above it properly raised a panic, which is not something we can currently do.
The assertion handler proc is divergent so you can't (easily) do that. You could use libc.setjmp etc. to hack it though (make the handler proc jump out to some other handler). But I am unsure if that is a good idea.
Right.... B*lls. Of course we could also invert it and say t.expect_panic = true so that when a panic is raised, the test runner knows whether that's a failure, or the absence of it should be considered one.

Yeah that's a good thing to have of some kind in the future, for now I would at least like my panics to show up 😆

Feoramund commented 3 days ago

panic raises a SIGKILL (5) on my machine, adding this to the signal handler and running the test makes this all show, but it never exits.

OS: macOS Sonoma 14.5 (build: 23F79, kernel: 23.5.0)

https://github.com/odin-lang/Odin/blob/06652bebce111640930a513c825e59f7b37bbd4f/core/thread/thread_unix.odin#L171-L176

pthread_cancel is not enabled on Darwin, therefore the test runner would sit there, spinning its wheels forever.

Of course we could also invert it and say t.expect_panic = true so that when a panic is raised, the test runner knows whether that's a failure, or the absence of it should be considered one.

I did consider having a "test should fail" sort of setting at one point, but by that time, I was too deeply developed into the test runner's current design to try to hack it on as a new feature; it would've needed some redesign to make it work. Same with making memory leaks show up as errors. It's still possible, but things will need to be moved around, if I remember correctly.

EDIT: I did not consider having "failing tests are good" as a feature that many people would want, which is why I didn't add it initially. I also assumed the thread rewrite would address the Darwin situation.

laytan commented 3 days ago

pthread_cancel is not enabled on Darwin, therefore the test runner would sit there, spinning its wheels forever.

Aha, I wonder why though, even if it is unreliable, at least trying to do it is better than just sitting there.

I think we should remove that when statement and also listen for signal SIGTRAP (5) in the signal handler, doing both those things locally makes it all work on my end.

It does mean that any signal that we don't explicitly catch will cause this behaviour with it swallowing the logs/output, but that is probably very uncommon when we have a signal handler for most signals.

Kelimion commented 3 days ago

pthread_cancel is not enabled on Darwin, therefore the test runner would sit there, spinning its wheels forever.

Aha, I wonder why though, even if it is unreliable, at least trying to do it is better than just sitting there.

I think we should remove that when statement and also listen for signal 5 in the signal handler, doing both those things locally makes it all work on my end.

It does mean that any signal that we don't explicitly catch will cause this behaviour with it swallowing the logs/output, but that is probably very uncommon when we have a signal handler for most signals.

And once I've absorbed graphite's thread redesign into core and he's taken another pass at sync, we'll also be in a better place.

Feoramund commented 3 days ago

pthread_cancel is not enabled on Darwin, therefore the test runner would sit there, spinning its wheels forever.

Aha, I wonder why though, even if it is unreliable, at least trying to do it is better than just sitting there.

I came upon a StackOverflow discussion about thread cancellation when I was first writing the signal handler, and from what I recall, Darwin only truly cancels a thread when it explicitly checks pthread_cancel, which is precisely the behavior I witnessed with FreeBSD, from which I believe Darwin is at least partly based on. This may explain the "unreliable" behavior.

Contrast this with Linux, which, from the discussion, is stated to use a hijacked signal handler to force a thread to immediately cancel/terminate without checking one of the cancelation points.

I think we should remove that when statement and also listen for signal 5 in the signal handler, doing both those things locally makes it all work on my end.

I can ready a patch for this in a minute.

Kelimion commented 3 days ago

Darwin only truly cancels a thread when it explicitly checks pthread_cancel, which is precisely the behavior I witnessed with FreeBSD, from which I believe Darwin is at least partly based on. This may explain the "unreliable" behavior.

I ran into this with core:net and it's why we don't test it on Darwin, because that test requires knowing it returns the correct thing if two threads are racing to acquire a connection, and you're the one who won / lost, and the thread that lost that race needs to be told to get lost, which GCD just doesn't care about despite setting cancellation ahead of time.

odin-lang / Odin

Test runner tends to swallow latest logs before exiting #3830