when using concurrency, a test case failure in a background go routine does not show which test case failed

plastikfan commented 2 months ago

When multiple go routines are active, it looks like if an error occurs in a go routine other than the main one, the test case failure does not identify the running test case; for example here is the output of a test failure, which does not indicate which test case failed:

•••••••••••panic: interface conversion: interface {} is nil, not int

goroutine 341 [running]:
github.com/snivilised/lorax/rx.Item[...].Num(...)
    /Users/plastikfan/dev/github/snivilised/lorax/rx/item.go:135
github.com/snivilised/lorax/rx.NumericItemLimitComparator[...](...)
    /Users/plastikfan/dev/github/snivilised/lorax/rx/limiters.go:24
github.com/snivilised/lorax/rx.(*minOperator[...]).next(...)
    /Users/plastikfan/dev/github/snivilised/lorax/rx/observable-operator.go:1292
github.com/snivilised/lorax/rx.runSequential[...].func1()
    /Users/plastikfan/dev/github/snivilised/lorax/rx/observable.go:347 +0x416
created by github.com/snivilised/lorax/rx.runSequential[...] in goroutine 339
    /Users/plastikfan/dev/github/snivilised/lorax/rx/observable.go:320 +0x17e

Ginkgo ran 1 suite in 9.900629456s

Is there a technique to create test suites and cases that can help alleviate this issue? It is difficult to easily track down which case is failing without, manually setting the focus on a single test at a time, which could be tedious when there are a lot of test suites and cases to run.

PS: I am not entirely sure that the non identification of the test case is down to the fact that the error occurred in an alternative go-routine, I am speculating with an educated guess.

onsi commented 2 months ago

hey @plastikfan - unfortunately, in Go, there is no way for one goroutine to catch a panic thrown in a different goroutine. So there's nothing Ginkgo can do to catch this panic and associate it with a test. Even worse: the panic causes the entire suite to exit immediately and a bunch of tests aren't being run. This is one of many reasons why Go nudges you towards handling issues by returning errors instead of panicking.

if you are launching a goroutine that might panic or that is making explicit assertions (e.g. calling Fail or using a Gomega matcher) then you should put defer GinkgoRecover() at the top of your goroutine. This gives Ginkgo an explicit hook to catch the panic and associate it with the current spec:

func doStuff(c chan bool) {
    defer GinkgoRecover()
    panic("welp")
    close(c)
}

It("works", func() {
    c := make(chan bool)
    go doStuff(c)
    Eventually(c).Should(BeClosed())
})

this will work correctly.

If you can instrument the goroutines in this way then that'll work. If not (e.g. they're in your production code) you'll need to switch to a different way of handling errors. If, on the other hand, they are in a 3rd party library that you've imported then you should open an issue - in general production code should not panic in a goroutine in this way as it could bring production services down.

plastikfan commented 2 months ago

Hi @onsi, thank you for your quick reply. And that is great advice, I'll give it a shot. The panic is not caused by an explicit panic, its caused by an accidental nil pointer de-reference, but I'll definitely take heed of your adice about preferring errors rather than raising a panic. The only exception I have to this rule, particluarly when creating a library, is when a particular issue is known to be a programming error/misuse, in which case, I like to fail loudly with an informative error message. Is that frowned upon too?

onsi commented 2 months ago

Is that frowned upon too?

I imagine opinions will vary. And I get what you’re saying. There are times when you want to fail fast. regexp.MustCompile() is an example where the standard library will panic and blow up. The intent is to give the developer a fast feedback loop when they are building a (hard coded) regular expression. Other examples are panicking when starting a service up if it fails to (for example) read its configuration, or open a port - in such cases the service is about to exit anyway so might as well panic.

Throwing a panic in a library is doable if you have a similar “you’re using it wrong” use case. But doing this in a goroutine makes it (a) hard for you to test the edge-case (not just in the context of Ginkgo, any go-based test suite will struggle since you can’t intercept the panic in another goroutine) and (b) hard for the user to catch and handle the panic (e.g. “I know I’m using this wrong in this case but rather than fix it I’m happy to just catch the error and try something else” <= this becomes impossible if the panic is in a bit of code that’s typically running in a separate goroutine from the original caller/consumer.).

onsi / ginkgo

when using concurrency, a test case failure in a background go routine does not show which test case failed #1392