grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

client.asyncInvoke leads to: panic: send on closed channel #3747

Open · gm42 opened this issue 1 month ago

gm42 commented 1 month ago

Brief summary

While using client.asyncInvoke I managed to trigger this panic:

panic: send on closed channel

goroutine 105931 [running]:
go.k6.io/k6/metrics.PushIfNotDone(...)
    go.k6.io/k6/metrics/sample.go:135
go.k6.io/k6/lib/netext/grpcext.statsHandler.HandleRPC({0xc0027b2008?}, {0x1bb81d0, 0xc030f8d0b0}, {0x1bb3018, 0xc012619d60})
    go.k6.io/k6/lib/netext/grpcext/conn.go:254 +0x350
google.golang.org/grpc.(*csAttempt).finish(0xc0197f3d40, {0x1baa740, 0xc03cc89af8})
    google.golang.org/grpc@v1.63.2/stream.go:1172 +0x47c
google.golang.org/grpc.(*clientStream).retryLocked(0xc029cc2240, 0x0?, {0x1baa740?, 0xc03cc89af8?})
    google.golang.org/grpc@v1.63.2/stream.go:721 +0x5c
google.golang.org/grpc.(*clientStream).withRetry(0xc029cc2240, 0xc013fefb98, 0xc013fefbe0)
    google.golang.org/grpc@v1.63.2/stream.go:791 +0x28c
google.golang.org/grpc.(*clientStream).RecvMsg(0xc029cc2240, {0x1769800?, 0xc019c4ce00?})
    google.golang.org/grpc@v1.63.2/stream.go:927 +0x110
google.golang.org/grpc.invoke({0x1bb81d0?, 0xc030f8c840?}, {0xc0193bb4e0?, 0x20?}, {0x1769800, 0xc019c4cd80}, {0x1769800, 0xc019c4ce00}, 0xc0222eba00?, {0xc0222eba00, ...})
    google.golang.org/grpc@v1.63.2/call.go:73 +0xcb
google.golang.org/grpc.(*ClientConn).Invoke(0xc00060cc08, {0x1bb81d0?, 0xc030f8c840?}, {0xc0193bb4e0?, 0x131?}, {0x1769800?, 0xc019c4cd80?}, {0x1769800?, 0xc019c4ce00?}, {0xc0222eba00, ...})
    google.golang.org/grpc@v1.63.2/call.go:37 +0x23f
go.k6.io/k6/lib/netext/grpcext.(*Conn).Invoke(0xc000a5ce20, {0x1bb8208?, 0xc0005e54a0?}, {{0xc0193bb4e0, 0x1d}, {0x1bc7bf0, 0xc00293a588}, 0x1bf08eb000, 0xc024084448, {0xc019e4ec00, ...}, ...}, ...)
    go.k6.io/k6/lib/netext/grpcext/conn.go:141 +0x696
go.k6.io/k6/js/modules/k6/grpc.(*Client).AsyncInvoke.func1()
    go.k6.io/k6/js/modules/k6/grpc/client.go:306 +0xae
created by go.k6.io/k6/js/modules/k6/grpc.(*Client).AsyncInvoke in goroutine 1
    go.k6.io/k6/js/modules/k6/grpc/client.go:305 +0x27c

k6 version

0.51.0

OS

Linux

Docker version and image (if applicable)

Docker image tag 0.51.0

Steps to reproduce the problem

To reproduce this problem you need the gRPC endpoint to crash (OOM), which then results in a gRPC status code 14 (UNAVAILABLE):

error reading from server: EOF

Afterwards the panic may happen, but it's not 100% reproducible. Perhaps a test with -race will find the bug?
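
For context, here is a minimal sketch of the kind of script that exercises client.asyncInvoke; the proto file, target address, service/method name and payload are placeholders, not the actual script that hit the panic:

import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
client.load([], 'service.proto'); // placeholder proto definition

export default async function () {
  client.connect('target:50051', { plaintext: true }); // placeholder address

  // The server is expected to crash (OOM) mid-test, turning in-flight calls
  // into status 14 UNAVAILABLE ("error reading from server: EOF").
  const response = await client.asyncInvoke('pkg.Service/Method', { field: 'value' });
  check(response, { 'status is OK': (r) => r && r.status === grpc.StatusOK });

  client.close();
}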

Expected behaviour

No panics when using client.asyncInvoke

Actual behaviour

A panic happened

joanlopez commented 1 month ago

Hey @gm42,

How often have you been able to reproduce it? I tried multiple times, simulating an OOM scenario on the server side, with no luck.

That said, have you tried running client.close as part of the teardown function to see if it still panics? If you're still able to reproduce it, I'd suggest trying that, as arguably the source of the panic is the lack of cleanup, although of course we could explore a mechanism to prevent that from happening at all.
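
For reference, a minimal sketch of that suggested cleanup, assuming the client is created in the init context as usual:

import grpc from 'k6/net/grpc';

const client = new grpc.Client();

export function teardown() {
  // Explicitly close the gRPC client when the test finishes, the hypothesis
  // being that the panic stems from the connection not being cleaned up.
  client.close();
}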

Thanks! 🙇🏻

joanlopez commented 1 month ago

Any news @gm42? I spent some more time trying to reproduce this, exercising a few more corner cases as well as intentionally faking the 14 UNAVAILABLE gRPC status code, with no luck.

Would you be able to share what the script looks like, stripping out any sensitive details, please?

joanlopez commented 1 month ago

Considering that:

I guessed: aha, it looks like this is probably "reproducible" with Go code that escapes the "main thread" without actually using promises/callbacks adequately (think of any goroutine that calls metrics.PushIfNotDone with a reference to the channel and a non-cancelled context), which may happen by mistake, and would probably be easy to force intentionally? 🤔

Thus, I've been taking a look at the code in google.golang.org/grpc referenced from that stack trace, looking for anything that could produce this situation, and I haven't spotted anything so far.

However, I still think this is "doable" (intentionally, for instance). So, do you think we should try to prevent that at the k6 level? Or is it just fine to assume it could happen, and treat each case as a bug to fix in the extension code (gRPC in this case)?

In other words, if we assume this is caused by a "bug" in the gRPC code, do you think we should prevent it from happening at all in k6 anyway? Or just consider that this should be fixed in gRPC? cc/ @olegbespalov @mstoykov 🙇🏻