signalfx / signalfx-go

Go client library and instrumentation bindings for SignalFx
https://www.signalfx.com
Apache License 2.0
14 stars 48 forks source link

Close goroutines when client context is cancelled #104

Closed kerbyhughes closed 4 years ago

kerbyhughes commented 4 years ago

Changes:

While using the signalflow client we noticed a resource leak in our application.

Inspection with pprof.Lookup("goroutine") stack traces indicated that goroutines spawned by the Computation and wsConn functions were running indefinitely even though the client context was being cancelled with func (c *Client) Close(). This also prevented the associated data structures from being garbage collected.

Sample stack traces:

goroutine 34 [chan receive, 6 minutes]:
github.com/signalfx/signalfx-go/signalflow.(*Computation).bufferDataMessages(0xc0001e71e0)
    /go/pkg/mod/github.com/signalfx/signalfx-go@v1.7.7-0.20200708132231-05c3bdcd635b/signalflow/computation.go:311 +0x99
created by github.com/signalfx/signalfx-go/signalflow.newComputation
    /go/pkg/mod/github.com/signalfx/signalfx-go@v1.7.7-0.20200708132231-05c3bdcd635b/signalflow/computation.go:81 +0x1a8
goroutine 107110 [chan receive, 6 minutes]:
github.com/signalfx/signalfx-go/signalflow.(*Computation).bufferExpirationMessages(0xc000234dd0)
    /go/pkg/mod/github.com/signalfx/signalfx-go@v1.7.7-0.20200708132231-05c3bdcd635b/signalflow/computation.go:335 +0x9c
created by github.com/signalfx/signalfx-go/signalflow.newComputation
    /go/pkg/mod/github.com/signalfx/signalfx-go@v1.7.7-0.20200708132231-05c3bdcd635b/signalflow/computation.go:82 +0x1ca
goroutine 58735 [chan send, 8 minutes]:
github.com/signalfx/signalfx-go/signalflow.(*wsConn).readNextMessage(0xc0000e8310, 0xc0008109a0)
    /go/src/github.com/acquia/tracker/vendor/github.com/signalfx/signalfx-go/signalflow/conn.go:162 +0x24a
created by github.com/signalfx/signalfx-go/signalflow.(*wsConn).Run
    /go/src/github.com/acquia/tracker/vendor/github.com/signalfx/signalfx-go/signalflow/conn.go:110 +0x56d

In all three cases, the context Done() channel was not being checked in some branches where the goroutine was blocked on sending or receiving from another channel.

cory-signalfx commented 4 years ago

cc @keitwb for a look!

keitwb commented 4 years ago

@cory-signalfx I just enabled CircleCI to run for forked PRs -- it was off before which is why the tests didn't run. I ran them locally though and everything seems good to go.

cory-signalfx commented 4 years ago

Oh, thank you!