honeycombio / libhoney-go

Go library for sending data to Honeycomb
Apache License 2.0
58 stars 40 forks source link

bug: nil pointer dereference with Go 1.17.5 #154

Closed jsha closed 2 years ago

jsha commented 2 years ago

In letsencrypt/boulder, we deployed a build using Go 1.17.5 (upgrading from 1.17). Immediately we started getting panics from our web frontends like this:

2021-12-16T22:57:27.971423+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: panic: runtime error: invalid memory address or nil pointer dereference
2021-12-16T22:57:27.971487+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4eebdd]
2021-12-16T22:57:27.971521+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: goroutine 4596858 [running]:
2021-12-16T22:57:27.971642+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: bytes.(*Reader).Read(0x80000, {0xc0055c4000, 0xc0055c4000, 0x0})
2021-12-16T22:57:27.971673+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/bytes/reader.go:41 +0x1d
2021-12-16T22:57:27.971700+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: net/http.(*readTrackingBody).Read(0x19b1160, {0xc0055c4000, 0xc000be7c68, 0xc002916000})
2021-12-16T22:57:27.971726+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/net/http/transport.go:634 +0x2a
2021-12-16T22:57:27.971746+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: net/http.(*http2clientStream).writeRequestBody(0xc00024db00, 0xc00288fc00)
2021-12-16T22:57:27.971770+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/net/http/h2_bundle.go:8104 +0x419
2021-12-16T22:57:27.971793+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: net/http.(*http2clientStream).writeRequest(0xc00024db00, 0xc00288fc00)
2021-12-16T22:57:27.971812+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/net/http/h2_bundle.go:7841 +0x785
2021-12-16T22:57:27.971830+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: net/http.(*http2clientStream).doRequest(0x6df966, 0xc0040807e0)
2021-12-16T22:57:27.971848+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/net/http/h2_bundle.go:7727 +0x1e
2021-12-16T22:57:27.971875+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: created by net/http.(*http2ClientConn).RoundTrip
2021-12-16T22:57:27.971894+00:00 efn-vm-wfe02 den 3 boulder-wfe2[960437]: #011/opt/boulder-build/golang/1.17.5/go/src/net/http/h2_bundle.go:7656 +0x30f

libhoney is the only component in our frontends that makes outbound requests with net/http (we also use gRPC, but it has an internal h2 library).

When we downgraded to Go 1.17, the panics went away. There are a number of http related changes between Go 1.17 and Go 1.17.5:

https://github.com/golang/go/commit/01779135c09ac4c2ba0d0bc7b3ae0889ad9ad989 https://github.com/golang/go/commit/ab175939e28df62fee09c33c7d9e008bff6b684c https://github.com/golang/go/commit/f9cb33c7c9d48fd0c38d17a659d5e968de648d40

Looking at the stack trace, The most interesting bit is bytes.(*Reader).Read. The net/http stack is calling Read() on a *bytes.Reader that is nil.

In libhoney, traces are accumulated and then compressed with zstd. For pooling purposes, libhoney wraps a buffer in a pooledReader, which embeds a *bytes.Reader. When the pooledReader gets a Close, it sets the embedded *bytes.Reader to nil. So we know that any call to pooledReader.Read after pooledReader.Close will panic. It's not clear from the interface documentation of Reader and Closer whether such behavior is allowed.

However, net/http's RoundTripper says:

// RoundTrip should not modify the request, except for
// consuming and closing the Request's Body. RoundTrip may
// read fields of the request in a separate goroutine. Callers
// should not mutate or reuse the request until the Response's
// Body has been closed.
//
// RoundTrip must always close the body, including on errors,
// but depending on the implementation may do so in a separate
// goroutine even after RoundTrip returns. This means that
// callers wanting to reuse the body for subsequent requests
// must arrange to wait for the Close call before doing so.

Note that this says the request must not be reused until the Response's Body has been closed. I think the current implementation of pooledReader is violating that. I haven't been able to come up with a minimal repro of a case where net/http calls Read after Close, but the combination of things here makes me suspect an issue in libhoney-go.

One possible fix would be to provide a .rePool() function on pooledReader, and call it explicitly after httpClient.Do(req) returns.

robbkidd commented 2 years ago

@jsha, thank you for the issue and details! We saw that SIGSEGV in some of our internal Go services when we bumped to 1.17.5. This is on the queue to investigate further in libhoney.

lizthegrey commented 2 years ago

I have a fix, I'll put a PR up.

lizthegrey commented 2 years ago

PR is up.

MikeGoldsmith commented 2 years ago

Closed by #156

lizthegrey commented 2 years ago

We're going to smoketest this within Honeycomb a few days; if you need the fix sooner, please run:

go mod edit -replace github.com/honeycombio/libhoney-go=github.com/honeycombio/libhoney-go@main
go mod tidy

Thanks for the report!