tstromberg closed this issue 2 years ago
Capturing a panic we saw in production using OTEL to record trace spans:
```
2021-12-13T20:25:49.691Z INFO grpc-server/grpc_server.go:197 {"service": "github.com/tinkerbell/hegel", "op": "subscribe", "ip": "#.#.#.#", "client": "#.#.#.#:57404"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x47e82d]

goroutine 219 [running]:
go.opentelemetry.io/otel/sdk/trace.(*span).End(0xc000305200, 0x0, 0x0, 0x0)
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.0.0-!r!c2/trace/span.go:274 +0x319
panic(0xb99f20, 0x11b8ef0)
	/usr/local/go/src/runtime/panic.go:969 +0x1b9
sync.(*RWMutex).Lock(0x0)
	/usr/local/go/src/sync/rwmutex.go:98 +0x2d
github.com/tinkerbell/hegel/grpc-server.(*Server).Subscribe(0xc0001b0f00, 0xc0006dc000, 0xd7b8c0, 0xc000313ec0, 0x0, 0x0)
	/usr/myapp/grpc-server/grpc_server.go:225 +0x885
github.com/tinkerbell/hegel/grpc/protos/hegel._Hegel_Subscribe_Handler(0xc07220, 0xc0001b0f00, 0xd792e0, 0xc000255fb0, 0x15, 0xc000395f60)
	/usr/myapp/grpc/protos/hegel/hegel.pb.go:311 +0x10b
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.StreamServerInterceptor.func1(0xc07220, 0xc0001b0f00, 0xd79220, 0xc000395e40, 0xc000395c60, 0xcb5c70, 0x0, 0x0)
	/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.22.0/interceptor.go:414 +0x6ce
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1(0xc07220, 0xc0001b0f00, 0xd79220, 0xc000395e40, 0x16, 0xc000314dc0)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/chain.go:49 +0x5f
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1(0xc07220, 0xc0001b0f00, 0xd79160, 0xc000395d00, 0xc000395c60, 0xc000395c80, 0x0, 0x0)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121 +0xef
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1(0xc07220, 0xc0001b0f00, 0xd79160, 0xc000395d00, 0x16, 0xc06608f3693642b8)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/chain.go:49 +0x5f
github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1(0xc07220, 0xc0001b0f00, 0xd79160, 0xc000395d00, 0xc000395c60, 0xc000395ca0, 0x4, 0x4)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/logging/zap/server_interceptors.go:58 +0x164
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1(0xc07220, 0xc0001b0f00, 0xd79160, 0xc000395d00, 0xc0000108a8, 0xc00019b620)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/chain.go:49 +0x5f
github.com/tinkerbell/hegel/xff.GRPCMiddlewares.func3(0xc07220, 0xc0001b0f00, 0xd79460, 0xc0003aa780, 0xc000395c60, 0xc000395cc0, 0xc000395ce0, 0xc000395c60)
	/usr/myapp/xff/xff.go:150 +0x110
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1(0xc07220, 0xc0001b0f00, 0xd79460, 0xc0003aa780, 0xc0005dcc18, 0x410138)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/chain.go:49 +0x5f
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1(0xc07220, 0xc0001b0f00, 0xd79460, 0xc0003aa780, 0xc000395c60, 0xcb5c70, 0xd756a0, 0xc000255950)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.0/chain.go:58 +0xd1
google.golang.org/grpc.(*Server).processStreamingRPC(0xc0002ad500, 0xd7cdc0, 0xc000304900, 0xc000643560, 0xc0006dc090, 0x11bbec0, 0x0, 0x0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1548 +0x6b8
google.golang.org/grpc.(*Server).handleStream(0xc0002ad500, 0xd7cdc0, 0xc000304900, 0xc000643560, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1621 +0xc9c
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc00028aa90, 0xc0002ad500, 0xd7cdc0, 0xc000304900, 0xc000643560)
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:940 +0xa5
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:938 +0x1fd
```
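Reading the trace bottom-up, the original fault appears to be `sync.(*RWMutex).Lock(0x0)`: the receiver is a nil `*sync.RWMutex`, so the lock call inside `(*Server).Subscribe` (grpc_server.go:225) dereferences a nil pointer; the `span.End` frame at the top is just where the panic surfaced while the deferred OTEL span was being ended. A minimal sketch of that failure mode is below. It is not hegel's actual code; the `subscriptions` struct and `mu` field are hypothetical stand-ins for whatever state `Subscribe` locks.

```go
// Minimal sketch (hypothetical names): calling Lock on a nil *sync.RWMutex
// dereferences a nil pointer, matching the sync.(*RWMutex).Lock(0x0) frame.
package main

import "sync"

// subscriptions stands in for whatever state (*Server).Subscribe locks.
type subscriptions struct {
	mu *sync.RWMutex // pointer field that was never initialized
}

func main() {
	s := subscriptions{} // zero value: s.mu is nil
	s.mu.Lock()          // panics: invalid memory address or nil pointer dereference
}
```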
Allegedly fixed by #71 - will need to try it again to confirm.
I have a test and a fix. PR incoming.
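Not the actual PR, just the general shape of a fix for this class of bug: hold the mutex by value (`sync.RWMutex` rather than `*sync.RWMutex`) so the zero value is already an unlocked, usable mutex, and build the containing struct through a single constructor. All names below (`subscriptionStore`, `NewSubscriptionStore`, `Add`) are illustrative, not hegel's.

```go
// Sketch of the value-mutex pattern that makes the nil-Lock panic impossible.
package main

import (
	"fmt"
	"sync"
)

type subscriptionStore struct {
	mu   sync.RWMutex // by value: the zero value is ready to use
	subs map[string]struct{}
}

func NewSubscriptionStore() *subscriptionStore {
	return &subscriptionStore{subs: make(map[string]struct{})}
}

func (s *subscriptionStore) Add(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.subs[id] = struct{}{}
}

func main() {
	s := NewSubscriptionStore()
	s.Add("client-1")
	fmt.Println(len(s.subs)) // 1
}
```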