grpc-ecosystem / go-grpc-middleware

Golang gRPC Middlewares: interceptor chaining, auth, logging, retries and more.
Apache License 2.0
6.25k stars 690 forks source link

Errors which occur before Method Handlers are invoked cannot be intercepted by middlewares #245

Open dackroyd opened 4 years ago

dackroyd commented 4 years ago

We encountered an issue recently where a proto definition mismatch (breaking change) resulted in the gRPC server failing to decode the request payload. This happens before any method handler can be invoked, which also means that the middleware cannot be invoked.

As a result, the server produced no metrics, traces or logs for the failure, making diagnosing the issue more difficult, as the requests didn't appear to be making it out of the client (based on what information we could observe without those things). In reality, the server was returning a codes.Internal error without any of the middleware being able to observe that.

OpenCensus implement their observability handling via a StatsHandler instead, which does get invoked even in cases where the server is unable to decode the request (and other issues). While it probably doesn't make sense for all server side interception capabilities to be implemented this way (e.g auth, rate limiting, validation), for observability (metrics, tracing, logging) this needs to be able to record these failures as well.

Given this library is go-grpc-middlewares, that may not be deemed in scope of the library, but I'm looking for confirmation about whether this would be the case or not. Based on that, whether I need to look at alternative implementations which do use StatsHandler (or roll my own), or if contribution of StatsHandler approaches for these would be accepted (and how that support should be integrated)

johanbrandhorst commented 4 years ago

I don't think we have any intention to implement stats handlers, but we'd be happy to accept PRs which implement this functionality. In this case, is there a reason you're not just using the OpenCensus observability handlers though?

dackroyd commented 4 years ago

Fair point, I can't say I'd really considered it prior to now, as we've been using these middlewares and our monitoring is built on OpenTracing, Prometheus and Zap.

With OpenTracing and OpenCensus merging into OpenTelemetry (and intending on handling all 3 pillars), it's likely we'll need to head in that same direction, but the project is still in Alpha, so expecting a bit of flux with the apis/implementation there.

johanbrandhorst commented 4 years ago

I agree with you, the best we can do now is probably just wait. You can use OpenCensus if you need this today, or OpenTelemetry if you can wait, but it would mostly be a duplicate effort in this repo. Still happy to accept contributions of course.