grpc-ecosystem / grpc-spring

Spring Boot starter module for gRPC framework.
https://grpc-ecosystem.github.io/grpc-spring/
Apache License 2.0
3.52k stars 826 forks source link

Server metrics are tagged with the wrong status codes #789

Open harti2006 opened 1 year ago

harti2006 commented 1 year ago

The context

I'd like to get the metric values for grpc.server.processing.duration and distinguish different status codes, e.g. OK, NOT_FOUND, INVALID_ARGUMENT, UNKNOWN

The bug

If my gRPC service is called three times, and it responds twice with an OK status and once with with an NOT_FOUND status, I'd expect the tags that are captured for the metrics grpc.server.processing.duration to be "OK" and "NOT_FOUND". But instead, the "NOT_FOUND" error seems to be reported as an "UNKNOWN" error.

Stacktrace and logs

I don't know

Steps to Reproduce

I created a very simple example project, with one gRPC service and one GrpcAdvice that translates my "business" exception into a NOT_FOUND status.

Please check out the integration test, that shows how to reproduce the issue: https://github.com/harti2006/grpc-metrics-demo/blob/trunk/src/test/java/com/github/harti2006/grpc/metrics/GrpcMetricsDemoApplicationTests.java

The application's environment

Which versions do you use?

Additional context

ST-DDT commented 1 year ago

Which version of micrometer are you using? Because we removed our implementation if favor of the official/upstream one.

harti2006 commented 1 year ago

Hey, I'm using the version of micrometer, that comes with Spring Boot 2.7.5. It must be related to the recent changes, that you mentioned, because it works as expected in the previous version of your library. I could really use a little help with debugging this, because it could either be a bug in the upstream lib, or maybe the way how the GrpcAdvice integrates and translates errors to status codes. But this is currently just a guess.

ST-DDT @.***> schrieb am Do., 24. Nov. 2022, 19:27:

Which version of micrometer are you using? Because we removed our implementation if favor of the official/upstream one.

— Reply to this email directly, view it on GitHub https://github.com/yidongnan/grpc-spring-boot-starter/issues/789#issuecomment-1326759656, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABU3NBDEAIVR4BQ4Z72OTBDWJ6XQZANCNFSM6AAAAAASKRDJRU . You are receiving this because you authored the thread.Message ID: @.***>

ST-DDT commented 1 year ago

Have a look here: I assume these should be the locations you are looking for:

harti2006 commented 1 year ago

Yeah, I found these places while debugging the problem. And I saw that funny behavior: The responseCode is only updated in the close method, but that method was only called when the grpc handler method completed successfully. On errors, the close method wasn't called at all, so the default value "UNKNOWN" was reported. Tomorrow, I will be able to add another case to my example to verify if this happens only when using the GrpcAdvice approach of sending errors, or if it's a general problem.

ST-DDT commented 1 year ago

I don't have much time to look into this right now, I will try to find some this weekend.

Could you try to change the order of the interceptors:

https://github.com/yidongnan/grpc-spring-boot-starter/blob/3f8f27506542a954a05be71a904cab779118005b/grpc-server-spring-boot-autoconfigure/src/main/java/net/devh/boot/grpc/server/autoconfigure/GrpcServerMetricAutoConfiguration.java#L63-L75

Copy this to your code and change the interceptor order to -1000 (smaller than the grpc advice one)

https://github.com/yidongnan/grpc-spring-boot-starter/blob/3f8f27506542a954a05be71a904cab779118005b/grpc-common-spring-boot/src/main/java/net/devh/boot/grpc/common/util/InterceptorOrder.java#L46-L53

ST-DDT commented 1 year ago

😭 The current grpc advice implementation seems to be broken and might have to be reimplemented. For unary calls it seems to work to a certain extend, but causes an undefined state in the call itself. For unary calls it doesn't work (anymore) as it tries to close the call twice, and logs errors.

harti2006 commented 1 year ago

Thanks for investigating. As promised I added another case to the example project where I send the exception directly via streamObserver:

responseObserver.onError(Status.NOT_FOUND.withDescription("goodbye message not found").asException());

I did the same test as before and with this approach, the metrics were reported correctly. See https://github.com/harti2006/grpc-metrics-demo/blob/933c29641a3c813db649cdd275ba022e08637820/src/test/java/com/github/harti2006/grpc/metrics/GrpcMetricsDemoApplicationTests.java#L72

I also tried your suggestion with changing the order of the MetricCollectingServerInterceptor bean, but it made things even worse: No metrics were reported at all.

pawelKapl commented 1 year ago

Hey, i have the very same issue. Did u managed to fix that problem? I am afraid i will need to downgrade this lib to 2.13.1. Do u need any help in investigation?

ST-DDT commented 1 year ago

A up/downgrade of this library should not have an impact on that as the official metric implementation is the exact same as this one's AFAIK, and I PRed it there.

pawelKapl commented 1 year ago

Hmm OP mentioned that downgrade to 2.13.1 is resolving this issue. What do u suggest then?

ST-DDT commented 1 year ago

I think it doesnt hurt trying. If it does solve the problem, it would be nice if you could debug into it and compare what is different.

pawelKapl commented 1 year ago

@ST-DDT going back to 2.13.1 helped (SpringBoot 2.7.12 here). In a free time will take a look at both versions and try to find issue there.

joaomper-TE commented 3 months ago

Any way of using @GrpcAdvice with Spring Boot 3.X? responseObserver.onError( is not working for us.