micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.47k stars 990 forks source link

ObservationConventions for Grpc can cause exception in PrometheusMeterRegistry #5609

Open rafal-dudek opened 1 week ago

rafal-dudek commented 1 week ago

Describe the bug DefaultGrpcServerObservationConvention and DefaultGrpcClientObservationConvention does not set 'grpc.status_code' when status code is missing (e.g. at the start of the call).

Micrometer Observation is invoking "addLowCardinalityKeyValues" at the start of the observation and at the end of the observation. At the start, value for status code is null. In such case, the label is not added: https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/grpc/DefaultGrpcServerObservationConvention.java#L51

If the call is interrupted (e.g. cancelled) and does not finish with the status code, the metric is collected with 4 labels, e.g: [tag(error=none),tag(rpc.method=hello),tag(rpc.service=Greeter),tag(rpc.type=UNARY)]

When the next call is finished, the metric is collected with 5 labels e.g.: [tag(error=none),tag(grpc.status_code=OK),tag(rpc.method=hello),tag(rpc.service=Greeter),tag(rpc.type=UNARY)]

For many MeterRegistries this is not a problem, but when using PrometheusMeterRegistry it creates exception:

Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'grpc.server' containing tag keys [error, rpc.method, rpc.service, rpc.type]. The meter you are attempting to register has keys [error, grpc.status_code, rpc.method, rpc.service, rpc.type].

https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheusmetrics/PrometheusMeterRegistry.java#L583

Different components are reporting "UNKNOWN" value for such cases, instead of omitting label e.g.: https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/server/observation/DefaultServerRequestObservationConvention.java#L125 https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/server/reactive/observation/DefaultServerRequestObservationConvention.java#L126

Environment

To Reproduce How to reproduce the bug: Create request to GrpcService and cancel it before it is finished (artificial Sleep in the Service may be required), then create request and let it be completed.

GreeterGrpc.GreeterFutureStub stub = GreeterGrpc.newFutureStub(inProcChannel);
stub.hello(helloRequest).cancel(true);
stub.hello(helloRequest).get();

Set Debug breakpoint here: https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheusmetrics/PrometheusMeterRegistry.java#L583 or add MeterRegistrationFailedListener to see the exception.

Expected behavior DefaultGrpcServerObservationConvention and DefaultGrpcClientObservationConvention should not cause error in PrometheusMeterRegistry in any scenario.

ttddyy commented 1 week ago

I think part of the PR(https://github.com/micrometer-metrics/micrometer/pull/3512) fixes this problem. It always sets the status code to UNKNOWN when the value is not available.

@shakuzen @jonatan-ivanov Let me know how you’d like to proceed.