Open rafal-dudek opened 1 week ago
I think part of the PR(https://github.com/micrometer-metrics/micrometer/pull/3512) fixes this problem.
It always sets the status code to UNKNOWN
when the value is not available.
@shakuzen @jonatan-ivanov Let me know how you’d like to proceed.
Describe the bug DefaultGrpcServerObservationConvention and DefaultGrpcClientObservationConvention does not set 'grpc.status_code' when status code is missing (e.g. at the start of the call).
Micrometer Observation is invoking "addLowCardinalityKeyValues" at the start of the observation and at the end of the observation. At the start, value for status code is null. In such case, the label is not added: https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/grpc/DefaultGrpcServerObservationConvention.java#L51
If the call is interrupted (e.g. cancelled) and does not finish with the status code, the metric is collected with 4 labels, e.g:
[tag(error=none),tag(rpc.method=hello),tag(rpc.service=Greeter),tag(rpc.type=UNARY)]
When the next call is finished, the metric is collected with 5 labels e.g.:
[tag(error=none),tag(grpc.status_code=OK),tag(rpc.method=hello),tag(rpc.service=Greeter),tag(rpc.type=UNARY)]
For many MeterRegistries this is not a problem, but when using PrometheusMeterRegistry it creates exception:
https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheusmetrics/PrometheusMeterRegistry.java#L583
Different components are reporting "UNKNOWN" value for such cases, instead of omitting label e.g.: https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/server/observation/DefaultServerRequestObservationConvention.java#L125 https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/server/reactive/observation/DefaultServerRequestObservationConvention.java#L126
Environment
To Reproduce How to reproduce the bug: Create request to GrpcService and cancel it before it is finished (artificial Sleep in the Service may be required), then create request and let it be completed.
Set Debug breakpoint here: https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheusmetrics/PrometheusMeterRegistry.java#L583 or add MeterRegistrationFailedListener to see the exception.
Expected behavior DefaultGrpcServerObservationConvention and DefaultGrpcClientObservationConvention should not cause error in PrometheusMeterRegistry in any scenario.