spring-cloud / spring-cloud-circuitbreaker

Spring Cloud Circuit Breaker API and Implementations
Apache License 2.0

Resilience4j circuit breaker metrics triple published each call #126

Closed: edwardsre closed this issue 2 years ago

edwardsre commented 2 years ago

Using spring-cloud-starter-circuitbreaker-reactor-resilience4j with default settings causes every circuit breaker call to be recorded three times, so the circuit breaker meter volume is three times what it should be.

Sample:

Build and run the spring-cloud-circuitbreaker-demo-reactive-resilience4j application from the Spring Cloud samples.

  1. Make an HTTP request to the delay endpoint, for example with HTTPie: http :8080/delay/1
  2. In a browser, go to the actuator metrics endpoint http://localhost:8080/actuator/metrics/resilience4j.circuitbreaker.calls?tag=kind:successful

Result: The count for the delay circuit breaker is 3 instead of 1.

{
  "name": "resilience4j.circuitbreaker.calls",
  "description": "Total number of successful calls",
  "baseUnit": "seconds",
  "measurements": [
    { "statistic": "COUNT", "value": 3 },
    { "statistic": "TOTAL_TIME", "value": 5.042378715 },
    { "statistic": "MAX", "value": 1.680792905 }
  ],
  "availableTags": [
    { "tag": "name", "values": ["delay"] }
  ]
}

Investigation:

I found three places where the same event listeners are registered, which causes the metrics to be reported three times for each circuit breaker call.

The first instance is ReactiveResilience4JAutoConfiguration.MicrometerReactiveResilience4JCustomizerConfiguration, which uses TaggedCircuitBreakerMetrics in its init() method.

The second instance is Resilience4JAutoConfiguration.MicrometerResilience4JCustomizerConfiguration, which also uses TaggedCircuitBreakerMetrics in its init() method.

Although each of these uses the correct circuit breaker factory (reactive or non-reactive), both bind to the same underlying registry instance. At this point, the event listeners have already been registered twice.

The third set of event listeners comes from org.springframework.cloud:spring-cloud-circuitbreaker-resilience4j pulling in io.github.resilience4j:resilience4j-spring-boot2, which contains the auto-configuration class CircuitBreakerMetricsAutoConfiguration. That class creates a TaggedCircuitBreakerMetricsPublisher, which calls addMetrics and registers the third set of event listeners.
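To make the effect of the three registrations concrete, here is a small, stdlib-only Java model. It is hypothetical: SharedRegistry, SuccessfulCallsCounter and bindMetrics are illustrative names, not Resilience4j or Micrometer APIs. It assumes only what is described above: each of the three configurations attaches its own listener to one shared registry, and every listener increments the same successful-calls counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model of duplicate listener registration. None of these types
 * are real Resilience4j or Micrometer classes.
 */
class DuplicateListenerModel {

    /** Stands in for the shared circuit breaker registry and its event publisher. */
    static final class SharedRegistry {
        private final List<Consumer<String>> onSuccessListeners = new ArrayList<>();

        void addOnSuccessListener(Consumer<String> listener) {
            onSuccessListeners.add(listener);
        }

        /** Simulates one successful call through the named circuit breaker. */
        void recordSuccessfulCall(String circuitBreakerName) {
            for (Consumer<String> listener : onSuccessListeners) {
                listener.accept(circuitBreakerName);
            }
        }
    }

    /** Stands in for the calls{kind=successful} counter in the meter registry. */
    static final class SuccessfulCallsCounter {
        private long count;

        void increment() { count++; }

        long count() { return count; }
    }

    /** Each "configuration" binds its own metrics listener to the same registry. */
    static void bindMetrics(SharedRegistry registry, SuccessfulCallsCounter counter) {
        registry.addOnSuccessListener(name -> counter.increment());
    }

    /** Counter value after one call, given how many configurations bound metrics. */
    static long countAfterOneCall(int bindingConfigurations) {
        SharedRegistry registry = new SharedRegistry();
        SuccessfulCallsCounter counter = new SuccessfulCallsCounter();
        for (int i = 0; i < bindingConfigurations; i++) {
            bindMetrics(registry, counter);
        }
        registry.recordSuccessfulCall("delay");
        return counter.count();
    }

    public static void main(String[] args) {
        // Reactive customizer + blocking customizer + resilience4j-spring-boot2 publisher.
        System.out.println(countAfterOneCall(3)); // prints 3
    }
}
```

One call through the "delay" circuit breaker yields a count of 3 with three bindings, which matches the COUNT value in the actuator output above.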

Each of these three listener sources can be switched off with one of the properties below, but out of the box the metric volume is three times what it should be, which is surprising.

This is how I had to configure my application to get correct meter counts:

  # kept true; setting it to false disables all reactive circuit breaker auto-configuration, including metrics
  spring.cloud.circuitbreaker.resilience4j.reactive.enabled=true
  # disables all blocking circuit breaker auto-configuration, including metrics
  spring.cloud.circuitbreaker.resilience4j.blocking.enabled=false
  # disables only the metrics
  resilience4j.circuitbreaker.metrics.legacy.enabled=false
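The arithmetic behind this workaround can be sketched as a hypothetical model in which each property switches off exactly one of the three listener sources, as described in this report. This mirrors the report, not the library source. The class and method names are illustrative. A missing property is treated as "source active", which matches the out-of-the-box count of 3 observed above.

```java
import java.util.Map;

/** Hypothetical model: how many listener sets stay active for a given set of properties. */
class PropertyGatingModel {

    // Property names as quoted in the report. The one-property-per-source mapping
    // follows the reporter's description and is an assumption of this model.
    static final String REACTIVE = "spring.cloud.circuitbreaker.resilience4j.reactive.enabled";
    static final String BLOCKING = "spring.cloud.circuitbreaker.resilience4j.blocking.enabled";
    static final String R4J_METRICS = "resilience4j.circuitbreaker.metrics.legacy.enabled";

    /** A source stays active unless its property is explicitly "false". */
    static boolean active(Map<String, String> props, String key) {
        return !"false".equalsIgnoreCase(props.getOrDefault(key, "true"));
    }

    /** Number of active listener sets, i.e. how many times each call is counted. */
    static int recordingsPerCall(Map<String, String> props) {
        int n = 0;
        if (active(props, REACTIVE)) n++;
        if (active(props, BLOCKING)) n++;
        if (active(props, R4J_METRICS)) n++;
        return n;
    }

    public static void main(String[] args) {
        // Defaults: all three sources register listeners.
        System.out.println(recordingsPerCall(Map.of())); // prints 3
        // The workaround configuration above.
        System.out.println(recordingsPerCall(
                Map.of(REACTIVE, "true", BLOCKING, "false", R4J_METRICS, "false"))); // prints 1
        // An application that needs both factories can only turn off the third source.
        System.out.println(recordingsPerCall(Map.of(R4J_METRICS, "false"))); // prints 2
    }
}
```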

If an application uses both blocking and reactive circuit breakers (e.g. a servlet container application using both spring-boot-starter-web and spring-boot-starter-webflux), each call would still be recorded at least twice, because the first two properties would both have to stay enabled.

edwardsre commented 2 years ago

Ideally, the Spring Cloud starter's auto-configuration would drop its own metrics configuration and defer to the Resilience4j auto-configuration for metrics setup. That way, the blocking and reactive circuit breaker factories could both be used, and metrics would have a single off switch: resilience4j.circuitbreaker.metrics.legacy.enabled=false.
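The desired outcome (both factories usable, metrics bound once, one switch to turn them off) can be illustrated with a hypothetical, stdlib-only sketch. It swaps in a different technique from the one proposed above: instead of removing the starter's metrics configuration, it guards a single binding entry point so that repeated calls for the same registry are no-ops. Registry, bindMetricsOnce and metricsEnabled are illustrative names, not Spring Cloud or Resilience4j APIs.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch: bind metrics listeners at most once per registry. */
class SingleBindingModel {

    /** Stands in for the shared circuit breaker registry; counts attached metrics listeners. */
    static final class Registry {
        int metricsListeners;
    }

    /** Registries that already have metrics listeners, compared by identity. */
    private static final Set<Registry> BOUND =
            Collections.newSetFromMap(new IdentityHashMap<>());

    /**
     * Single entry point for metrics binding. Blocking and reactive configurations
     * may both call it; only the first call for a given registry attaches a listener,
     * and the one metricsEnabled switch turns metrics off entirely.
     */
    static synchronized void bindMetricsOnce(Registry registry, boolean metricsEnabled) {
        if (!metricsEnabled) {
            return;
        }
        if (BOUND.add(registry)) {
            registry.metricsListeners++;
        }
    }

    public static void main(String[] args) {
        Registry shared = new Registry();
        bindMetricsOnce(shared, true); // blocking configuration
        bindMetricsOnce(shared, true); // reactive configuration, same registry
        System.out.println(shared.metricsListeners); // prints 1
    }
}
```

Identity comparison is used because the concern is the same registry instance being bound twice, not two registries that happen to be equal.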