Open agnihp opened 4 years ago
Resilience4j version: 1.4.0
Java version: 1.8.0_65
I am using the Resilience4j circuit breaker with Spring Boot. In the actuator health endpoint metrics, I am seeing discrepancies in the slow-call numbers: the slow failed calls count is negative instead of positive, which is preventing my circuit breaker from opening. Can anyone help me understand what these negative values mean?
"endpoint1": {
"status": "UP",
"details": {
"failureRate": "0.0%",
"failureRateThreshold": "50.0%",
"slowCallRate": "0.0%",
"slowCallRateThreshold": "50.0%",
"bufferedCalls": 3500,
"slowCalls": 0,
"slowFailedCalls": -2682,
"failedCalls": 0,
"notPermittedCalls": 0,
"state": "CLOSED"
}
}
I am using Spring Boot 2.2.0 and reactor-core 3.3.5, with an annotation-based circuit breaker. The issue appears under higher load.
I'm also facing the same issue when exporting the metrics via Prometheus: the failure rate shows negative numbers for some of my circuit breakers.
More info on this. It seems related to the code block below:
private float getFailureRate(Snapshot snapshot) {
    int bufferedCalls = snapshot.getTotalNumberOfCalls();
    if (bufferedCalls == 0 || bufferedCalls < minimumNumberOfCalls) {
        return -1.0f;
    }
    return snapshot.getFailureRate();
}
I am facing the same issue. Has anyone been able to solve this?
{"@timestamp":"2020-09-13T18:54:07.436Z", "log.level": "INFO", "message":"CircuitBreaker: CB1 | Successful call count: 0 | Failed call count: 4 | Failure rate %:-1.0 | Slow call count: 4 | Slow rate %:-1.0 | Slow failed call count: -206 | Slow success call count: 210 | State: CLOSED"}
Adding more details to the issue,
I have added a circuit breaker around the MySQL database to catch slow calls. While my database connection was fine, I got the following state on a successful call.
{"@timestamp":"2020-09-13T20:38:47.572Z", "log.level": "INFO", "message":"CircuitBreaker: CB1 | Successful call count: 453 | Failed call count: 3 | Failure rate %:0.65789473 | Slow call count: 85 | Slow rate %:18.64035 | Slow failed call count: 3 | Slow success call count: 82 | State: CLOSED"}
After the above call, my database went down and my Vert.x application was unable to connect to it. The connection timeout is 5 seconds, and only one thread tries to get a connection from the pool. That thread timed out after 5 seconds, and I then got the following state. During those 5 seconds no other calls were executed, because they were all waiting for getConnectionThread to become free.
{"@timestamp":"2020-09-13T20:38:52.573Z", "log.level": "INFO", "message":"CircuitBreaker: CB1 | Successful call count: 0 | Failed call count: 4 | Failure rate %:-1.0 | Slow call count: 4 | Slow rate %:-1.0 | Slow failed call count: 4 | Slow success call count: 0 | State: CLOSED"}
Something happened in this 5-second gap that made the metrics go negative. I have not been able to figure out the reason yet. @dlsrb6342 can you please help here?
Failure rate and slow call rate are shown as -1.0 if the number of measured calls is below the minimum number of calls. A failure rate of 0 would be wrong in that case.
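To make the -1.0 behavior concrete, here is a minimal sketch of how log or dashboard code might treat that value as "not enough data" rather than as a real rate. It only mirrors the check quoted earlier in this thread and is not the library's implementation. The class name RateSentinel, the describe helper, and the minimum of 10 calls are assumptions chosen for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of Resilience4j.
final class RateSentinel {

    // Same shape as the check quoted in this thread: report -1.0 until
    // at least minimumNumberOfCalls calls have been recorded.
    static float failureRate(int failedCalls, int bufferedCalls, int minimumNumberOfCalls) {
        if (bufferedCalls == 0 || bufferedCalls < minimumNumberOfCalls) {
            return -1.0f;
        }
        return failedCalls * 100.0f / bufferedCalls;
    }

    // Render the sentinel as text so it is not mistaken for a negative rate.
    static String describe(float rate) {
        return rate < 0 ? "n/a (too few calls)" : String.format(Locale.ROOT, "%.1f%%", rate);
    }

    public static void main(String[] args) {
        // 4 buffered calls with an assumed minimum of 10: the rate is not computed yet.
        System.out.println(describe(failureRate(4, 4, 10)));
        // 3 failures out of 456 buffered calls: a real percentage is reported.
        System.out.println(describe(failureRate(3, 456, 10)));
    }
}
```

This covers only the rate fields. A negative call count such as slowFailedCalls is a different matter and is not explained by this sentinel.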
I am facing the same issue.
"details": {
"failureRate": "0.0%",
"failureRateThreshold": "10.0%",
"slowCallRate": "0.0%",
"slowCallRateThreshold": "10.0%",
"bufferedCalls": 150,
"slowCalls": 0,
"slowFailedCalls": -134570,
"failedCalls": 0,
"notPermittedCalls": 0,
"state": "CLOSED"
}
My concern is slowFailedCalls, which is negative. The circuit breaker switched to OPEN as expected during the trouble, but after the metrics were reset on the state change to HALF_OPEN and then CLOSED, some slow or failed calls were recorded. Some time later, as the slow and failed call counts approached 0, slowFailedCalls started to go negative.
It seems uncertain when slowFailedCalls will next get a chance to reset to 0. I am also using an annotation-based circuit breaker. Is this behavior expected?
This is in production now, and I would like to roll back if this is not expected behavior.
@selly-selly Which version are you using?
@RobWin, thanks for the quick response. I'm using resilience4j-spring-boot2 v1.7.0 and Spring Boot v2.2.1.
It seems to happen when the slowFailedCalls count is lower than slowCalls or failedCalls:
"slowCalls": 6,
"slowFailedCalls": 3,
"failedCalls": 6,
Starting from the metrics above, as successful calls come in:
"slowCalls": 6 → 5 → 4 → 3 → 2 → 1 → 0 → 0
"slowFailedCalls": 3 → 2 → 1 → 0 → -1 → -2 → -3 → -4
"failedCalls": 6 → 5 → 4 → 3 → 2 → 1 → 0 → 0
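Since slow failed calls are by definition both slow and failed, the count should never be negative and never exceed slowCalls or failedCalls. The sketch below is one way an application might flag snapshots that break this rule, for example before logging or alerting on them. The class CounterSanityCheck and its method are hypothetical and not part of Resilience4j, and the sketch does not explain why the counters drift.

```java
// Hypothetical sanity check for the three counters shown in this thread.
final class CounterSanityCheck {

    // A snapshot is consistent when all counts are non-negative and the
    // slow-and-failed count does not exceed either of the counts it is a subset of.
    static boolean isConsistent(int slowCalls, int slowFailedCalls, int failedCalls) {
        return slowCalls >= 0
                && failedCalls >= 0
                && slowFailedCalls >= 0
                && slowFailedCalls <= slowCalls
                && slowFailedCalls <= failedCalls;
    }

    public static void main(String[] args) {
        // Starting point reported above: 6 slow, 3 slow failed, 6 failed.
        System.out.println(isConsistent(6, 3, 6));
        // End of the reported sequence: 0 slow, -4 slow failed, 0 failed.
        System.out.println(isConsistent(0, -4, 0));
    }
}
```

With the values from this thread, the first snapshot passes the check and the last one fails it.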
Hello~ Any updates about this?