ahus1 closed this 4 months ago
@kami619 - could you please have a look? As a workaround, I've disabled the restart in the nightly run.
It might not have happened on every run, but it did on at least the one listed above.
cc: @andyuk1986
An alternative to the hard restart would be a rolling update - then the health check shouldn't trigger.
@ahus1 yeah, I have seen this once as well, but couldn't reproduce it afterwards. I thought that perhaps this happens when we run the benchmark several times and the data is already cached, which is why the metrics don't show xsite requests in the time range, but as I said, I couldn't reproduce this.
@ahus1 sure thing, let me see if I can add a task to re-arm the Route 53 health check after we do the restart. It would also mean we need to route the traffic to the right cluster by validating which one is the primary.
@ahus1
https://github.com/kami619/keycloak-benchmark/actions/runs/8973649052/job/24644308418#step:27:111 this failed again for the same reason, even though both sites are working and the Route 53 health checks are reporting healthy. It succeeded on a prior attempt, so I'm not sure if it's directly tied to the xsite health checks.
It failed with the same error message, but at a different step: this time it failed after the "client credential grants" test completed. On second thought, this is expected: during "client credential grants", no xsite messages are expected. I'm pushing a workaround for this: 9a817d71060203876b0a3338ff7826b4e40d6002
Describe the bug
As part of #776, we added a Keycloak restart to reset Keycloak's memory usage. When doing this, we forgot to re-arm the Route 53 health check afterwards, as the AWS Lambda might have disabled the primary site while the restart was in progress.
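A minimal sketch of what such a re-arming step could look like, in Python. It assumes the Lambda takes the primary site out of rotation by changing the health check's resource path, which this issue does not state; the function name, the health check ID, and the path values are all hypothetical. The `client` argument is expected to behave like a boto3 Route 53 client, of which only `get_health_check` and `update_health_check` are used.

```python
from typing import Any


def rearm_health_check(client: Any, health_check_id: str, expected_path: str) -> bool:
    """Restore the resource path of a Route 53 health check if it was changed.

    ASSUMPTION: the site was fenced by pointing the health check at a
    different path. `expected_path` is whatever path the check used before.

    Returns True if an update was sent, False if the check was already armed.
    """
    response = client.get_health_check(HealthCheckId=health_check_id)
    config = response["HealthCheck"]["HealthCheckConfig"]

    # Nothing to do if the health check already probes the expected path.
    if config.get("ResourcePath") == expected_path:
        return False

    # Point the health check back at the expected path so that the
    # primary site can be reported as healthy again.
    client.update_health_check(
        HealthCheckId=health_check_id,
        ResourcePath=expected_path,
    )
    return True
```

With boto3 installed, this might be called after the restart as `rearm_health_check(boto3.client("route53"), "<health-check-id>", "<expected-path>")`, where both string arguments are placeholders. Returning a boolean lets the caller log whether the restart actually tripped the health check.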
Version
main
Expected behavior
After the restart, the traffic should go to the primary site.
Actual behavior
After the restart, depending on how fast the restart was, the traffic might go to the secondary site. As a result, all metrics retrieved from the primary site would be irrelevant.
How to Reproduce?
This happens only intermittently. It was first noticed when https://github.com/keycloak/keycloak-benchmark/actions/runs/8969995290/job/24635509387 reported no xsite messages sent from the first site.
Anything else?
No response