keycloak / keycloak-benchmark

Keycloak Benchmark
https://www.keycloak.org/keycloak-benchmark/
Apache License 2.0
132 stars 74 forks source link

Change worker pool instance type from m5.4xlarge to m5.2xlarge #1002

Closed kami619 closed 3 weeks ago

kami619 commented 1 month ago
### Tasks
- [x] update the workerpool type in all places both IaC and Docs
- [x] run the tests to make sure deployment goes through as expected
- [x] analyze the run results for any issues
- [x] Update Keycloak docs - https://github.com/keycloak/keycloak/blob/316e00bb98fd33bac0af5fdbaf7caac4481bee82/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc?plain=1#L130
kami619 commented 1 month ago

https://github.com/kami619/keycloak-benchmark/actions/runs/11217372907 - deployment tests are underway.

kami619 commented 1 month ago

Based on the run there are a few observations,

The pricing would still be 50% less for 4 x m5.2xl than the 3 x m5.4xl nodes, while satisfying the vCPU core count requirement for our deployment.

For 4 m5.2xlarge instances: 4 × $0.384/hour=$1.536/hour

For 3 m5.4xlarge instances: 3 x $0.768/hour=$2.304/hour
Screenshot 2024-10-07 at 16 14 17

@ahus1 let me know and I will make the necessary changes needed.

ahus1 commented 1 month ago

Hi Kamesh, thank you for the analysis. Doing the math again, I think it would save us 33% on the worker node costs.

I would have hoped the cluster would scale fast enough for the deployment to complete without changing the min-nodes. Maybe there is a timeout you can tweak?

Still, let's proceed with the change, and change the min number of nodes to 4 as you suggested, as it will speed up the overall deployment. You might want to change the max nodes from 10 to 15 in case someone does another deployment on that cluster to be on the safe side.

mhajas commented 1 month ago

@kami619 @ahus1 Should we also update this: https://github.com/keycloak/keycloak/blob/316e00bb98fd33bac0af5fdbaf7caac4481bee82/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc?plain=1#L130 as part of this issue

kami619 commented 1 month ago

Good point @mhajas, we could do that once we review the results from nightly, and let's propagate that change.