Open deedeeh opened 1 year ago
Thanks for looking into this and opening this issue!
I'm not very familiar with the differences between Step scaling and Simple scaling, but this documentation from AWS implies that Step scaling is normally preferable:
In most cases, step scaling policies are a better choice than simple scaling policies, even if you have only a single scaling adjustment.
This documentation, plus the fact that aws-cdk seems to provide good support for Step scaling and very limited support for Simple scaling, makes me wonder if we would be better off migrating to Step scaling, rather than adding custom support for Simple scaling via this library? This would help us to standardise our approach with AWS's apparent direction of travel and would also reduce the future maintenance burden for DevX (because we could rely on the underlying AWS library, rather than implementing our own functions/constructs).
IIUC we could achieve the overall goal of the Slab's current CloudFormation using scaleOnCpuUtilization and scaleOnMetric (which seems to allow for the use of custom metrics like Lag), although I appreciate that this would require some tuning and would not be a drop-in replacement for the current resources defined via CloudFormation.
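For anyone less familiar with the distinction, here is a minimal sketch of the idea behind step scaling: the size of the capacity change depends on which interval the metric value falls into, whereas a simple scaling policy applies one fixed adjustment whenever its alarm fires. This is a simplified model, not the AWS or aws-cdk implementation. The `ScalingInterval` field names are modelled on the `scalingSteps` entries that scaleOnMetric accepts, while `pickAdjustment`, `lagSteps` and all the thresholds are hypothetical values chosen for illustration.

```typescript
// Simplified model of one scaling interval. The field names are modelled on
// aws-cdk's scalingSteps entries; the bound handling here is an assumption.
interface ScalingInterval {
  lower?: number; // inclusive lower bound of the metric value (open-ended if omitted)
  upper?: number; // exclusive upper bound of the metric value (open-ended if omitted)
  change: number; // capacity change to apply when the metric is in this interval
}

// Hypothetical helper: return the capacity change for the first interval that
// contains the metric value, or 0 when no interval matches.
function pickAdjustment(steps: ScalingInterval[], metricValue: number): number {
  for (const step of steps) {
    const lower = step.lower ?? -Infinity;
    const upper = step.upper ?? Infinity;
    if (metricValue >= lower && metricValue < upper) {
      return step.change;
    }
  }
  return 0;
}

// Hypothetical steps for a custom metric such as Lag (thresholds are made up).
const lagSteps: ScalingInterval[] = [
  { upper: 100, change: -1 },             // low lag: remove one instance
  { lower: 500, upper: 1000, change: 1 }, // moderate lag: add one instance
  { lower: 1000, change: 3 },             // high lag: add three instances
];

console.log(pickAdjustment(lagSteps, 50));   // -1
console.log(pickAdjustment(lagSteps, 300));  // 0 (between the scale-in and scale-out bands)
console.log(pickAdjustment(lagSteps, 750));  // 1
console.log(pickAdjustment(lagSteps, 2000)); // 3
```

With Simple scaling, the equivalent behaviour needs a separate policy and alarm per adjustment, which is roughly what the ScaleUpPolicy/ScaleDownPolicy pair in the existing CloudFormation does.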
Thank you for getting back to me @jacobwinch. I will talk to the Ophan team to get their thoughts on moving to step scaling and to see if there are any blockers that would prevent us from making that change. We tried scaleOnCpuUtilization when I paired with Akash on Thursday, but the snapshot didn't have all the data we needed, which is why we chose the long path. Do you have a timeframe for the migration to Step scaling work?
I will talk to the Ophan team and get their thoughts on moving to step scaling and see if there are any blockers that wouldn't allow us to do that change.
Great, thanks! If there is a particular reason that Simple scaling is preferred after this discussion then please let us know as it might make a stronger case for adding this feature to GuCDK!
We tried the scaleOnCpuUtilization when I paired with Akash on Thursday but the snapshot didn't have all the data we needed that is why we chose the long path.
I believe @waisingyiu implemented this approach for MAPI - I guess it's beyond the scope of this issue but it could be worth chatting with him about this approach if you decide to migrate.
Do you have a timeframe regarding the migration to Step scaling work?
No, I don't think there is any rush to migrate if you were already able to get something working using Simple scaling; I think it could be prioritised against your team's other health work.
No, I don't think there is any rush to migrate if you were already able to get something working using Simple scaling; I think it could be prioritised against your team's other health work.
We talked about it in standup and decided that we will first migrate to CDK with simple-scaling and what we have at the moment, and then see what changes are needed to move from simple-scaling to step-scaling in our autoscaling policies.
I believe @waisingyiu implemented this approach for MAPI - I guess it's beyond the scope of this issue but it could be worth chatting with him about this approach if you decide to migrate.
The Slab PR is in review now, so if we need to take that approach as part of migrating to CDK we will contact @waisingyiu.
Thank you @jacobwinch for your help
The Ophan team is halfway through the transition from CFN YAML to GuCDK for one of their components, The Slab. During this work we found that a lot of TypeScript code had to be written for the autoscaling rules (ScaleUpPolicy, ScaleDownPolicy) and for 3 CloudWatch alarms, each of which references one of the autoscaling rules as an action. I also want to mention that Lag is a custom metric. For more details check https://github.com/guardian/ophan/blob/c6321d02bc50723f0d46efb022e2305709fc826f/cloudformation/the-slab.cfn.yaml#L155-L220
It would have been easier if we were using Step scaling, but unfortunately we are using Simple scaling, which required lots of AWS CDK code.
I paired with @akash1810 to find the best way to accomplish that, and this is how we did it: https://github.com/guardian/ophan/pull/5326/commits/72d00fba5b2ba2f3dfa5ef72420a65683c1a8955
I asked whether such scenarios could be handled in GuCDK, so I am raising this issue to add visibility to that requirement and to see if it is something that the DevX team would consider.