Implemented a step scaling policy in https://github.com/guardian/dotcom-rendering/pull/10127
Moving to backlog after merging https://github.com/guardian/dotcom-rendering/pull/10127. There are further opportunities to improve the scaling logic, and these can be addressed in future health weeks.
Re-opening here: https://github.com/guardian/dotcom-rendering/pull/11715
Ticket created from https://github.com/guardian/dotcom-rendering/issues/8345#issuecomment-1647502598
This ticket is to:
DCR Scaling Policy
DCR is currently using Simple Scaling Policy[^1]. AWS have essentially deprecated this style of scaling and are promoting their alternative solutions as a best practice:
Do we want to consider adopting one of the other alternatives?
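To make the difference concrete, here is a minimal sketch of how a step scaling policy picks an adjustment from the size of the alarm breach. It is a plain TypeScript model, not CDK code, and every threshold, bound and change value in it is hypothetical rather than taken from DCR's configuration. The bound semantics (offsets from the alarm threshold, lower bound inclusive, upper bound exclusive) follow the AWS description of step adjustments.

```typescript
// Hypothetical model of step adjustments; not DCR's actual configuration.
interface StepAdjustment {
  lowerBound: number; // offset from the alarm threshold, inclusive
  upperBound?: number; // offset from the alarm threshold, exclusive; undefined means no upper limit
  change: number; // number of instances to add when this step matches
}

// Returns the capacity change for a metric value, or 0 if the alarm is not breached.
function chooseChange(
  metricValue: number,
  threshold: number,
  steps: StepAdjustment[],
): number {
  const breach = metricValue - threshold;
  for (const step of steps) {
    const upper = step.upperBound ?? Infinity;
    if (breach >= step.lowerBound && breach < upper) {
      return step.change;
    }
  }
  return 0;
}

// Illustrative latency steps in milliseconds (assumed values).
const latencySteps: StepAdjustment[] = [
  { lowerBound: 0, upperBound: 100, change: 1 },
  { lowerBound: 100, upperBound: 300, change: 2 },
  { lowerBound: 300, change: 4 },
];

const thresholdMs = 200;
const smallBreach = chooseChange(250, thresholdMs, latencySteps);
const largeBreach = chooseChange(600, thresholdMs, latencySteps);
```

Unlike simple scaling, which applies one fixed adjustment per alarm, this lets a large latency spike add more capacity than a marginal breach.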
Some examples in other Guardian repos:
- TargetTrackingScaling in AR
- StepScaling in Ophan

DCR Scaling Policy Metric
We're currently scaling based on latency:
https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L296-L301
and scaling up by doubling our capacity every 10 minutes:
https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L272-L277
whilst scaling down by removing an instance once every 2 minutes:
https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L278-L283
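A rough way to reason about these two rates is to count how many scaling actions are needed and multiply by the interval between them. The sketch below is a simplified model only: it assumes the alarm stays in breach throughout, that the first action fires immediately, and it ignores instance boot time and min/max capacity limits. The function names and the example instance counts are hypothetical.

```typescript
// Minutes until capacity reaches `needed` when doubling once per interval.
// Simplified model: first action at t = 0, then one action per interval.
function minutesToScaleUp(
  start: number,
  needed: number,
  intervalMinutes: number = 10,
): number {
  if (start <= 0) throw new Error("start must be positive");
  let capacity = start;
  let actions = 0;
  while (capacity < needed) {
    capacity *= 2;
    actions += 1;
  }
  return Math.max(0, actions - 1) * intervalMinutes;
}

// Minutes until capacity falls to `target` when removing one instance per interval.
function minutesToScaleDown(
  start: number,
  target: number,
  intervalMinutes: number = 2,
): number {
  const actions = Math.max(0, start - target);
  return Math.max(0, actions - 1) * intervalMinutes;
}

// Illustrative numbers (assumed): 3 instances, demand needing 20.
const upMinutes = minutesToScaleUp(3, 20); // 3 -> 6 -> 12 -> 24
const downMinutes = minutesToScaleDown(24, 3); // 21 single-instance removals
```

Under these assumptions, reaching the required capacity takes 20 minutes on the way up, and returning from 24 to 3 instances takes 40 minutes on the way down.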
Do we want to consider other scaling strategies? Apps-rendering, for example, via `guardian/cdk`, scales based on a target CPU utilisation:
https://github.com/guardian/dotcom-rendering/blob/e234547e28cb1071b83c526fdbae7df5c361f522/apps-rendering/cdk/lib/mobile-apps-rendering.ts#L104-L106
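For intuition about scaling on a target CPU utilisation, the sketch below uses a simplified proportional model: if average CPU is twice the target, roughly twice the instances are needed. AWS does not publish the exact target tracking algorithm, so this is an approximation that assumes CPU load spreads evenly across instances. The function name, the target of 40% and the capacity limits are all hypothetical and are not apps-rendering's actual settings.

```typescript
// Simplified proportional model of target tracking; an assumption, not AWS's algorithm.
function desiredCapacity(
  current: number,
  observedCpuPercent: number,
  targetCpuPercent: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  if (targetCpuPercent <= 0) throw new Error("target must be positive");
  // Round up so the fleet ends at or below the target utilisation.
  const raw = Math.ceil((current * observedCpuPercent) / targetCpuPercent);
  // Clamp to the group's configured limits.
  return Math.min(maxCapacity, Math.max(minCapacity, raw));
}

// Illustrative values (assumed): target 40% CPU, group limits 3 to 30.
const underLoad = desiredCapacity(6, 80, 40, 3, 30); // CPU at double the target
const overProvisioned = desiredCapacity(6, 20, 40, 3, 30); // CPU at half the target
```

The appeal over latency-based alarms is that a single target value drives both scale-up and scale-down, so there are no separate thresholds or step sizes to tune.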
[^1]: Simple scaling is the default, so if `PolicyType` is unspecified, `SimpleScaling` is used.