guardian / dotcom-rendering

The Guardian web rendering service (aka DCR/DCAR)
https://www.theguardian.com
Apache License 2.0

Review DCR Scaling Policy #9311

Closed ioannakok closed 2 months ago

ioannakok commented 11 months ago

Ticket created from https://github.com/guardian/dotcom-rendering/issues/8345#issuecomment-1647502598

This ticket is to:

  1. Experiment with alternative solutions for the DCR scaling policy.
  2. Review the metric on which we want to base our scaling.

DCR Scaling Policy

DCR currently uses a simple scaling policy[^1]. AWS has effectively deprecated this style of scaling and recommends its alternatives, target tracking and step scaling, as best practice:

[image]

Do we want to consider adopting one of the other alternatives?

Some examples in other Guardian repos:

DCR Scaling Policy Metric

We're currently scaling based on latency:

https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L296-L301

and scaling up by doubling our capacity every 10 minutes:

https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L272-L277

whilst scaling down by removing an instance once every 2 minutes:

https://github.com/guardian/dotcom-rendering/blob/83927738d0a45719cd156b7fb756feaa2267155d/dotcom-rendering/cdk/lib/dotcom-rendering.ts#L278-L283
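To get a feel for how asymmetric these two rules are, here is a minimal sketch that models them as plain functions. It is not the CDK code linked above: the function names, the starting capacity, the spike size and the maximum are all made-up assumptions, and it ignores instance boot time and assumes the alarm stays in breach throughout.

```typescript
// Hypothetical model of the rules described above; not the actual CDK code.
// Scale-up: double capacity, at most once per 10-minute cooldown.
// Scale-down: remove one instance, at most once per 2-minute cooldown.

interface ScalingEvent {
  minute: number; // minutes since the first scaling action
  capacity: number; // instance count after the action
}

function scaleUpTimeline(
  start: number,
  needed: number,
  max: number,
  cooldownMinutes: number = 10,
): ScalingEvent[] {
  if (start < 1) throw new Error("start must be at least 1");
  const events: ScalingEvent[] = [];
  let capacity = start;
  for (let minute = 0; capacity < needed && capacity < max; minute += cooldownMinutes) {
    capacity = Math.min(capacity * 2, max); // double, but never exceed the cap
    events.push({ minute, capacity });
  }
  return events;
}

function scaleDownTimeline(
  start: number,
  floor: number,
  cooldownMinutes: number = 2,
): ScalingEvent[] {
  const events: ScalingEvent[] = [];
  let capacity = start;
  for (let minute = 0; capacity > floor; minute += cooldownMinutes) {
    capacity -= 1; // remove a single instance per action
    events.push({ minute, capacity });
  }
  return events;
}

// Made-up numbers: 3 instances, a spike needing 20, capped at 30.
const up = scaleUpTimeline(3, 20, 30);
const down = scaleDownTimeline(24, 3);
console.log(up);
console.log(`scale-down actions: ${down.length}, last at minute ${down[down.length - 1].minute}`);
```

With these assumed numbers, three doublings (3 to 6 to 12 to 24) take until minute 20 to cover the spike, while returning from 24 to 3 instances takes 21 separate actions spread over 40 minutes.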

Do we want to consider other scaling strategies? For example, apps-rendering (via guardian/cdk) scales on a target CPU utilisation:

https://github.com/guardian/dotcom-rendering/blob/e234547e28cb1071b83c526fdbae7df5c361f522/apps-rendering/cdk/lib/mobile-apps-rendering.ts#L104-L106
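For comparison, the idea behind scaling on a target value can be sketched as a proportional calculation: adjust capacity so that the metric moves back towards the target. This is only an approximation for discussion. AWS does not publish the exact algorithm, and the real service applies its own, more conservative logic (particularly when scaling in). The function name, the 40% target and the capacity bounds below are assumptions, not values taken from apps-rendering.

```typescript
// Hypothetical proportional model of target tracking; not AWS's actual algorithm.
function desiredCapacity(
  current: number, // current instance count
  metric: number, // observed metric, e.g. average CPU utilisation in percent
  target: number, // target value for that metric
  min: number, // assumed minimum capacity
  max: number, // assumed maximum capacity
): number {
  if (target <= 0) throw new Error("target must be positive");
  // Scale capacity by the ratio of observed to target, rounding up.
  const proportional = Math.ceil(current * (metric / target));
  // Clamp to the group's bounds.
  return Math.max(min, Math.min(max, proportional));
}

console.log(desiredCapacity(6, 80, 40, 3, 30)); // CPU at twice the target
console.log(desiredCapacity(6, 20, 40, 3, 30)); // CPU at half the target
console.log(desiredCapacity(6, 90, 40, 3, 12)); // request exceeds the assumed maximum
```

Unlike a fixed "double every 10 minutes" rule, the size of the adjustment here depends on how far the metric is from the target, so a small overshoot leads to a small change.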

[^1]: Simple scaling is the default, so if PolicyType is unspecified, SimpleScaling is used.

### Tasks
- [ ] https://github.com/guardian/dotcom-rendering/issues/10273

ioannakok commented 11 months ago

Progress so far: https://docs.google.com/document/d/1FjDANYnddGrB3JnXdMjoHnie2kB7fL3MgJachCVZHVg/edit

cemms1 commented 8 months ago

Implemented a step scaling policy in https://github.com/guardian/dotcom-rendering/pull/10127
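For readers unfamiliar with step scaling, the sketch below shows the general idea: the size of the adjustment depends on how far the metric has moved, rather than being a single fixed action. The interval shape is loosely modelled on the `lower`/`upper`/`change` fields of the CDK `ScalingInterval` type, but the matching logic is simplified and the latency thresholds and adjustments are invented for illustration. They are not the values from the PR above.

```typescript
// Hypothetical step-scaling lookup; thresholds are illustrative only.
interface ScalingInterval {
  lower?: number; // inclusive lower bound of the metric (unbounded if omitted)
  upper?: number; // exclusive upper bound of the metric (unbounded if omitted)
  change: number; // instances to add (positive) or remove (negative)
}

function stepAdjustment(metric: number, intervals: ScalingInterval[]): number {
  for (const { lower = -Infinity, upper = Infinity, change } of intervals) {
    if (metric >= lower && metric < upper) return change;
  }
  return 0; // metric falls in a gap between intervals: no action
}

// Assumed latency bands, in seconds.
const latencySteps: ScalingInterval[] = [
  { upper: 0.2, change: -1 }, // comfortably fast: remove one instance
  { lower: 0.3, upper: 0.5, change: 2 }, // slightly slow: add a couple
  { lower: 0.5, change: 5 }, // very slow: add more aggressively
];

for (const latency of [0.1, 0.25, 0.4, 0.8]) {
  console.log(`${latency}s -> ${stepAdjustment(latency, latencySteps)}`);
}
```

The gap between 0.2 and 0.3 in this example acts as a dead band in which no scaling happens, which helps avoid flapping between scale-up and scale-down.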

cemms1 commented 8 months ago

Moving to backlog after merging https://github.com/guardian/dotcom-rendering/pull/10127. There are further opportunities to improve the scaling logic, which can be addressed in future health weeks.

arelra commented 3 months ago

Re-opening here: https://github.com/guardian/dotcom-rendering/pull/11715

arelra commented 2 months ago

Completed by https://github.com/guardian/dotcom-rendering/pull/11837 and https://github.com/guardian/dotcom-rendering/pull/11928