ministryofjustice / cloud-platform

Documentation on the MoJ cloud platform
MIT License

Review Disaster Recovery Scenario: Losing the whole cluster #5999

Open sj-williams opened 3 months ago

sj-williams commented 3 months ago

Background

The documentation for this DR scenario is out of date: it relies on the obsolete create-cluster Ruby script, which has not worked since we introduced the core Terraform layer and has not been maintained since we switched to the CP CLI.

Approach

Run through the steps for simulating the total loss of a test cluster, then restore it using the CLI create cluster command.

Once testing is complete, update the runbook.

Finally, this would be a good time to remove the old Ruby cluster scripts from the infra repo.

Questions / Assumptions

Has any part of the DR process changed other than the move to the CLI create cluster?

Definition of done

Reference

How to write good user stories

FolarinOyenuga commented 1 month ago

Will be done with this in sprint 15.

FolarinOyenuga commented 2 weeks ago

Recreation Process

Test Cluster Creation

The errors encountered have been recorded in an issue here.

Successfully created a test cluster using the Concourse pipeline instead.

Recreating a Recovery Scenario

FolarinOyenuga commented 4 days ago

I had a blocker on this last week before taking down the initial test cluster, but I seem to have found a fix.