ministryofjustice / cloud-platform

Documentation on the MoJ cloud platform
MIT License

Upgrade RDS Module Storage Class to current generation #5873

Open timckt opened 2 months ago

timckt commented 2 months ago

Background

Currently our RDS module uses the previous generation of storage classes (gp2 or io1).

We need to update the module to use the current generation of storage classes (gp3 or io2) by default.

This would save money - https://aws.amazon.com/blogs/storage/migrate-your-amazon-ebs-volumes-from-gp2-to-gp3-and-save-up-to-20-on-costs/

Bug in current branch

There is a bug if we use the working branch directly to deploy a gp3 RDS instance.

We can use Concourse to deploy a gp3 RDS instance with the working branch. Example PR here

However, once that has completed and the namespace goes through the apply pipeline, we get the error below.

FATA[5477] error running terraform on namespace hmpps-prison-person-api-prod: unable to apply Terraform: exit status 1

Error: updating RDS DB Instance (cloud-platform-e6f0c67759ffad65): operation error RDS: ModifyDBInstance, https response error StatusCode: 400, RequestID: 00426d08-34f6-4579-82bc-0badb2da3f6d, api error InvalidParameterCombination: You can't specify IOPS or storage throughput for engine postgres and a storage size less than 400.

https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/environments-live/jobs/apply-live-c/builds/3014#L66bcb001:25407

When we run terraform plan manually in that namespace, it shows the update below:

  # module.rds.aws_db_instance.rds will be updated in-place
  ~ resource "aws_db_instance" "rds" {
        id                                    = "cloud-platform-e6f0c67759ffad65"
      ~ iops                                  = 3000 -> 0
        name                                  = "dbe6f0c67759ffad65"
        tags                                  = {
            "application"            = "hmpps-prison-person-api"
            "business-unit"          = "HMPPS"
            "environment-name"       = "prod"
            "infrastructure-support" = "dps-hmpps@digital.justice.gov.uk"
            "is-production"          = "true"
            "namespace"              = "hmpps-prison-person-api-prod"
            "owner"                  = "connect-dps"
        }
        # (54 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

The in-place change to iops (3000 -> 0) is what causes the fatal error.

According to the RDS guide, we cannot specify IOPS or storage throughput for the postgres engine when the storage size is less than 400 GiB.

However, in the current version of our RDS module, the variable db_iops defaults to 0, and if we set it to a non-zero value the module deploys an io2 RDS instance instead of a gp3 one.

  storage_type                 = var.db_iops == 0 ? "gp3" : "io2"
  iops                         = var.db_iops

Fix this logic so
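
One possible shape for the fix is sketched below. It is not the module's actual change. It keeps the existing db_iops convention (0 selects gp3, any other value selects io2) and assumes that passing null for iops makes Terraform omit the argument, so the plan no longer tries to move the AWS-assigned gp3 baseline from 3000 to 0. That assumption about provider behaviour has not been tested here, and every other argument of the resource is left out of the fragment.

```hcl
# Sketch only: the two arguments from the module snippet above, with iops
# left unset when db_iops is 0. Not tested against the module or provider.
resource "aws_db_instance" "rds" {
  # All other arguments are unchanged and omitted from this fragment.

  # Same selection rule as today: 0 means gp3, anything else means io2.
  storage_type = var.db_iops == 0 ? "gp3" : "io2"

  # Assumption: null omits the argument, so gp3 instances keep the baseline
  # IOPS that AWS assigns and no ModifyDBInstance call sets IOPS.
  iops = var.db_iops == 0 ? null : var.db_iops
}
```

If this holds, a terraform plan in an affected namespace should no longer show the iops change. This sketch does not cover gp3 instances of 400 GiB or more that need custom IOPS. Supporting those would need a way to choose the storage type independently of db_iops.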

Proposed user journey

Approach

Which part of the user docs does this impact

Communicate changes

Questions / Assumptions

Definition of done

Reference

How to write good user stories

timckt commented 2 months ago

We have a user who would like to upgrade their RDS to gp3. I created this branch for them and it is working in their namespace.

We may continue to use this branch for further testing/development.

tom-j-smith commented 1 month ago

The cost per GB is the same for gp2 or gp3 in RDS (unlike in EC2 where gp3 is cheaper). However, gp3 provides more baseline IOPS (for volumes up to 4,000 GB) without the limitation of bursting...In almost all cases, your default choice should be gp3.

This is a simple change to the module. There is no need to force an update to existing RDS instances; it would only affect new RDS releases.