mitodl / salt-ops

Repository for building, managing and deploying salt-based infrastructure
BSD 3-Clause "New" or "Revised" License

Create read replicas for databases used in BI #981

Closed blarghmatey closed 1 year ago

blarghmatey commented 5 years ago

To avoid load issues from BI and problems with locks that block migrations, we want to have read replicas configured (at least until we get some ETL set up).

blarghmatey commented 5 years ago

As a corollary to this, we might look at adding support in our orchestrate scripts for RDS deployment to specify when to create a read replica.
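One way to picture the opt-in idea is a per-database `replica` setting that the deployment code checks before requesting a replica. This is a minimal sketch, not the orchestrate scripts themselves: the config shape, the `replica_requests` helper, and the `-replica` naming suffix are all assumptions made for illustration. The dictionary keys match the parameters of boto3's `create_db_instance_read_replica`.

```python
def replica_requests(databases):
    """Yield read-replica request parameters for databases that opt in.

    `databases` is a hypothetical list of config dicts; only entries that
    carry a "replica" key produce a request.
    """
    for db in databases:
        replica = db.get("replica")
        if not replica:
            continue  # no replica requested for this database
        yield {
            # Assumed naming convention: source name plus "-replica".
            "DBInstanceIdentifier": f"{db['name']}-replica",
            "SourceDBInstanceIdentifier": db["name"],
            "DBInstanceClass": replica["instance_class"],
        }


if __name__ == "__main__":
    # Hypothetical database names, for illustration only.
    config = [
        {"name": "example-app-db"},
        {"name": "example-bi-db", "replica": {"instance_class": "db.t2.medium"}},
    ]
    for request in replica_requests(config):
        print(request)
```

Keeping the decision in data rather than in the state logic means a replica can be added or dropped per environment without touching the deployment code.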

blarghmatey commented 5 years ago

As part of this work, it would be useful to get a ballpark number on cost for the read replicas so that we can use that when planning next steps for analytical data storage.

markbreedlove commented 5 years ago

Starting to look at costs.

markbreedlove commented 5 years ago

Cost analysis

I am avoiding putting detailed information about RDS instances into this issue, which is in a public tracker. Please contact me if you want details.

There are six RDS instances that are probably candidates for read replicas. One RDS instance, for MicroMasters, already has a read replica, which may serve as an example for sizing. Its master is db.m4.large and its replica is db.t2.medium.

The candidate RDS instances range in size and probably want different-sized replicas. Hypothetically, the three that are m5.large or m4.large could have t2.medium replicas. Two are t2.small and could have t2.small replicas, and one is t2.micro and could have a t2.micro replica. This means three t2.medium replicas, two t2.small replicas, and one t2.micro replica.

The existing read replica for MicroMasters is paid for by an RI that I can't track down, because there are no RIs listed in our EC2 control panel. It's possible that some of the replicas above could be handled by RIs. However, the monthly on-demand costs would be as follows:

Total: $224 / month

(The fees above are reported by ec2instances.info.)
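The arithmetic behind a total like this can be sketched as below. The replica counts come from the sizing above; the hourly rates are placeholders and are not the figures behind the $224 total, so substitute the current on-demand rates from ec2instances.info. The 730 hours per month figure is also an assumption.

```python
# Replica mix proposed above: instance class -> number of replicas.
REPLICA_COUNTS = {"db.t2.medium": 3, "db.t2.small": 2, "db.t2.micro": 1}

HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12


def monthly_cost(counts, hourly_rates, hours=HOURS_PER_MONTH):
    """Return the total monthly on-demand cost in dollars."""
    return sum(n * hourly_rates[cls] * hours for cls, n in counts.items())


# Placeholder rates in $/hour, for illustration only (NOT real AWS prices).
PLACEHOLDER_RATES = {"db.t2.medium": 0.08, "db.t2.small": 0.04, "db.t2.micro": 0.02}

if __name__ == "__main__":
    total = monthly_cost(REPLICA_COUNTS, PLACEHOLDER_RATES)
    print(f"${total:.2f} / month")
```

Any replica covered by a reserved instance would drop out of `REPLICA_COUNTS` before the sum is taken.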

markbreedlove commented 5 years ago

By the way, there's a MySQL datasource in Redash named "TechTV". TechTV is a thing of the past -- do we want a replica of that database, as well?

blarghmatey commented 5 years ago

That is a copy of the database that had been backing TechTV, so no, there's no need for a read replica of that instance.

markbreedlove commented 4 years ago

I've merged https://github.com/mitodl/salt-ops/pull/1047

The next step is to create a commit for adding an RDS replica and a parameter group to an existing QA database, with the intention of reverting this commit after we've done our testing.

markbreedlove commented 4 years ago

I've made this commit for testing: https://github.com/mitodl/salt-ops/commit/713a0cf268b965e819933911269fa7fac6f585c8

markbreedlove commented 4 years ago

I've reverted that commit and will try a different one with an environment that has only one RDS instance.

markbreedlove commented 4 years ago

This is the new commit for testing, this time with MITx Pro QA: ca489ef

markbreedlove commented 4 years ago

I've made a number of commits recently to fix things up, and the one remaining issue that I'm aware of appears to be a bug in SaltStack, described in this ticket: https://github.com/saltstack/salt/issues/52159

I get this error trying to run boto_rds.replica_present, which is in our rds.sls state:

----------
          ID: create_mitxpro-qa_mitxproqa_rds_replica
    Function: boto_rds.replica_present
        Name: mitxpro-qa-rds-mariadb-mitxproqa-replica
      Result: False
     Comment: An exception occurred in this state: Traceback (most recent call last):
                File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1933, in call
                  **cdata['kwargs'])
                File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1951, in wrapper
                  return f(*args, **kwargs)
                File "/usr/lib/python2.7/dist-packages/salt/states/boto_rds.py", line 399, in replica_present
                  keyid, profile)
                File "/var/cache/salt/master/extmods/modules/boto_rds.py", line 337, in create_read_replica
                  if not backup_retention_period:
              NameError: global name 'backup_retention_period' is not defined
     Started: 15:36:43.486771
    Duration: 43.794 ms
     Changes:

That issue was filed in May, but nobody's been acting on getting its PR merged. The reviewer requested tests to be added, but the contributor has not followed up.

@blarghmatey has informed me that we're running our own fork of this module, so we will be able to fix it ourselves. We'll also have to investigate what changes we can sync from their upstream.
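The traceback above shows a function body reading a name that was never bound in its scope. A minimal sketch of that class of bug and its usual fix follows; both functions are hypothetical, simplified stand-ins and are not the actual Salt module code or the contents of the upstream PR.

```python
def create_read_replica_buggy(name, source_name):
    """Hypothetical stand-in: reads a name that is neither a parameter nor a global."""
    options = {"name": name, "source_name": source_name}
    if not backup_retention_period:  # NameError is raised here when called
        return options
    options["backup_retention_period"] = backup_retention_period
    return options


def create_read_replica_fixed(name, source_name, backup_retention_period=None):
    """Same logic, with the missing name declared as an optional parameter."""
    options = {"name": name, "source_name": source_name}
    if backup_retention_period is not None:
        options["backup_retention_period"] = backup_retention_period
    return options


if __name__ == "__main__":
    try:
        create_read_replica_buggy("replica", "source")
    except NameError as exc:
        print("buggy version:", exc)
    print("fixed version:", create_read_replica_fixed("replica", "source"))
```

Because Python resolves names at call time, a bug like this only surfaces when the function actually runs, which is consistent with it appearing the first time the replica state was exercised.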

markbreedlove commented 4 years ago

I've created https://github.com/mitodl/salt-extensions/pull/47 to fix the issue above.

markbreedlove commented 4 years ago

So far, we have an almost-ideal RDS state in orchestrate.aws.rds. It will create a new read replica and a new parameter group. It will update an existing parameter group with custom properties. The only thing that it fails to do (presumably an issue with the boto_rds.present state) is to apply a different parameter group to an existing RDS database.
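For the remaining gap, one possible workaround is to call the RDS API directly instead of going through the state. The sketch below uses boto3's `modify_db_instance`, which accepts `DBInstanceIdentifier`, `DBParameterGroupName`, and `ApplyImmediately`. It is not part of the Salt states discussed here; the helper names are hypothetical, and running `apply_parameter_group` assumes boto3 is installed and AWS credentials are configured.

```python
def parameter_group_change(instance_id, parameter_group, apply_immediately=True):
    """Build keyword arguments for boto3's RDS modify_db_instance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBParameterGroupName": parameter_group,
        "ApplyImmediately": apply_immediately,
    }


def apply_parameter_group(instance_id, parameter_group):
    """Attach a different parameter group to an existing RDS instance."""
    import boto3  # imported lazily so the helper above works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_instance(
        **parameter_group_change(instance_id, parameter_group)
    )
```

Switching the group may leave the instance waiting on a reboot before the new parameters take effect, so it is worth checking the instance's parameter group status after the call.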

shaidar commented 4 years ago

@blarghmatey Is this considered done?

blarghmatey commented 4 years ago

Not yet, because we have some naming inconsistencies due to legacy issues, and migrating the names to be in line with the current standards requires some planning. It also coincides with the need to update some of the Vault mounts.