As a corollary to this, we might look at adding support in our orchestrate scripts for RDS deployment to specify when to create a read replica.
As part of this work, it would be useful to get a ballpark number on cost for the read replicas so that we can use that when planning next steps for analytical data storage.
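To make that concrete, one way the opt-in could look in the per-environment pillar data is sketched below. The key names here are hypothetical, invented for illustration, and not the repo's existing schema:

```yaml
# Hypothetical pillar layout -- key names are illustrative, not the current salt-ops schema.
rds_defaults:
  create_read_replica: False            # replicas stay opt-in by default

environments:
  mitxpro-qa:
    rds:
      instance_class: db.t2.small
      create_read_replica: True         # this environment asks for a replica
      replica_instance_class: db.t2.small
```

The orchestrate state would then only render a boto_rds.replica_present entry for environments where the flag is set.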
Starting to look at costs.
I am avoiding putting detailed information about RDS instances into this issue, which is in a public tracker. Please contact me if you want details.
There are six RDS instances that are probably candidates for read replicas. One RDS instance, for MicroMasters, already has a read replica, which may serve as an example for sizing. Its master is db.m4.large and its replica is db.t2.medium.
The candidate RDS instances range in size and probably want different-sized replicas. Hypothetically, the three that are m5.large or m4.large could have t2.medium replicas, the two that are t2.small could have t2.small replicas, and the one that is t2.micro could have a t2.micro replica. This means:
The existing read replica for MicroMasters is paid for by a reserved instance (RI) that I can't track down, because there are no RIs listed in our EC2 control panel. It's possible that some of the replicas above could be covered by RIs. However, the monthly on-demand costs would be as follows:
Total: $224 / month
(The prices above are as reported by ec2instances.info.)
By the way, there's a MySQL datasource in Redash named "TechTV". TechTV is a thing of the past -- do we want a replica of that database, as well?
That is a copy of the database that had been backing TechTV so no, there's no need for a read replica of that instance.
I've merged https://github.com/mitodl/salt-ops/pull/1047
The next step is to create a commit for adding an RDS replica and a parameter group to an existing QA database, with the intention of reverting this commit after we've done our testing.
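A minimal sketch of what that commit might contain, assuming the stock boto_rds state signatures (our fork in mitodl/salt-extensions may differ slightly); all identifiers, parameter-group families, and values below are placeholders rather than the actual commit:

```yaml
# Illustrative only -- names and values are placeholders, not the real salt-ops change.
create_qa_replica_parameter_group:
  boto_rds.parameter_present:
    - name: qa-replica-params
    - db_parameter_group_family: mariadb10.1
    - description: Custom parameters for the QA read replica
    - parameters:
        - max_connections: 300
    - region: us-east-1

create_qa_read_replica:
  boto_rds.replica_present:
    - name: qa-db-replica
    - source: qa-db                          # identifier of the existing QA master
    - db_instance_class: db.t2.small
    - db_parameter_group_name: qa-replica-params   # may need to be applied separately on older module versions
    - region: us-east-1
    - require:
        - boto_rds: create_qa_replica_parameter_group
```

Keeping it as a single, self-contained commit should make the later revert a clean git revert.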
I've made this commit for testing: https://github.com/mitodl/salt-ops/commit/713a0cf268b965e819933911269fa7fac6f585c8
I've reverted that commit and will try a different one with an environment that has only one RDS instance.
This is the new commit for testing, this time with MITx Pro QA: ca489ef
I've made a number of commits recently to fix things up, and the remaining issue that I'm aware of appears to be a bug in SaltStack, described in this ticket: https://github.com/saltstack/salt/issues/52159
I get this error when trying to run boto_rds.replica_present, which is in our rds.sls state:
----------
ID: create_mitxpro-qa_mitxproqa_rds_replica
Function: boto_rds.replica_present
Name: mitxpro-qa-rds-mariadb-mitxproqa-replica
Result: False
Comment: An exception occurred in this state: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1933, in call
**cdata['kwargs'])
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1951, in wrapper
return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/salt/states/boto_rds.py", line 399, in replica_present
keyid, profile)
File "/var/cache/salt/master/extmods/modules/boto_rds.py", line 337, in create_read_replica
if not backup_retention_period:
NameError: global name 'backup_retention_period' is not defined
Started: 15:36:43.486771
Duration: 43.794 ms
Changes:
That issue was filed in May, but nobody has been pushing to get its PR merged: the reviewer requested that tests be added, and the contributor has not followed up.
@blarghmatey has informed me that we're running our own fork of this module, so we will be able to fix it ourselves. We'll also have to investigate which changes we can sync from upstream.
I've created https://github.com/mitodl/salt-extensions/pull/47 to fix the issue above.
So far, we have an almost-ideal RDS state in orchestrate.aws.rds. It will create a new read replica and a new parameter group, and it will update an existing parameter group with custom properties. The only thing it fails to do (presumably an issue with the boto_rds.present state) is to apply a different parameter group to an existing RDS database.
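For reference, the intent is roughly the sketch below; the instance name is assumed from the error output above, the other values are placeholders, and whether boto_rds.present actually moves an existing instance onto a new db_parameter_group_name is exactly the part that currently fails:

```yaml
# Sketch of the desired behavior, not of what boto_rds.present currently achieves.
apply_custom_parameter_group:
  boto_rds.present:
    - name: mitxpro-qa-rds-mariadb-mitxproqa      # assumed master name, based on the replica name above
    - engine: mariadb
    - db_instance_class: db.t2.small              # placeholder
    - allocated_storage: 25                       # placeholder
    - master_username: exampleuser                # real credentials come from pillar/Vault
    - master_user_password: examplepassword
    - db_parameter_group_name: mitxproqa-custom   # the change that does not get applied
    - region: us-east-1
```

If boto_rds.present won't do this, the fallback would be to apply the parameter group outside the state (or patch the state in our salt-extensions fork) and then reboot the instance so static parameters take effect.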
@blarghmatey Is this considered done?
Not yet, because we have some inconsistencies in naming due to legacy issues, which require some planning to migrate the names in line with the current standards. It also coincides with the need to update some of the Vault mounts.
In order to avoid load issues from BI and problems with locks that block migrations, we want to have read replicas configured (at least until we get some ETL set up).