hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform

Create jumpbox on EC2 to facilitate database restores from S3 #233

Closed MikeTheCanuck closed 5 years ago

MikeTheCanuck commented 5 years ago


Summary

Enable data managers on 2019 projects to load data into their project's staging RDS instance from the S3 hacko-data-archives bucket.

Tasks

Definition of Done

Each project's data manager has the SSH keys necessary to log in to the jumpbox and run whatever commands they'd like to use to load data into their RDS instance.

MikeTheCanuck commented 5 years ago

Proposed process for data managers:

DingoEatingFuzz commented 5 years ago

I've got some questions and concerns.

  1. Long-lived SSH keys. Should there be a rotation policy for these? How do we plan on distributing the keys?
  2. Jumpbox clean-up. The proposed process doesn't include deleting the db backup from the jumpbox.
  3. Repeatability. How is the jumpbox configured? How do we recreate the jumpbox? Will this be in CloudFormation?
  4. Performance. Does the db backup need to be copied from S3 to the jumpbox, or can the postgres restore command restore directly from S3?
  5. Principle of least authority. Is there any way to not require SSH access to an EC2 VM for this? I'd feel much more comfortable if this used IAM somehow. It's easier to revoke an individual's privileges with IAM than to generate and distribute new SSH keys (see the sketch below).
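
For 5., a rough sketch of what per-person IAM access could look like, assuming each data manager gets their own IAM user (the user name, policy name, and project prefix below are illustrative, not anything already set up):

# Hypothetical: grant one data manager read-only access to their
# project's prefix in the hacko-data-archives bucket.
aws iam put-user-policy \
  --user-name transportation-data-manager \
  --policy-name hacko-data-archives-read \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::hacko-data-archives",
        "arn:aws:s3:::hacko-data-archives/transportation/*"
      ]
    }]
  }'

# Revoking access later is one call, with no key redistribution:
aws iam delete-user-policy \
  --user-name transportation-data-manager \
  --policy-name hacko-data-archives-read
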
znmeb commented 5 years ago

On 4., can a Linux command line tool mount an S3 bucket onto its filesystem or does it need to copy the file?

Also, can this be done via a Lambda? Does a Lambda Python function have access to a reasonable Linux underbelly - psql and gzip are all we'd need.
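
On the copy question: aws s3 cp can write to stdout and pg_restore / psql can read from stdin, so the backup never has to land on the jumpbox's disk at all. A rough sketch (the bucket name is from this issue; the object key, RDS endpoint, user, and database are illustrative):

# Stream a custom-format dump straight from S3 into RDS; nothing is
# written to the jumpbox's filesystem.
aws s3 cp s3://hacko-data-archives/transportation/2019/passenger_census.dump - \
  | pg_restore --no-owner --no-privileges \
      --host myproject.abc123.us-west-2.rds.amazonaws.com \
      --username myproject_owner --dbname myproject

# A gzipped plain-SQL dump goes through gunzip and psql instead:
aws s3 cp s3://hacko-data-archives/transportation/2019/passenger_census.sql.gz - \
  | gunzip \
  | psql --host myproject.abc123.us-west-2.rds.amazonaws.com \
      --username myproject_owner --dbname myproject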

znmeb commented 5 years ago

Now that I think of it, we could just fire up a container to do the restore! All we need to do is figure out how to get the secrets (PostgreSQL and S3 credentials) into the container. There's no need for a jump box, right?
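
A minimal sketch of that, assuming the dump has already been pulled down next to the container and the PostgreSQL password is passed in as an environment variable (the endpoint, user, database, and file names are illustrative):

# One-shot restore from the official postgres:11 image; the container
# is thrown away as soon as pg_restore finishes.
docker run --rm \
  -e PGPASSWORD="$PGPASSWORD" \
  -v "$PWD/backups:/backups:ro" \
  postgres:11 \
  pg_restore --no-owner --no-privileges \
    --host myproject.abc123.us-west-2.rds.amazonaws.com \
    --username myproject_owner --dbname myproject \
    /backups/passenger_census.dump

Pulling straight from S3 inside the container would need an image with both the AWS CLI and the PostgreSQL 11 client installed, which is where the S3-credentials question comes in.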

znmeb commented 5 years ago

I'll take on the documentation / testing of the PostgreSQL backup creation and restore process.

MikeTheCanuck commented 5 years ago

How was the jumpbox created and configured?

Create

Launch an EC2 instance from the default Amazon Linux 2 AMI, using default settings except as follows:

Configure

znmeb commented 5 years ago

What version of PostgreSQL is on Amazon Linux now? It needs to be 11 to be compatible with RDS and the backup files. If it's lower than 11, it might be easier to install Docker and run the restores from a container running PostgreSQL 11 than to build another box with Debian or to install PostgreSQL 11 from PGDG.
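
If the stock package does turn out to be older than 11, the container fallback is cheap to try on Amazon Linux 2 - something like this (a sketch, not verified on the jumpbox):

# What would yum install by default?
yum info postgresql | grep -i version

# Containerized PostgreSQL 11 client as a fallback:
sudo amazon-linux-extras install -y docker
sudo systemctl start docker
sudo docker run --rm postgres:11 pg_restore --version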

znmeb commented 5 years ago

I just finished testing / upgrading the jump box. The original (default) version is PostgreSQL 10 - we need 11. Here's the script:

#! /bin/bash -v

# see https://stackoverflow.com/questions/55798856/deploy-postgres11-to-elastic-beanstalk-requires-etc-redhat-release
# for this - the third answer is what we use
rpm -Uvh --nodeps https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sed -i "s/rhel-\$releasever-\$basearch/rhel-7.6-x86_64/g" "/etc/yum.repos.d/pgdg-redhat-all.repo"
# verify that the repository is live
yum update
# which version is installed?
pg_restore --version
yum list installed | grep postgresql
# get rid of the old one
yum remove postgresql postgresql-libs
# install PostgreSQL 11
yum install postgresql11 postgresql11-libs
# list again
yum list installed | grep postgresql
# the binaries aren't on the search PATH! Fix that by adding
# a script that runs when you log in
echo 'PATH=$PATH:/usr/pgsql-11/bin/; export PATH' > /etc/profile.d/postgresql11.sh
# source it and test it
. /etc/profile.d/postgresql11.sh
which pg_restore 
pg_restore --version
echo "Final test - log out and in again and do 'pg_restore --version'"

Our jump box now goes to 11!