Create jumpbox on EC2 to facilitate database restores from S3

MikeTheCanuck commented 5 years ago

Instructions: Replace text below with details corresponding to your story

Summary

Enable data managers on 2019 projects to load data to their databases to their project's staging RDS instance from the S3 hacko-data-archives bucket.

Tasks

[x] Create and test the EC2 jumpbox manually, installing postgresql so that typical tools can be used, and assigning a reasonably-sized secondary data volume to which users can copy their database backups
[x] Generate and apply an associated IAM Policy and IAM Role to transparently allow users on the jumpbox to read (aws s3 cp) data from the hacko-data-archive bucket
[x] distribute a shared SSH key to the data managers for each 2019 project
[x] document the essential procedure for getting to the point of being able to load data to a project's RDS instance

Definition of Done

Each project's data manager has the SSH keys necessary to login to the jumpbox and run whatever commands they'd like to use for data loading to their RDS instance.

MikeTheCanuck commented 5 years ago

Proposed process for data managers:

Download the pem then run mv hackoregon-2019-db-restore-jumpbox.pem ~/.ssh
Run chmod 400 ~/.ssh/hackoregon-2019-db-restore-jumpbox.pem
Run ssh -i ~/.ssh/hackoregon-2019-db-restore-jumpbox.pem ec2-user@ec2-34-220-186-62.us-west-2.compute.amazonaws.com to get into the jumpbox
Copy your backup from the S3 location with a command of the form aws s3 cp s3://hacko-data-archive/(team folder)/(backups folder)/(backup file name) /backups
- e.g. s3://hacko-data-archive/2017-team-budget/database-backup/budget.sql.gz /backups
Run (your Postgres restore command to the RDS instance for your team)
Cleanup your backup! rm -rf /backups/*.*

DingoEatingFuzz commented 5 years ago

I've got some questions and concerns.

Long-lived ssh keys. Should there be a rotation policy for these? How do we plan on distributing the keys?
Jumpbox clean up. The proposed process doesn't include deleting the db backup from the jumpbox.
Repeatability. How is the jumpbox configured? How do we recreate the jumpbox? Will this be in Cloud Formation?
Performance. Does the db backup need to be copied from s3 to the jumpbox or can the postgres restore command restore directly from s3?
Principle of least authority. Is there anyway to not require ssh access to an ec2 vm for this? I'd feel much more comfortable if this used IAM somehow. It's easier to revoke an individual's privilege with IAM vs. generate and distributing new ssh keys.

znmeb commented 5 years ago

On 4., can a Linux command line tool mount an S3 bucket onto its filesystem or does it need to copy the file?

Also, can this be done via a Lambda? Does a Lambda Python function have access to a reasonable Linux underbelly - psql and gzip are all we'd need.

znmeb commented 5 years ago

Now that I think of this, we could just fire up a container to do the restore! All we need to do is figure out how to get the secrets (PostgreSQL and S3 credentials) into the container. There's no need for a jump box, right??

znmeb commented 5 years ago

I'll take on the documentation / testing of the PostgreSQL backup creation and restore process.

MikeTheCanuck commented 5 years ago

How was the jumpbox created, configured?

Create

Launch an EC2 instance from the default Amazon Linux 2 AMI, using default settings except as follows:

Instance Type: t2.nano
Network: vpc-b841bedf | public-database (default)
IAM role: S3-read-data-buckets-from-EC2-jumpbox
Termination protection: true
Add storage: EBS, /dev/sdb, Size = 50 GB
Select an existing security group: SSH access

Configure

Install PostgreSQL tools by running amazon-linux-extras install postgresql
Create a filesystem on the backups volume as sudo mkfs -t ext4 /dev/sdb
Make a folder for the backups volume with sudo mkdir /backups
Mount the secondary volume as /backups with sudo mount -t ext4 /dev/sdb /backups
Grant permissions for ec2-user to the /backups folder with sudo chown -R ec2-user:ec2-user /backups, then sudo chmod 700 /backups

znmeb commented 5 years ago

What version of PostgreSQL is on Amazon Linux now? It needs to be 11 to be compatible with RDS and the backup files. If it's lower than 11, it might be easier to install Docker hosting and run the restores from a container running PostgreSQL 11 than it is to build another box with Debian or install PostgreSQL 11 from PGDG.

znmeb commented 5 years ago

I just finished testing / upgrading the jump box. Original version (default) is PostgreSQL 10 - we need 11. Here's the script:

#! /bin/bash -v

# see https://stackoverflow.com/questions/55798856/deploy-postgres11-to-elastic-beanstalk-requires-etc-redhat-release
# for this - the third answer is what we use
rpm -Uvh --nodeps https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sed -i "s/rhel-\$releasever-\$basearch/rhel-7.6-x86_64/g" "/etc/yum.repos.d/pgdg-redhat-all.repo"
# verify that the repository is live
yum update
# which version is installed?
pg_restore --version
yum list installed | grep postgresql
# get rid of the old one
yum remove postgresql postgresql-libs
# install PostgreSQL 11
yum install postgresql11 postgresql11-libs
# list again
yum list installed | grep postgresql
# the binaries aren't on the search PATH! Fix that by adding
# a script that runs when you log in
echo 'PATH=$PATH:/usr/pgsql-11/bin/; export PATH' > /etc/profile.d/postgresql11.sh
# source it and test it
. /etc/profile.d/postgresql11.sh
which pg_restore 
pg_restore --version
echo "Final test - log out and in again and do 'pg_restore --version'"

Our jump box now goes to 11!

hackoregon / civic-devops

Create jumpbox on EC2 to facilitate database restores from S3 #233

Create

Configure