MikeTheCanuck closed this issue 5 years ago
Proposed process for data managers:

1. mv hackoregon-2019-db-restore-jumpbox.pem ~/.ssh
2. chmod 400 ~/.ssh/hackoregon-2019-db-restore-jumpbox.pem
3. ssh -i ~/.ssh/hackoregon-2019-db-restore-jumpbox.pem ec2-user@ec2-34-220-186-62.us-west-2.compute.amazonaws.com to get into the jumpbox
4. aws s3 cp s3://hacko-data-archive/(team folder)/(backups folder)/(backup file name) /backups (e.g. aws s3 cp s3://hacko-data-archive/2017-team-budget/database-backup/budget.sql.gz /backups)
5. rm -rf /backups/*.* when done
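The restore itself (between the copy and the cleanup) isn't spelled out above. A minimal sketch, assuming a gzipped plain-SQL dump; the host, user, and database names are placeholders for your project's staging RDS values, and the snippet only prints the command for review rather than running it:

```shell
# Sketch of the restore step; RDS_HOST, DB_USER, DB_NAME are placeholders.
BACKUP=/backups/budget.sql.gz
RDS_HOST=example-staging.us-west-2.rds.amazonaws.com
DB_USER=budget_user
DB_NAME=budget
# A gzipped plain-SQL dump streams straight into psql:
restore="gunzip -c $BACKUP | psql -h $RDS_HOST -U $DB_USER -d $DB_NAME"
echo "$restore"   # review, then run with: eval "$restore"
# A custom-format dump (.dump) would go through pg_restore -d instead.
```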
I've got some questions and concerns.
On 4., can a Linux command line tool mount an S3 bucket onto its filesystem or does it need to copy the file?
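On the mount question: yes, s3fs-fuse can present a bucket as a filesystem over FUSE, though for a one-off restore a plain aws s3 cp is simpler. A hedged sketch (s3fs must be installed separately; iam_role=auto assumes the instance profile grants S3 read access; the mountpoint is a placeholder). The snippet only prints the command for review:

```shell
# Sketch only: mount the archive bucket read-only with s3fs-fuse.
BUCKET=hacko-data-archive
MOUNTPOINT=/mnt/s3-archive
mount_cmd="s3fs $BUCKET $MOUNTPOINT -o iam_role=auto -o ro"
echo "sudo mkdir -p $MOUNTPOINT && sudo $mount_cmd"   # review before running
```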
Also, can this be done via a Lambda? Does a Lambda Python function have access to a reasonable Linux underbelly - psql and gzip are all we'd need.
Now that I think of this, we could just fire up a container to do the restore! All we need to do is figure out how to get the secrets (PostgreSQL and S3 credentials) into the container. There's no need for a jump box, right??
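The container idea could look roughly like this, assuming the official postgres:11 image (which ships pg_restore 11), the backup already copied to /backups on the Docker host, and the database password injected via PGPASSWORD; host, user, database, and file names are placeholders. Again the command is only printed for review:

```shell
# Sketch: restore from a throwaway container instead of a jumpbox.
# postgres:11 provides the client tools; secrets come in as env vars.
RDS_HOST=example-staging.us-west-2.rds.amazonaws.com
DB_USER=budget_user
DB_NAME=budget
run_cmd="docker run --rm -v /backups:/backups -e PGPASSWORD \
  postgres:11 pg_restore -h $RDS_HOST -U $DB_USER -d $DB_NAME \
  /backups/budget.dump"
echo "$run_cmd"   # review, then run once the secrets question is settled
```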
I'll take on the documentation / testing of the PostgreSQL backup creation and restore process.
How was the jumpbox created and configured?
Launch an EC2 instance from the default Amazon Linux 2 AMI, using default settings except as follows:
sudo amazon-linux-extras install postgresql
sudo mkfs -t ext4 /dev/sdb
sudo mkdir /backups
sudo mount -t ext4 /dev/sdb /backups
sudo chown -R ec2-user:ec2-user /backups
sudo chmod 700 /backups
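One gap in the steps above: the mount won't survive a reboot. A hedged sketch of the matching /etc/fstab line (device name /dev/sdb taken from the steps above; newer instance types may expose the volume as an NVMe device such as /dev/nvme1n1 instead):

```shell
# Sketch: persist the /backups mount across reboots via /etc/fstab.
# 'nofail' keeps the instance bootable if the volume is ever detached.
fstab_line='/dev/sdb  /backups  ext4  defaults,nofail  0  2'
echo "$fstab_line"   # append with: echo "$fstab_line" | sudo tee -a /etc/fstab
```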
What version of PostgreSQL is on Amazon Linux now? It needs to be 11 to be compatible with RDS and the backup files. If it's lower than 11, it might be easier to install Docker and run the restores from a container running PostgreSQL 11 than to build another box on Debian or install PostgreSQL 11 from PGDG.
I just finished testing / upgrading the jump box. Original version (default) is PostgreSQL 10 - we need 11. Here's the script:
#! /bin/bash -v
# see https://stackoverflow.com/questions/55798856/deploy-postgres11-to-elastic-beanstalk-requires-etc-redhat-release
# for this - the third answer is what we use
rpm -Uvh --nodeps https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sed -i "s/rhel-\$releasever-\$basearch/rhel-7.6-x86_64/g" "/etc/yum.repos.d/pgdg-redhat-all.repo"
# verify that the repository is live
yum update
# which version is installed?
pg_restore --version
yum list installed | grep postgresql
# get rid of the old one
yum remove postgresql postgresql-libs
# install PostgreSQL 11
yum install postgresql11 postgresql11-libs
# list again
yum list installed | grep postgresql
# the binaries aren't on the search PATH! Fix that by adding
# a script that runs when you log in
echo 'PATH=$PATH:/usr/pgsql-11/bin/; export PATH' > /etc/profile.d/postgresql11.sh
# source it and test it
. /etc/profile.d/postgresql11.sh
which pg_restore
pg_restore --version
echo "Final test - log out and in again and do 'pg_restore --version'"
Our jump box now goes to 11!
Summary
Enable data managers on 2019 projects to load data into their project's staging RDS instance from the S3 hacko-data-archive bucket.
Tasks
- Installing postgresql so that typical tools can be used, and assigning a reasonably-sized secondary data volume to which users can copy their database backups
- Copying (aws s3 cp) data from the hacko-data-archive bucket

Definition of Done
Each project's data manager has the SSH keys necessary to log in to the jumpbox and run whatever commands they'd like to use for loading data into their RDS instance.