hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform
MIT License

Attach a data volume to store postgres database dumps #174

Closed MikeTheCanuck closed 5 years ago

MikeTheCanuck commented 6 years ago

We've been running extremely lean on the storage volumes attached to the Postgres database server in EC2, leaving only a few GB of headroom for incremental data growth. That leaves no room on either the root or data volume to stage a large (multi-GB) database dump while we drop and restore a later backup of the database (submitted to us by developers).
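
For reference, the remaining headroom on each volume can be confirmed from an ssh shell (the specific numbers aren't captured in this issue):

df -h    # reports size, used, and available space for every mounted filesystem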

I tried a "streaming restore" technique, piping the database dump from S3 straight into psql on stdin, but that failed as well. This is the command that came closest to succeeding:

aws s3 cp s3://hacko-data-archive/2018-housing-affordability/database-backup/housing-affordability.sql.gz - | gzip -dc | sudo -u postgres psql

But the output included so many errors that I have little confidence we end up with a fully intact database replica:

$ aws s3 cp s3://hacko-data-archive/2018-housing-affordability/database-backup/housing-affordability.sql.gz - | gzip -dc | sudo -u postgres psql
SET
SET
SET
SET
SET
SET
SET
SET
ERROR:  database "housing-affordability" is being accessed by other users
DETAIL:  There are 8 other sessions using the database.
ERROR:  database "housing-affordability" already exists
ALTER DATABASE
You are now connected to database "housing-affordability" as user "postgres".
SET
SET
SET
SET
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
CREATE EXTENSION
COMMENT
SET
SET
SET
ERROR:  relation "api_hudhicdata" already exists
ALTER TABLE
ERROR:  relation "api_hudhicdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_hudpitdata" already exists
ALTER TABLE
ERROR:  relation "api_hudpitdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_jchsdata" already exists
ALTER TABLE
ERROR:  relation "api_jchsdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_permitdata" already exists
ALTER TABLE
ERROR:  relation "api_permitdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_policy" already exists
ALTER TABLE
ERROR:  relation "api_program" already exists
ALTER TABLE
ERROR:  relation "api_program_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_taxlotdata" already exists
ALTER TABLE
ERROR:  relation "api_taxlotdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "api_urbaninstituterentalcrisisdata" already exists
ALTER TABLE
ERROR:  relation "api_urbaninstituterentalcrisisdata_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_group" already exists
ALTER TABLE
ERROR:  relation "auth_group_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_group_permissions" already exists
ALTER TABLE
ERROR:  relation "auth_group_permissions_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_permission" already exists
ALTER TABLE
ERROR:  relation "auth_permission_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_user" already exists
ALTER TABLE
ERROR:  relation "auth_user_groups" already exists
ALTER TABLE
ERROR:  relation "auth_user_groups_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_user_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "auth_user_user_permissions" already exists
ALTER TABLE
ERROR:  relation "auth_user_user_permissions_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "django_content_type" already exists
ALTER TABLE
ERROR:  relation "django_content_type_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "django_migrations" already exists
ALTER TABLE
ERROR:  relation "django_migrations_id_seq" already exists
ALTER TABLE
ALTER SEQUENCE
ERROR:  relation "django_session" already exists
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
COPY 288363
 setval 
--------
 288363
(1 row)

COPY 117898
 setval 
--------
 117898
(1 row)

COPY 50858
 setval 
--------
  50858
(1 row)

COPY 27794
 setval 
--------
  27794
(1 row)

ERROR:  duplicate key value violates unique constraint "api_policy_pkey"
DETAIL:  Key (policy_id)=(ADU) already exists.
CONTEXT:  COPY api_policy, line 1
COPY 208
 setval 
--------
    623
(1 row)

ERROR:  column "percent_change" of relation "api_taxlotdata" does not exist
invalid command \N
invalid command \N
invalid command \N
invalid command \N
...(hundreds of thousands of times)...
invalid command \N
ERROR:  syntax error at or near "1"
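
(For reference: the "is being accessed by other users" and "already exists" errors suggest the dump's initial DROP DATABASE could not complete while other sessions were still connected, so every subsequent CREATE collided with the objects already in place. A rough sketch of how the streaming attempt might be retried after clearing those sessions, reusing the names from the command above; this was not attempted here:

sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'housing-affordability' AND pid <> pg_backend_pid();"
sudo -u postgres dropdb housing-affordability
aws s3 cp s3://hacko-data-archive/2018-housing-affordability/database-backup/housing-affordability.sql.gz - | gzip -dc | sudo -u postgres psql

The rest of this issue takes the attach-a-volume route instead.)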

Give up and add a volume

So I've finally given in and attached a new 20 GB volume to the server by hand, per these instructions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html

I placed the volume in us-west-2c, the same AZ as the database server.

Then I mounted it from an SSH shell per these instructions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
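
The mount sequence from those instructions boils down to a few commands; the device name /dev/xvdf and the ext4 filesystem below are assumptions about this particular volume:

lsblk                          # confirm the device name the new volume was attached as
sudo mkfs -t ext4 /dev/xvdf    # format the volume (first attach only - this erases it)
sudo mkdir /upload
sudo mount /dev/xvdf /upload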

MikeTheCanuck commented 6 years ago

Then I made the volume world-writable so that I can wget database dump files directly to the /upload directory:

sudo chmod 777 /upload
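
With the volume mounted at /upload, a dump can now be staged on disk and restored from a file rather than streamed. A rough sketch, reusing the S3 path from the earlier command:

cd /upload
aws s3 cp s3://hacko-data-archive/2018-housing-affordability/database-backup/housing-affordability.sql.gz .
gzip -d housing-affordability.sql.gz
sudo -u postgres psql -f /upload/housing-affordability.sql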