Maybe the only workable solution is to take a .backup to my laptop via PGAdmin from each database, then restore to a new PostgreSQL server, then generate the plain-text compressed backup from there?
My recollection is that these apps were deployed with some combination of Vagrant and Ansible and custom shell scripts. All of that should be somewhere in GitHub and could in theory be duplicated with fresh credentials.
I think there's a repo with code for building local services for Transportation; I was running it for a couple of weeks on my laptop.
So I believe I have a backup of the Emergency Response database from pre-demo day last year, which should be around 9.6.6, on my old Mac. Haven't actually updated a lot on this, though not sure if it is plain text.
It was run through the pg_dump command to export a DB which I could use within a Docker container running 9.6.6. There was a change to the schema within the last year, but I do believe that was a SQL command. Provided we have a system to run SQL updates on our database, this should be an option? Thinking maybe we check SQL scripts into GitHub, move them to S3, and run them in similar fashion to the initial database loads?
Can check when home.
(Will willfully ignore the PostGIS dependencies for the moment in both the DB and the API - https://github.com/hackoregon/emergency-response-backend/blob/master/data/models.py)
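A minimal sketch of the SQL-scripts-via-S3 idea above, assuming the scripts are synced to a bucket first; the bucket, path, host and database names here are placeholders, not real values:

aws s3 cp s3://hacko-data-archive/emergency-response/2018-schema-update.sql .
psql -h <db-host> -U eradmin -d fire -f 2018-schema-update.sql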
@bhgrant8 As long as it's a lower release number than 9.6.8 I can probably ingest it with the PostGIS container and reformat it so the AWS instance can deal with it. The PostGIS issue could be solved by a "DROP EXTENSION postgis CASCADE;" which would remove all geometry columns and indexes.
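Roughly, that reformatting pass could look like the following; a sketch only, with the Docker image tag, credentials and database name as assumptions:

# throwaway 9.6 PostGIS instance to ingest the old dump
docker run -d --name pg96-postgis -e POSTGRES_PASSWORD=postgres -p 5432:5432 mdillon/postgis:9.6
createdb -h localhost -U postgres fire
pg_restore --no-owner -h localhost -U postgres -d fire fire.backup
# strip PostGIS, which drops all geometry columns and indexes with it
psql -h localhost -U postgres -d fire -c "DROP EXTENSION postgis CASCADE;"
# re-dump as compressed plain text for the AWS instance
pg_dump -h localhost -U postgres -Fp fire | gzip -c > fire.sql.gz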
I have the geocoder database running; I hit whatever the Census Bureau sets for a blacklist yesterday but I can run it in a coffee shop and hand you the backup file. The last I heard Disaster Resilience wants to do geocoding but I don't know how they want to deploy it.
I can probably re-create last year's Transportation database as well. I know where the code is, and I had a local development environment running in the repo. Most of it we pulled at startup time from Portland Maps APIs.
Well, it may be from a few months pre-demo, but:
-- Dumped from database version 9.5.5
-- Dumped by pg_dump version 9.6.1
-- Started on 2017-03-15 09:29:43 PDT
I have 2 dump files; one gives me gobbledygook in the Mac's TextEdit, the other is readable. Thinking the readable one is a plain-text dump? It's about 525 MB.
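A quick way to tell the two apart without opening them in an editor (just a sanity check, not part of the workflow above):

head -c 5 <dump-file>    # a custom-format archive starts with the magic bytes "PGDMP"
file <dump-file>         # a plain-text dump reports as ordinary ASCII/UTF-8 text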
Try pg_restore --list <file> on the one that the text editor doesn't show. If it's in pg_dump format you should get a list of what's on it.
OK, based on pg_restore --list <file>, got one from April:
; Archive created at 2017-04-01 16:26:09 PDT
;     dbname: fire
;     TOC Entries: 278
;     Compression: -1
;     Dump Version: 1.12-0
;     Format: CUSTOM
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped from database version: 9.5.5
;     Dumped by pg_dump version: 9.6.1
Then it should be restorable with pg_restore. Try pg_restore --no-owner -f <file>. Or just upload it to Google Drive and send me a link. ;-)
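Worth noting on the flags: with pg_restore, -f names an output file rather than the input archive, so the same archive can also be converted straight to a plain-text script without touching a server, e.g. (file names are placeholders):

pg_restore --no-owner -f fire.sql fire-2017-04.backup
gzip fire.sql

Restoring into an actual database instead uses -d <dbname> with the archive as the final argument.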
At this point I have been unable to get a copy of the SSH keys for the legacy PostgreSQL machines, so my next-best option for a database dump is to connect to each database service as a user with privilege to dump. The postgres user works great for every database service - and I've been able to grab dumps for four of the five database servers - except, of course, for emergency-response, where someone has locked down pg_hba.conf for the postgres user, so that when authenticating via PGAdmin4 to the "ER" instance, I get this error:
Unable to connect to server:
FATAL: no pg_hba.conf entry for host "67.5.167.3", user "postgres", database "postgres", SSL on
FATAL: no pg_hba.conf entry for host "67.5.167.3", user "postgres", database "postgres", SSL off
I can authenticate as the eruser and eradmin users, but unfortunately neither of those accounts possesses the "Can initiate streaming replication and backups" privilege.
Theoretically we could take copies of each table one by one - if only these databases (disaster, fire and police) weren't so well-normalized, this would be trivial (10, 23 and 48 tables respectively).
@MikeTheCanuck Can you log in with psql and do \copy to a CSV file on your laptop? If you can do that, you can list all the tables and \copy them.
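Something along these lines should work even with the lower-privileged accounts, since \copy runs COPY client-side and writes the file locally (host, user and table names below are placeholders):

psql -h <legacy-ER-host> -U eruser -d fire -c "\dt"
psql -h <legacy-ER-host> -U eruser -d fire -c "\copy fire_incidents TO 'fire_incidents.csv' WITH (FORMAT csv, HEADER)"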
The Emergency Response database server is now accessible. Meghan was able to dig up a copy of the hackoregon-postgres-pem.pem that's used to SSH into the EC2 machine hosting the database, and notes she dug up hinted that the configured login user for that machine is ubuntu, not ec2-user. With that, I'm in, and could reconfigure pg_hba.conf to give the postgres DB user the ability to log in remotely (e.g. using PGAdmin).
So now I'm ready to be able to take database dumps of the three database instances on that server.
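For the record, that change amounts to a pg_hba.conf entry roughly like the one below plus a config reload; the CIDR here is an assumption and should really be narrowed to the client IP:

# added to pg_hba.conf on the legacy host:
# host    all    postgres    0.0.0.0/0    md5
sudo -u postgres psql -c "SELECT pg_reload_conf();"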
I can probably re-create the database from last year's transportation application. I have some scripts that build a local development environment in the GitHub repo; they import the data with some Python code. It may require PostGIS though.
I tried and failed to restore the database backups that I'd previously acquired, so after some experiments I determined that it was best for me to do this the same way Ed has instructed our 2018 developers to back up & restore: https://github.com/hackoregon/civic-devops/blob/master/docs/HOWTO-create-backup-for-new-database-creation.md
So here's what I did for e.g. the Budget database:
Create a budget_user (that owns the legacy budget database) on the 2018 DB server: sudo -u postgres createuser --encrypted --pwprompt --no-createdb --no-createrole --no-superuser --no-replication budget_user
ssh -i ~/.ssh/hackoregon-postgres-pem.pem ubuntu@(legacy-EC2-host-IP)
pg_dump -Fp -v -C -c --if-exists -d budget | gzip -c > budget.sql.gz
exit
scp -i ~/.ssh/hackoregon-postgres-pem.pem ubuntu@(legacy-EC2-host-IP):~/budget.sql.gz .
scp -i ~/.ssh/hackoregon-2018-database-dev-env.pem budget.sql.gz ec2-user@(current-EC2-host-IP):~
ssh -i ~/.ssh/hackoregon-2018-database-dev-env.pem ec2-user@(current-EC2-host-IP)
gzip -dc budget.sql.gz | sudo -u postgres psql
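After the restore, a quick sanity check on the new server is worthwhile, e.g. (database name follows the Budget example above):

sudo -u postgres psql -d budget -c "\dt"
sudo -u postgres psql -d budget -c "SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;"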
Then swap the new DB host IP into each 2017 project's project_config.py, and Restart Build for each project's last build in Travis (to pick up the updated project_config.py file).
However, moving the budget database to the new PostgreSQL server ran into a snag, because the team-budget project's tests depend on the presence of a test_budget database, and the Travis job fails without it:
Using existing test database for alias 'default'...
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 171, in connect
self.connection = self.get_new_connection(conn_params)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/usr/local/lib/python3.5/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: database "test_budget" does not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/commands/test.py", line 29, in run_from_argv
super(Command, self).run_from_argv(argv)
File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python3.5/site-packages/django/core/management/commands/test.py", line 72, in handle
failures = test_runner.run_tests(test_labels)
File "/usr/local/lib/python3.5/site-packages/django/test/runner.py", line 549, in run_tests
old_config = self.setup_databases()
File "/usr/local/lib/python3.5/site-packages/django/test/runner.py", line 499, in setup_databases
self.parallel, **kwargs
File "/usr/local/lib/python3.5/site-packages/django/test/runner.py", line 743, in setup_databases
serialize=connection.settings_dict.get("TEST", {}).get("SERIALIZE", True),
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/creation.py", line 70, in create_test_db
run_syncdb=True,
File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 130, in call_command
return command.execute(*args, **defaults)
File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python3.5/site-packages/django/core/management/commands/migrate.py", line 83, in handle
executor = MigrationExecutor(connection, self.migration_progress_callback)
File "/usr/local/lib/python3.5/site-packages/django/db/migrations/executor.py", line 20, in __init__
self.loader = MigrationLoader(self.connection)
File "/usr/local/lib/python3.5/site-packages/django/db/migrations/loader.py", line 52, in __init__
self.build_graph()
File "/usr/local/lib/python3.5/site-packages/django/db/migrations/loader.py", line 203, in build_graph
self.applied_migrations = recorder.applied_migrations()
File "/usr/local/lib/python3.5/site-packages/django/db/migrations/recorder.py", line 65, in applied_migrations
self.ensure_schema()
File "/usr/local/lib/python3.5/site-packages/django/db/migrations/recorder.py", line 52, in ensure_schema
if self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor()):
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 233, in cursor
cursor = self.make_cursor(self._cursor())
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 204, in _cursor
self.ensure_connection()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/usr/local/lib/python3.5/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/local/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 171, in connect
self.connection = self.get_new_connection(conn_params)
File "/usr/local/lib/python3.5/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/usr/local/lib/python3.5/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: FATAL: database "test_budget" does not exist
The command "./budget_proj/bin/test-proj.sh -t" exited with 1.
Done. Your build exited with 1.
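One possible workaround, offered only as a suggestion: since the runner is reusing an existing test database ("Using existing test database for alias 'default'..."), pre-creating that database on the new server may be enough to get the job past this point (names follow the Budget example above):

sudo -u postgres createdb --owner=budget_user test_budget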
I had a look at the rest of the 2017 API projects' databases: both Emergency Response and Homelessness depend on a test database; Transportation does no testing, and I couldn't figure out what Housing does. Going to have to check on those three 2017 projects, or perhaps look at the database-less approach that Brian concocted this year, to get around this dependency.
See https://github.com/hackoregon/emergency-response-backend/pull/119 for a great attempt (perhaps successful, too soon to tell) at getting rid of the test DB requirement.
See #205 for issues I'm having trying to restore the database backups to PostgreSQL 9.5 server (testing the process and data before attempting this in production).
Each of the 2017 projects' databases is running in its own relatively costly, separate EC2 instance. We intend to migrate these databases to a single PostgreSQL service running on a single t2.large EC2 instance.

To migrate to the 2018 PostgreSQL service we'll need backups in compressed plain-text format; to get this type of backup from each database, we'll need a remote shell running on each EC2 instance (to run pg_dump with very specific parameters documented here).

To get a remote shell we'll need the SSH keys originally injected into the instance, which is hackoregon-postgres-pem - however, this is not something I personally have a copy of, nor do I know who might have a copy.

One way around this is to "clone" the server using the "Launch more like this" Action from the EC2 console - however, when I attempt this it returns an error saying, "We cannot proceed with your requested configuration. You cannot use this AMI (ami-db00bdbb). This AMI has been deregistered, or you do not have permission to use it. Try again with another AMI for which you have permissions, or request permission to use this AMI from its owner."
Help Wanted
At this point I'll need one of the following:

A copy of the original hackoregon-postgres-pem key
Permission to use the original AMI (which may be something that AWS has just stopped supporting, or might be something HackOregon had and later deleted from its registry)
An equivalent SQL command I can run from PGAdmin that creates a plain-text compressed backup