hackoregon / data-science-pet-containers

Pet Containers for Data Scientists
MIT License
11 stars 1 forks source link

Dump file restore failures #34

Closed znmeb closed 6 years ago

znmeb commented 6 years ago

Documentation TBD

bhgrant8 commented 6 years ago

@znmeb, @MikeTheCanuck might have further details but it appears restoring the pg_dump files of the non-postgres database provided onto the Amazon EC2 development database ran into this error:

pg_restore: [archiver] unsupported version (1.13) in file header

A quick search pointed to this recent stackoverflow discussion:

https://stackoverflow.com/questions/49064209/getting-archiver-unsupported-version-1-13-in-file-header-when-running-pg-r

which led to this security update post: https://www.postgresql.org/about/news/1834/

As per Mike, we are currently on PostgreSQL 9.6.6 based on what is in the Amazon AMI, so this seems to be explanation of issue. Question is whether we can create a 9.6.6 compatible pg_dump to allow progress in data hosting to continue while we wait("hope?") that Amazon upgrades the AMI.

MikeTheCanuck commented 6 years ago

As of last night, Amazon's repo is still distributing version 9.6.6 of PostgreSQL to the Amazon Linux AMIs. I am unfamiliar at this point with Amazon's SLA for incorporating new versions into their repos.

znmeb commented 6 years ago

Interesting - I set up a local Amazon Linux 2 PostgreSQL 9.6.6 container and got a different error message. I did get raw SQL dumps from 10.3 to restore to 9.6.6.

@bhgrant8 @MikeTheCanuck My plan going forward is to switch from using custom format to plain text format in the pg_dump operations. plain text format is a sequence of SQL commands; custom format is something designed for pg_dump / pg_restore operations. That will require documentation of two facts:

  1. The restore can't be done with pg_restore or with pgAdmin, which uses pg_restore. You have to restore the file by piping it to psql using the postgres Linux account at the command line. This is probably the least impact on DevOps; only the database restore scripts need to be changed.
  2. Various compatibilities must exist between the source database for the dump and the destination database for the restore:
    • If an object in the source is owned by a role, that same role must exist in the destination. The creator of the backup must make sure the objects are owned by the correct role in the source before executing pg_dump. The project teams will need to ask DevOps for the database owner role names and create their databases with the same owners. pg_restore doesn't have this restriction because it has the option to ignore object ownership during the restore.
    • The SQL in the dump must not include functionality that does not exist in the destination database. The tests I tried on the two Transportation Systems databases did not run into this, but I don't as yet have a list of any potential SQL-level conflicts between source 10.3 and destination 9.6.6.
    • If you generate a compressed backup on the source, the equivalent uncompress facility must exist on the destination server. My plan is to use gzip; if that's not in Amazon Linux 2 it will need to be installed.

I am also planning to include a base-level Amazon Linux 2 / PostgreSQL 9.6 image in this toolset for testing of scripts, dumps, etc. I am experimenting with conversion scripts that run on the Docker host to convert dumps between the two versions; essentially you do a docker exec -u postgres with the appropriate pg_dump and pg_restore commands, shuffling the files around with docker cp.

As far as Amazon's SLA for Amazon Linux 2 is concerned, I don't think anything that wasn't there last night can be expected to appear before demo day. It took them a couple of days just to catch up with the March 1 security update!

Plan B is to drop back to PostgreSQL 9.6 for data-science-pet-containers and our other database container in the API. That works for Transportation Systems but I don't know what the impact will be on other teams using this toolset. If we go this route it will take me a full day to test everything, and we may lose some functionality.

I think dropping back to PostgreSQL 9.6 is the right approach for Transportation Systems but I can't comment on other teams. And we'll need to come up with a plan for the API with @kiniadit no matter how we deal with this issue. The effort for dropping back to 9.6.6 is a bit more than reformatting the backups, but there's more risk of incompatibilities I haven't outlined above with the reformatted backups.

MikeTheCanuck commented 6 years ago

Hi Ed, given the results you reported:

Interesting - I set up a local Amazon Linux 2 PostgreSQL 9.6.6 container and got a different error message. I did get raw SQL dumps from 10.3 to restore to 9.6.6.

Could you outline the exact command sequence that got you to the different error message? This script is the one we use to get a database service running, and this one generates the database instance

znmeb commented 6 years ago

Hmpf ... now I'm getting the same error message you got. Last night I got the message that pg_restore couldn't restore a 10.3 backup to a 9.6.6 server, which is the behavior I expected.

Here's the sequence that works:

# On the Docker host:

# Copy the 10.3 backup into the PostGIS (PostgreSQL 10.3) container
$ docker cp /d/Sync/Hack_Oregon/odot_crash_data_nogis.backup containers_postgis_1:/

# log in to the PostGIS container
$ docker exec -it -u postgres containers_postgis_1 /bin/bash

# now we're in the postgis (10.3 + PostGIS) container

postgres@f4e5e1085a33:/$ cd /var/lib/postgresql # postgres can write here!
postgres@f4e5e1085a33:/$ createdb odot_crash_data_nogis

# restore the original database
postgres@f4e5e1085a33:/$ pg_restore -d odot_crash_data_nogis --no-owner /odot_crash_data_nogis.backup

# create a plain-text backup
postgres@f4e5e1085a33:/$ pg_dump -Fp --no-owner -d odot_crash_data_nogis > odot_crash_data_nogis.sql

# exit back to Docker host
postgres@f4e5e1085a33:/$ exit

# back on Docker host
# copy the SQL dump out
$ docker cp containers_postgis_1:/var/lib/postgresql/odot_crash_data_nogis.sql .
# copy it to the Amazon Linux 2 / PostgreSQL 9.6.6 container
$ docker cp odot_crash_data_nogis.sql containers_amazon_1:/

# log into the Amazon container from Docker host
docker exec -it -u postgres containers_amazon_1 /bin/bash

# now we're `postgres` inside the Amazon container

# restore the plain-text dump
bash-4.2$ psql < /odot_crash_data_nogis.sql
SET
SET
SET
SET
SET
 set_config 
------------

(1 row)

[snip]
ALTER TABLE
ALTER TABLE
ALTER TABLE

# done - go back to the Docker host
bash-4.2$ 

The plan is to

  1. Create the transportation-systems role in the 10,3 container.
  2. Restore the databases with that owner.
  3. Create the plain-text dumps and test them in the Amazon Linux / PostgreSQL 9.6 container.
  4. Hand them off to DevOps with a script to restore them. It will take me a couple of hours to make sure all the permissions and ownership details are correct; the above example did everything as the database superuser.
  5. Rewrite the data ingestion code to create the SQL dumps directly, probably using the Amazon container as the server.
znmeb commented 6 years ago

Dropping back to 9.6 for the saves and switching to plain-text format both seem to be necessary for restores to the Amazon Linux 2 PostgreSQL 9.6.6 service. I'll give a custom format backup one last chance on the Docker image, but we need to move on to troubleshooting our own software, not Amazon's.

znmeb commented 6 years ago

I'm closing this given that compressed plain-text backups (.sql.gz) are working on the Amazon Linux 2 image and at least one has restored on the live server. We are continuing to track this issue - see https://github.com/hackoregon/civic-devops/pull/44 for the documentation effort.