kobotoolbox / kobo-docker

Stack of Docker containers running all server-side components of KoBoToolbox (kpi, kobocat, enketo-express).

Split the OCHA production database in two #265

Closed jnm closed 4 years ago

jnm commented 4 years ago

Now that we've gotten through #259, it's time for large-and-in-charge OCHA.

Preparing the in-use database

  1. Installed pglogical and updated Postgres configuration:

    ubuntu@ip-10-0-1-223:~/kobo-docker⟫ pwd
    /home/ubuntu/kobo-docker
    ubuntu@ip-10-0-1-223:~/kobo-docker⟫ git diff
    diff --git a/postgres/shared/init_02_set_postgres_config.sh b/postgres/shared/init_02_set_postgres_config.sh
    index b8fc1bb..9a0dde6 100644
    --- a/postgres/shared/init_02_set_postgres_config.sh
    +++ b/postgres/shared/init_02_set_postgres_config.sh
    @@ -1,6 +1,12 @@
     #!/usr/bin/env bash
    
    -
    +# make sure pglogical is installed each time the container starts
    +test -e /usr/lib/postgresql/9.5/lib/pglogical.so || (
    +    apt-get update
    +    apt-get install -y curl
    +    curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash
    +    apt-get install -y postgresql-9.5-pglogical
    +)
    
     if [ ! -f "$POSTGRES_CONFIG_FILE.orig" ]; then
         echo "Let's keep a copy of current configuration file!"
    @@ -24,4 +30,4 @@ echo "Applying new client authentication configuration file..."
     cp $KOBO_DOCKER_SCRIPTS_DIR/shared/pg_hba.conf "$POSTGRES_CLIENT_AUTH_FILE"
    
     echo "Creating hg_hba config file..."
    -sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE"
    \ No newline at end of file
    +sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE"
    diff --git a/postgres/shared/pg_hba.conf b/postgres/shared/pg_hba.conf
    index af2cea2..48bf35e 100644
    --- a/postgres/shared/pg_hba.conf
    +++ b/postgres/shared/pg_hba.conf
    @@ -14,3 +14,5 @@ host    all             all             ::1/128                 trust
    
     host    all all all md5
     host    replication KOBO_POSTGRES_REPLICATION_USER all md5
    +# temporary for two-database split; the superuser has to be able to replicate
    +host    replication kobo all md5
    diff --git a/postgres/shared/postgres.conf b/postgres/shared/postgres.conf
    index 89fe222..0337ab3 100644
    --- a/postgres/shared/postgres.conf
    +++ b/postgres/shared/postgres.conf
    @@ -1,13 +1,24 @@
     #------------------------------------------------------------------------------------
     # REPLICATION
     #------------------------------------------------------------------------------------
    -wal_level = hot_standby
    +#wal_level = hot_standby
    
    -max_wal_senders = 2
    +#max_wal_senders = 2
     wal_keep_segments = 512
    
     hot_standby = on
    
    +### UNCOMMENT ABOVE AND REMOVE BELOW TO RETURN TO NORMALCY ###
    +
    +wal_level = 'logical'
    +max_worker_processes = 10   # one per database needed on provider node
    +                            # one per node needed on subscriber node
    +max_replication_slots = 10  # one per node needed on provider node
    +max_wal_senders = 10        # one per node needed on provider node
    +shared_preload_libraries = 'pglogical'
    +
    +##################### END OF NON-NORMAL STUFF ################
    +
     #------------------------------------------------------------------------------------
     # LOGS
     #------------------------------------------------------------------------------------
  2. Stopped the in-use Postgres server with kill -INT, per https://www.postgresql.org/docs/9.5/server-shutdown.html
  3. Brought it back up: ubuntu@ip-10-0-1-223:~/kobo-docker⟫ docker-compose -f docker-compose.backend.master.yml -f docker-compose.backend.master.override.yml up -d postgres
  4. Checked https://kobo.humanitarianresponse.info/service_health/; OK
  5. :warning: The reconfiguration broke the secondary; its container is in a restart loop. The logged error: hot standby is not possible because max_worker_processes = 8 is a lower setting than on the master server (its value was 10)
    1. Changing this setting on the master didn't help; do the WALs contain the value at the time of their creation?
    2. Upping the setting on the secondary worked (according to select * from pg_stat_replication on the master)
  6. What gives? We set this to 10:
    postgres=# select name, setting, sourcefile from pg_settings where name='max_wal_senders';
          name       | setting |                  sourcefile                   
    -----------------+---------+-----------------------------------------------
     max_wal_senders | 2       | /var/lib/postgresql/data/postgresql.auto.conf
    (1 row)
    1. Apparently /var/lib/postgresql/data/postgresql.auto.conf is created by ALTER SYSTEM? The file was last modified in November of 2018 :unamused:
    2. Trashed it! max_wal_senders was the only setting there. Now the value is 10 as expected (and I figure we might need at least 3)
  7. Created the pglogical node and replication set on the in-use database:

    root@postgres:/# psql -U kobo kobotoolbox
    psql (9.5.16)
    Type "help" for help.
    
    kobotoolbox=# create extension pglogical;
    CREATE EXTENSION
    kobotoolbox=# select pglogical.create_node(
    kobotoolbox(#     node_name := 'combined_db_provider',
    kobotoolbox(#     dsn := 'host=localhost port=5432 dbname=kobotoolbox'
    kobotoolbox(# );
     create_node 
    -------------
       315837889
    (1 row)
    
    kobotoolbox=# select pglogical.create_replication_set('kpi', true, true, true, true);
     create_replication_set 
    ------------------------
                  483997336
    (1 row)
  8. Added all the KPI tables to the replication set:
    select pglogical.replication_set_add_table('kpi', 'spatial_ref_sys');
    select pglogical.replication_set_add_table('kpi', 'django_migrations');
    select pglogical.replication_set_add_table('kpi', 'django_content_type');
    select pglogical.replication_set_add_table('kpi', 'auth_user');
    select pglogical.replication_set_add_table('kpi', 'auth_group');
    select pglogical.replication_set_add_table('kpi', 'auth_permission');
    select pglogical.replication_set_add_table('kpi', 'auth_group_permissions');
    select pglogical.replication_set_add_table('kpi', 'auth_user_groups');
    select pglogical.replication_set_add_table('kpi', 'auth_user_user_permissions');
    select pglogical.replication_set_add_table('kpi', 'constance_config');
    select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictasks');
    select pglogical.replication_set_add_table('kpi', 'django_celery_beat_crontabschedule');
    select pglogical.replication_set_add_table('kpi', 'django_celery_beat_intervalschedule');
    select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictask');
    select pglogical.replication_set_add_table('kpi', 'django_celery_beat_solarschedule');
    select pglogical.replication_set_add_table('kpi', 'django_admin_log');
    select pglogical.replication_set_add_table('kpi', 'authtoken_token');
    select pglogical.replication_set_add_table('kpi', 'django_digest_partialdigest');
    select pglogical.replication_set_add_table('kpi', 'taggit_tag');
    select pglogical.replication_set_add_table('kpi', 'taggit_taggeditem');
    select pglogical.replication_set_add_table('kpi', 'kpi_collection');
    select pglogical.replication_set_add_table('kpi', 'kpi_asset');
    select pglogical.replication_set_add_table('kpi', 'reversion_revision');
    select pglogical.replication_set_add_table('kpi', 'reversion_version');
    select pglogical.replication_set_add_table('kpi', 'kpi_assetversion');
    select pglogical.replication_set_add_table('kpi', 'kpi_importtask');
    select pglogical.replication_set_add_table('kpi', 'kpi_authorizedapplication');
    select pglogical.replication_set_add_table('kpi', 'kpi_taguid');
    select pglogical.replication_set_add_table('kpi', 'kpi_objectpermission');
    select pglogical.replication_set_add_table('kpi', 'kpi_assetsnapshot');
    select pglogical.replication_set_add_table('kpi', 'kpi_onetimeauthenticationkey');
    select pglogical.replication_set_add_table('kpi', 'kpi_usercollectionsubscription');
    select pglogical.replication_set_add_table('kpi', 'kpi_exporttask');
    select pglogical.replication_set_add_table('kpi', 'kpi_assetfile');
    select pglogical.replication_set_add_table('kpi', 'hub_sitewidemessage');
    select pglogical.replication_set_add_table('kpi', 'hub_configurationfile');
    select pglogical.replication_set_add_table('kpi', 'hub_formbuilderpreference');
    select pglogical.replication_set_add_table('kpi', 'hub_extrauserdetail');
    select pglogical.replication_set_add_table('kpi', 'hub_perusersetting');
    select pglogical.replication_set_add_table('kpi', 'oauth2_provider_application');
    select pglogical.replication_set_add_table('kpi', 'django_session');
    select pglogical.replication_set_add_table('kpi', 'oauth2_provider_accesstoken');
    select pglogical.replication_set_add_table('kpi', 'oauth2_provider_grant');
    select pglogical.replication_set_add_table('kpi', 'django_digest_usernonce');
    select pglogical.replication_set_add_table('kpi', 'oauth2_provider_refreshtoken');
    select pglogical.replication_set_add_table('kpi', 'registration_registrationprofile');
    select pglogical.replication_set_add_table('kpi', 'hook_hook');
    select pglogical.replication_set_add_table('kpi', 'hook_hooklog');
    select pglogical.replication_set_add_table('kpi', 'external_integrations_corsmodel');
    select pglogical.replication_set_add_table('kpi', 'help_inappmessage');
    select pglogical.replication_set_add_table('kpi', 'help_inappmessagefile');
    select pglogical.replication_set_add_table('kpi', 'help_inappmessageuserinteractions');
    select pglogical.replication_set_add_table('kpi', 'kpi_assetuserpartialpermission');
  9. Added all the sequences:
    select pglogical.replication_set_add_sequence('kpi', 'auth_group_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'auth_group_permissions_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'auth_permission_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'auth_user_groups_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'auth_user_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'auth_user_user_permissions_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'constance_config_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_admin_log_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_crontabschedule_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_intervalschedule_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_periodictask_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_solarschedule_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_content_type_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_digest_partialdigest_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_digest_usernonce_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'django_migrations_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'external_integrations_corsmodel_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'help_inappmessage_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'help_inappmessagefile_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'help_inappmessageuserinteractions_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hook_hook_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hook_hooklog_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hub_configurationfile_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hub_extrauserdetail_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hub_formbuilderpreference_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hub_perusersetting_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'hub_sitewidemessage_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_asset_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_assetfile_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_assetsnapshot_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_assetuserpartialpermission_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_assetversion_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_authorizedapplication_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_collection_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_exporttask_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_importtask_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_objectpermission_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_onetimeauthenticationkey_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_taguid_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'kpi_usercollectionsubscription_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_accesstoken_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_application_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_grant_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_refreshtoken_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'registration_registrationprofile_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'reversion_revision_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'reversion_version_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'taggit_tag_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'taggit_taggeditem_id_seq', true);
    select pglogical.replication_set_add_sequence('kpi', 'topology_id_seq', true);
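The two long lists of statements above don't have to be typed out by hand; they can be generated from the catalog. A minimal sketch (the sample table names and the psql query in the comment are illustrative, not taken from this run):

```shell
#!/usr/bin/env bash
# Generate the pglogical statements from a list of table names instead
# of typing them one by one. In practice the list would come from, e.g.:
#   psql -Atc "select tablename from pg_tables where schemaname='public'"
# Here a few sample names stand in for that output.
printf '%s\n' auth_user taggit_tag kpi_asset |
while read -r t; do
    printf "select pglogical.replication_set_add_table('kpi', '%s');\n" "$t"
done
```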

Preparing the new database

  1. Made a new 1666 GiB EBS volume (the old one was 1333 GiB)
  2. Attached to EC2 instance
  3. mkfs.ext4, mounted to /mnt/new
  4. Wrote a minimal Dockerfile that includes pglogical:
    FROM mdillon/postgis:9.5
    RUN apt-get update \
        && apt-get install -y curl \
        && (curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash) \
        && apt-get install -y postgresql-9.5-pglogical
  5. Wrote a minimal Docker Compose file:

    version: '2.2'
    
    services:
      postgres:
        volumes:
          - ./dbvol:/var/lib/postgresql/data
        hostname: newpg
        build: .
  6. docker-compose up -d; docker-compose exec postgres bash
  7. Configured for pglogical:

    root@newpg:/# psql -U postgres
    psql (9.5.16)
    Type "help" for help.
    
    postgres=# alter system set wal_level to 'logical';
    ALTER SYSTEM
    postgres=# alter system set max_worker_processes to 10;
    ALTER SYSTEM
    postgres=# alter system set max_replication_slots to 10;
    ALTER SYSTEM
    postgres=# alter system set max_wal_senders to 10;
    ALTER SYSTEM
    postgres=# alter system set shared_preload_libraries to 'pglogical';
    ALTER SYSTEM
  8. docker-compose restart for the configuration to take effect
  9. Created the new database with the necessary extensions, and created the role kobo to save some work later:
    postgres=# create database koboform;
    CREATE DATABASE
    postgres=# \c koboform
    You are now connected to database "koboform" as user "postgres".
    koboform=# create extension postgis;
    CREATE EXTENSION
    koboform=# create extension postgis_topology;
    CREATE EXTENSION
    koboform=# create extension pglogical;
    CREATE EXTENSION
    koboform=# create role kobo;
    CREATE ROLE
  10. Since I didn't map any ports for KPI to connect to my new database and create its tables, I cheated: I grabbed the KPI schema from HHI PROD with pg_dump --schema-only --username=kobo --dbname=koboform | bzip2 | base64 and loaded it into my new database with base64 -d | bunzip2 | psql -U postgres koboform
  11. Let's try some replication:

    root@newpg:/# psql -U postgres koboform
    psql (9.5.16)
    Type "help" for help.
    
    koboform=# select pglogical.create_node(
    koboform(#     node_name := 'kpi_db_subscriber',
    koboform(#     dsn := 'host=localhost port=5432 dbname=koboform'
    koboform(# );
     create_node 
    -------------
      1580755673
    (1 row)
    
    koboform=# select pglogical.create_subscription(
    koboform(#     subscription_name := 'kpi_db_subscription',
    koboform(#     replication_sets := '{kpi}',
    koboform(#     provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem'
    koboform(# );
     create_subscription 
    ---------------------
              3132487805
    (1 row)
  12. df -h /mnt/new shows the data pouring in. docker-compose logs -f has only this repeating complaint, which I think is okay given the absurdly large amount of writing the replication process is doing:
    LOG:  checkpoints are occurring too frequently (24 seconds apart)
    HINT:  Consider increasing the configuration parameter "max_wal_size".
  13. :sleeping:
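Incidentally, the bzip2-and-base64 trick from step 10 works because the round trip is lossless and the encoded form is safe to paste through a terminal. A quick sketch, with a sample SQL string standing in for the real pg_dump output:

```shell
#!/usr/bin/env bash
# Round-trip sketch of the step-10 transfer: bzip2 compresses the dump,
# base64 makes it copy/paste-safe, and decoding reverses both exactly.
# A sample SQL string stands in for the real pg_dump output.
schema='create table demo (id serial primary key);'
encoded=$(printf '%s' "$schema" | bzip2 | base64)
decoded=$(printf '%s' "$encoded" | base64 -d | bunzip2)
[ "$schema" = "$decoded" ] && echo 'round trip OK'
```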

I dunno: https://geohipster.com/2016/08/01/regina-obe-people-spend-much-time-learning-programming-language-forget-programming-language/

jnm commented 4 years ago

Failed:

postgres_1  | ERROR:  duplicate key value violates unique constraint "taggit_tag_name_key"
postgres_1  | DETAIL:  Key (name)=(1) already exists.
postgres_1  | CONTEXT:  COPY taggit_tag, line 419
postgres_1  | STATEMENT:  COPY "public"."taggit_tag" ("id","name","slug") FROM stdin
postgres_1  | ERROR:  current transaction is aborted, commands ignored until end of transaction block
postgres_1  | STATEMENT:  COPY "public"."kpi_importtask" ("id","data","status","uid","date_created","user_id","messages") FROM stdin
postgres_1  | ERROR:  table copy failed
postgres_1  | DETAIL:  Query 'COPY "public"."kpi_importtask" ("id","data","status","uid","date_created","user_id","messages") FROM stdin': 
postgres_1  | LOG:  apply worker [92] at slot 2 generation 1 exiting with error
postgres_1  | LOG:  worker process: pglogical apply 20623:3132487805 (PID 92) exited with exit code 1
postgres_1  | LOG:  starting apply for subscription kpi_db_subscription
postgres_1  | ERROR:  subscriber kpi_db_subscription initialization failed during nonrecoverable step (d), please try the setup again
jnm commented 4 years ago

Let's follow https://github.com/2ndQuadrant/pglogical/issues/156#issuecomment-401816291 and synchronize the tables individually. On the source, we'll create a new provider node and empty replication set:

select pglogical.create_node(
    node_name := 'combined_db_provider',
    dsn := 'host=localhost port=5432 dbname=kobotoolbox'
);

select pglogical.create_replication_set('kpi', true, true, true, true);

On the destination, we'll create a new subscriber node and subscription:

select pglogical.create_node(
    node_name := 'kpi_db_subscriber',
    dsn := 'host=localhost port=5432 dbname=koboform'
);

select pglogical.create_subscription(
    subscription_name := 'kpi_db_subscription',
    replication_sets := '{kpi}',
    provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem'
);

Then, on the source again, we'll add the tables individually (or in small groups) with synchronize_data := true, and watch the synchronization status:

select pglogical.replication_set_add_table('kpi', 'spatial_ref_sys', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_migrations', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_content_type', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_group', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_permission', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_group_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user_groups', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user_user_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'constance_config', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictasks', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_crontabschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_intervalschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_solarschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_admin_log', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'authtoken_token', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_digest_partialdigest', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'taggit_tag', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'taggit_taggeditem', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_collection', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_asset', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'reversion_revision', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'reversion_version', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetversion', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_importtask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_authorizedapplication', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_taguid', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_objectpermission', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetsnapshot', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_onetimeauthenticationkey', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_usercollectionsubscription', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_exporttask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetfile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_sitewidemessage', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_configurationfile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_formbuilderpreference', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_extrauserdetail', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_perusersetting', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_application', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_session', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_accesstoken', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_grant', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_digest_usernonce', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_refreshtoken', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'registration_registrationprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hook_hook', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hook_hooklog', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'external_integrations_corsmodel', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessage', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessagefile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessageuserinteractions', synchronize_data := true);
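If adding everything at once trips the same failure again, the statements can be emitted in small batches, running and verifying each batch before the next. A hypothetical generator (batch size and table names are illustrative):

```shell
#!/usr/bin/env bash
# Emit the add_table calls in small batches so each batch can be applied
# and its sync status verified before moving on. The batch size and the
# sample table names are illustrative.
tables=(auth_user auth_group taggit_tag kpi_asset reversion_version)
batch=2
for ((i = 0; i < ${#tables[@]}; i += batch)); do
    echo "-- batch $((i / batch + 1))"
    for t in "${tables[@]:i:batch}"; do
        printf "select pglogical.replication_set_add_table('kpi', '%s', synchronize_data := true);\n" "$t"
    done
done
```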

Checking the status on the source database, we see kpi_db_subscription_copy in the startup state, which indicates the initial copy is still ongoing:

kobotoolbox=# \x
Expanded display is on.
kobotoolbox=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 106
usesysid         | 136335772
usename          | kobo_replication
application_name | walreceiver
client_addr      | 10.0.1.251
client_hostname  | 
client_port      | 44422
backend_start    | 2020-01-14 07:19:20.40829+00
backend_xmin     | 461708255
state            | streaming
sent_location    | AF7/EF66B430
write_location   | AF7/EF66B430
flush_location   | AF7/EF66B430
replay_location  | AF7/EF66B398
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+------------------------------
pid              | 29988
usesysid         | 16384
usename          | kobo
application_name | kpi_db_subscription
client_addr      | 172.18.0.1
client_hostname  | 
client_port      | 56304
backend_start    | 2020-01-15 01:46:32.440657+00
backend_xmin     | 
state            | streaming
sent_location    | AF7/EF66B430
write_location   | AF7/EF66B430
flush_location   | AF7/EF66B430
replay_location  | AF7/EF66B430
sync_priority    | 0
sync_state       | async
-[ RECORD 3 ]----+------------------------------
pid              | 11636
usesysid         | 16384
usename          | kobo
application_name | kpi_db_subscription_copy
client_addr      | 172.18.0.1
client_hostname  | 
client_port      | 56874
backend_start    | 2020-01-15 02:17:15.151641+00
backend_xmin     | 461712515
state            | startup
sent_location    | 
write_location   | 
flush_location   | 
replay_location  | 
sync_priority    | 0
sync_state       | async

On the destination, we want every sync_status to be r (ready):

koboform=# select * from pglogical.local_sync_status ;
 sync_kind | sync_subid | sync_nspname |            sync_relname             | sync_status | sync_statuslsn 
-----------+------------+--------------+-------------------------------------+-------------+----------------
 d         | 3132487805 | public       | auth_group                          | r           | AF7/CEF8F720
 d         | 3132487805 |              |                                     | r           | 0/0
 d         | 3132487805 | public       | auth_user_groups                    | r           | AF7/CEFEF118
 d         | 3132487805 | public       | spatial_ref_sys                     | r           | AF7/CECCD4B8
 d         | 3132487805 | public       | auth_permission                     | r           | AF7/CEF932F8
 d         | 3132487805 | public       | django_celery_beat_periodictask     | r           | AF7/CF43CAA0
 d         | 3132487805 | public       | django_migrations                   | r           | AF7/CED39E20
 d         | 3132487805 | public       | kpi_collection                      | r           | AF7/D1EF4AD8
 d         | 3132487805 | public       | authtoken_token                     | r           | AF7/CEFAD048
 d         | 3132487805 | public       | auth_user_user_permissions          | r           | AF7/CF3A9628
 d         | 3132487805 | public       | django_content_type                 | r           | AF7/CED4D758
 d         | 3132487805 | public       | constance_config                    | r           | AF7/CF3CF9F8
 d         | 3132487805 | public       | django_digest_partialdigest         | r           | AF7/CEFEEED8
 d         | 3132487805 | public       | auth_user                           | r           | AF7/CED881F8
 d         | 3132487805 | public       | django_celery_beat_periodictasks    | r           | AF7/CF3F5698
 d         | 3132487805 | public       | django_celery_beat_solarschedule    | r           | AF7/CF43D448
 d         | 3132487805 | public       | auth_group_permissions              | r           | AF7/CEFEEFF8
 d         | 3132487805 | public       | django_celery_beat_crontabschedule  | r           | AF7/CF401B28
 d         | 3132487805 | public       | reversion_version                   | d           | AF7/D3397DB8
 d         | 3132487805 | public       | django_admin_log                    | r           | AF7/CF43FF80
 d         | 3132487805 | public       | taggit_tag                          | r           | AF7/CF3ABAD0
 d         | 3132487805 | public       | reversion_revision                  | r           | AF7/D3397D80
 d         | 3132487805 | public       | kpi_asset                           | i           | 0/0
 d         | 3132487805 | public       | django_celery_beat_intervalschedule | r           | AF7/CF409538
 d         | 3132487805 | public       | taggit_taggeditem                   | r           | AF7/D1E92790
(25 rows)
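Rather than eyeballing that table for stragglers (here reversion_version in state d and kpi_asset in state i), the non-ready rows can be filtered from unaligned psql output. A sketch, with sample rows standing in for the real query output:

```shell
#!/usr/bin/env bash
# Pick out tables whose sync_status is not yet 'r', from pipe-separated
# psql output such as:
#   psql -At -c "select sync_relname, sync_status from pglogical.local_sync_status"
# The sample rows below stand in for that output.
printf '%s\n' 'reversion_version|d' 'kpi_asset|i' 'taggit_tag|r' |
awk -F'|' '$2 != "r" { print $1, "->", $2 }'
```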
jnm commented 4 years ago

We can get an idea of how the initial copy is progressing by using pgstattuple(), which reports the number of tuples in the table, including not-yet-committed ones, even while count(*) still returns zero:

koboform=# select count(*) from kpi_assetversion;
 count 
-------
     0
(1 row)

koboform=# select tuple_count from pgstattuple('kpi_assetversion');
 tuple_count 
-------------
     1229244
(1 row)
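Those numbers can be turned into a rough progress figure by comparing the destination's tuple_count against the source table's committed row count. A trivial sketch, where the total is a hypothetical stand-in for a live query against the source:

```shell
#!/usr/bin/env bash
# Rough progress estimate: destination tuple_count (from pgstattuple)
# versus the source table's committed row count. The total here is a
# hypothetical stand-in for a live count(*) on the source database.
copied=1229244
total=2500000
echo "$((copied * 100 / total))% of rows copied"
```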
jnm commented 4 years ago

Success, though unsure why the taggit_tag unique constraint violation didn't trip us up again. :warning: In fact, taggit_tag did fail and is empty :bomb: must deal with this later.

koboform=# select * from pglogical.local_sync_status ;
 sync_kind | sync_subid | sync_nspname |            sync_relname             | sync_status | sync_statuslsn 
-----------+------------+--------------+-------------------------------------+-------------+----------------
 d         | 3132487805 | public       | auth_group                          | r           | AF7/CEF8F720
 d         | 3132487805 |              |                                     | r           | 0/0
 d         | 3132487805 | public       | kpi_asset                           | r           | AF7/FFE4B3C8
 d         | 3132487805 | public       | auth_user_groups                    | r           | AF7/CEFEF118
 d         | 3132487805 | public       | spatial_ref_sys                     | r           | AF7/CECCD4B8
 d         | 3132487805 | public       | auth_permission                     | r           | AF7/CEF932F8
 d         | 3132487805 | public       | django_celery_beat_periodictask     | r           | AF7/CF43CAA0
 d         | 3132487805 | public       | django_migrations                   | r           | AF7/CED39E20
 d         | 3132487805 | public       | kpi_collection                      | r           | AF7/D1EF4AD8
 d         | 3132487805 | public       | authtoken_token                     | r           | AF7/CEFAD048
 d         | 3132487805 | public       | auth_user_user_permissions          | r           | AF7/CF3A9628
 d         | 3132487805 | public       | django_content_type                 | r           | AF7/CED4D758
 d         | 3132487805 | public       | constance_config                    | r           | AF7/CF3CF9F8
 d         | 3132487805 | public       | django_digest_partialdigest         | r           | AF7/CEFEEED8
 d         | 3132487805 | public       | auth_user                           | r           | AF7/CED881F8
 d         | 3132487805 | public       | django_celery_beat_periodictasks    | r           | AF7/CF3F5698
 d         | 3132487805 | public       | django_celery_beat_solarschedule    | r           | AF7/CF43D448
 d         | 3132487805 | public       | kpi_usercollectionsubscription      | r           | AFA/97C08B90
 d         | 3132487805 | public       | kpi_authorizedapplication           | r           | AFA/95C9AC70
 d         | 3132487805 | public       | auth_group_permissions              | r           | AF7/CEFEEFF8
 d         | 3132487805 | public       | django_celery_beat_crontabschedule  | r           | AF7/CF401B28
 d         | 3132487805 | public       | django_admin_log                    | r           | AF7/CF43FF80
 d         | 3132487805 | public       | taggit_tag                          | r           | AF7/CF3ABAD0
 d         | 3132487805 | public       | reversion_revision                  | r           | AF7/D3397D80
 d         | 3132487805 | public       | django_celery_beat_intervalschedule | r           | AF7/CF409538
 d         | 3132487805 | public       | kpi_assetversion                    | r           | AFA/76308F80
 d         | 3132487805 | public       | taggit_taggeditem                   | r           | AF7/D1E92790
 d         | 3132487805 | public       | kpi_objectpermission                | r           | AFA/97BE5988
 d         | 3132487805 | public       | reversion_version                   | r           | AF7/F6B580E8
 d         | 3132487805 | public       | kpi_onetimeauthenticationkey        | r           | AFA/A41DEC20
 d         | 3132487805 | public       | kpi_importtask                      | r           | AFA/91972DB8
 d         | 3132487805 | public       | kpi_taguid                          | r           | AFA/96414898
 d         | 3132487805 | public       | kpi_exporttask                      | r           | AFA/AD144F08
 d         | 3132487805 | public       | kpi_assetsnapshot                   | r           | AFA/A41D5380
 d         | 3132487805 | public       | hub_configurationfile               | r           | AFA/AD17C1B8
 d         | 3132487805 | public       | hub_formbuilderpreference           | r           | AFA/AD17C1F0
 d         | 3132487805 | public       | hub_extrauserdetail                 | r           | AFA/AD1B20C8
 d         | 3132487805 | public       | hub_perusersetting                  | r           | AFA/AD1C9900
 d         | 3132487805 | public       | kpi_assetfile                       | r           | AFA/AD1CEC90
 d         | 3132487805 | public       | hub_sitewidemessage                 | r           | AFA/AD1E7DD0
 d         | 3132487805 | public       | oauth2_provider_application         | r           | AFA/AD1E7E30
 d         | 3132487805 | public       | oauth2_provider_accesstoken         | r           | AFA/AD6C8248
 d         | 3132487805 | public       | oauth2_provider_grant               | r           | AFA/AD6D3108
 d         | 3132487805 | public       | django_digest_usernonce             | r           | AFA/AD6DC950
 d         | 3132487805 | public       | help_inappmessage                   | r           | AFA/AEFBCB98
 d         | 3132487805 | public       | django_session                      | r           | AFA/AD6ADBE8
 d         | 3132487805 | public       | help_inappmessagefile               | r           | AFA/AEFBE960
 d         | 3132487805 | public       | oauth2_provider_refreshtoken        | r           | AFA/AD6DF0B8
 d         | 3132487805 | public       | registration_registrationprofile    | r           | AFA/AD725760
 d         | 3132487805 | public       | help_inappmessageuserinteractions   | r           | AFA/AEFC1C18
 d         | 3132487805 | public       | hook_hook                           | r           | AFA/AEEA8F60
 d         | 3132487805 | public       | hook_hooklog                        | r           | AFA/AEF5DF18
 d         | 3132487805 | public       | external_integrations_corsmodel     | r           | AFA/AEFB5B00
(53 rows)
jnm commented 4 years ago

Added the KPI sequences per above.

jnm commented 4 years ago

Let's try KoBoCAT.

Source

select pglogical.create_replication_set('kobocat', true, true, true, true);

Destination

select pglogical.create_node(
    node_name := 'kobocat_db_subscriber',
    dsn := 'host=localhost port=5432 dbname=kobocat'
);

select pglogical.create_subscription(
    subscription_name := 'kobocat_db_subscription',
    replication_sets := '{kobocat}',
    provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem'
);

Source

(ordered to avoid violating FK constraints)

select pglogical.replication_set_add_table('kobocat', 'django_migrations', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_group', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_permission', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user_user_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_group_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user_groups', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'authtoken_token', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'corsheaders_corsmodel', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_admin_log', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_crontabschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_intervalschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_periodictask', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_periodictasks', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_solarschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_content_type', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_digest_partialdigest', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_digest_usernonce', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_session', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_site', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_accesstoken', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_application', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_grant', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_refreshtoken', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'registration_registrationprofile', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'guardian_groupobjectpermission', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'guardian_userobjectpermission', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'api_organizationprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_project', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_project_user_stars', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_team', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_team_projects', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'taggit_tag', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'taggit_taggeditem', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'logger_xform', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'stats_statscount', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_columnrename', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_export', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_projectxform', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_surveytype', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_metadata', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_userprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_tokenstoragemodel', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'restservice_restservice', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'logger_instance', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_instancehistory', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_attachment', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_note', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_ziggyinstance', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_instancemodification', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_parsedinstance', synchronize_data := true);

select pglogical.replication_set_add_table('kobocat', 'reversion_revision', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'reversion_version', synchronize_data := true);
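The fifty-odd statements above could also be generated from an ordered list of table names, so the FK-safe ordering lives in one place and typos can't slip into individual lines. A sketch (not from the original session); the list is elided here:

```python
# Table names in the same FK-safe order as the statements above
# (truncated for illustration).
FK_ORDERED_TABLES = [
    "django_migrations",
    "auth_user",
    "auth_group",
    # ... remaining tables, in dependency order ...
    "reversion_revision",
    "reversion_version",
]


def add_table_statements(replication_set, tables):
    """Render one pglogical.replication_set_add_table() call per table."""
    template = (
        "select pglogical.replication_set_add_table("
        "'{rset}', '{table}', synchronize_data := true);"
    )
    return [template.format(rset=replication_set, table=t) for t in tables]


for statement in add_table_statements("kobocat", FK_ORDERED_TABLES):
    print(statement)
```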
jnm commented 4 years ago

Need to compare row counts (and maximum PKs?) for all tables. Plus, there are already some errors to investigate:

ERROR:  duplicate key value violates unique constraint "taggit_tag_name_key"
ERROR:  duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
ERROR:  duplicate key value violates unique constraint "logger_surveytype_slug_key"
jnm commented 4 years ago

Taking taggit_tag as an example, we can see that tags added since we configured pglogical are replicated, but none of the tags before that were copied over:

root@newpg:/# psql -U postgres -c 'select * from taggit_tag' kobocat
  id  |  name  | slug 
------+--------+------
 2983 | សាលា   | _22
 2985 | សៀមរាប | _23
(2 rows)

Let's fix that by copying the old data, using --inserts to give each row its own statement since we know in advance that some rows will fail:

pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat

The result is a lot of INSERT 0 1 (indicating success), but also a lot of failures, e.g.

ERROR:  duplicate key value violates unique constraint "taggit_tag_name_key"
DETAIL:  Key (name)=(demographic) already exists.
jnm commented 4 years ago

Unfortunately, some of these duplicates that couldn't be added are actually referenced by tagged items:

kobocat=# select count(distinct tag_id) from taggit_taggeditem where tag_id not in (select id from taggit_tag as _);
 count 
-------
   247
(1 row)

Our job is to split a database, not clean up inconsistency. Let's drop the taggit_tag constraints and the FK from taggit_taggeditem so that we can truncate taggit_tag and start over:

kobocat=# alter table taggit_tag drop constraint taggit_tag_name_key;
ALTER TABLE
kobocat=# alter table taggit_tag drop constraint taggit_tag_slug_key;
ALTER TABLE
kobocat=# alter table taggit_taggeditem drop constraint taggit_taggeditem_tag_id_6318217c0d95e0d2_fk_taggit_tag_id;
ALTER TABLE
kobocat=# truncate table taggit_tag;
TRUNCATE TABLE

Let's try copying the (inconsistent) data again:

root@newpg:/# pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$'
Password: 
SET
SET
SET
SET
 set_config 
------------

(1 row)

SET
SET
SET
 setval 
--------
   2985
(1 row)
jnm commented 4 years ago

This was very dissatisfying because I couldn't put the constraints back with such trash data. I decided just to clean it up (code run in KoBoCAT's shell_plus):

from __future__ import print_function, unicode_literals
from collections import Counter
from django.db import models, transaction  # shell_plus auto-imports transaction, but be explicit

tt = list(Tag.objects.values_list('id', 'name', 'slug'))
dupenames = [x for x, count in Counter([x[1] for x in tt]).items() if count > 1]
dupeslugs = [x for x, count in Counter([x[2] for x in tt]).items() if count > 1]

class TagUid(models.Model):
    """ hack to access KPI model """
    tag = models.OneToOneField(Tag, on_delete=models.CASCADE)
    class Meta:
        app_label = 'kpi'

def clean_up(pks):
    """
    reassign all related objects to use the first pk,
    then delete every pk except the first one
    """
    update_count = TaggedItem.objects.filter(tag_id__in=pks[1:]).update(tag_id=pks[0])
    print('{}\t{}'.format(Tag.objects.filter(pk=pks[0]).values('pk', 'name', 'slug')[0], update_count))
    # prune the duplicates
    with transaction.atomic():
        trash = Tag.objects.filter(pk__in=pks[1:]).select_for_update()
        # annoying: KPI has an extra model related to Tag
        TagUid.objects.filter(tag__in=trash).delete()
        trash.delete()

for n in dupenames:
    # `Tag.objects.filter(name=n)` will return only one result!
    pks = sorted([x[0] for x in tt if x[1] == n])
    clean_up(pks)

for s in dupeslugs:
    pks = sorted([x[0] for x in tt if x[2] == s])
    clean_up(pks)
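The reassignment above boils down to a pure function: given (pk, value) pairs, map each duplicate pk to the lowest pk that shares its value. A sketch with illustrative names, not KoBoCAT code:

```python
from collections import defaultdict


def dedup_map(rows):
    """Return {duplicate_pk: canonical_pk}, keeping the lowest pk per value."""
    by_value = defaultdict(list)
    for pk, value in rows:
        by_value[value].append(pk)
    mapping = {}
    for pks in by_value.values():
        pks.sort()
        for dupe in pks[1:]:
            mapping[dupe] = pks[0]
    return mapping
```

Every key in the returned mapping is a row to delete after its related objects have been repointed at the corresponding value.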

Next, I truncated taggit_tag on the destination database, re-ran pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$', and re-added the constraints:

ALTER TABLE ONLY public.taggit_tag ADD CONSTRAINT taggit_tag_name_key UNIQUE (name);
ALTER TABLE ONLY public.taggit_tag ADD CONSTRAINT taggit_tag_slug_key UNIQUE (slug);
ALTER TABLE ONLY public.taggit_taggeditem
    ADD CONSTRAINT taggit_taggeditem_tag_id_6318217c0d95e0d2_fk_taggit_tag_id
    FOREIGN KEY (tag_id) REFERENCES public.taggit_tag(id) DEFERRABLE INITIALLY DEFERRED;
jnm commented 4 years ago

Yikes, trying to copy over viewer_export with pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=viewer_export kobotoolbox | psql -U postgres kobocat complains a lot about

HINT:  You will need to rewrite or cast the expression.
ERROR:  column "created_on" is of type timestamp with time zone but expression is of type integer
LINE 1: INSERT INTO public.viewer_export VALUES (132050, 94878, '201...

Evidently the columns are not in the same order!?!?! (screenshot comparing the two column orders omitted)

This looks more promising:

root@newpg:/# pg_dump --help
pg_dump dumps a database as a text file or to other formats.
…
Options controlling the output content:
  --column-inserts             dump data as INSERT commands with column names
…
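A toy illustration (not real viewer_export data) of why --column-inserts helps: positional values land in the wrong columns when the source and destination column orders differ, while named values do not.

```python
src_columns = ["id", "xform_id", "created_on"]
dest_columns = ["id", "created_on", "xform_id"]
row = [132050, 94878, "2018-01-04"]

# plain --inserts: values are matched to columns by position
positional = dict(zip(dest_columns, row))

# --column-inserts: values are matched to columns by name
named = dict(zip(src_columns, row))

print(positional["created_on"])  # the integer 94878: Postgres rejects it
print(named["created_on"])       # the timestamp string, as intended
```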

Let's truncate and try again:

psql -U postgres kobocat -c 'truncate table viewer_export;'
pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=viewer_export kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$'

Result:

SET
SET
SET
SET
 set_config 
------------

(1 row)

SET
SET
SET
ERROR:  duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL:  Key (xform_id, filename)=(162955, ACTED_RRM_Registration_2018_01_04_23_24_30.xlsx) already exists.
ERROR:  duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL:  Key (xform_id, filename)=(249631, PDM_DFID_FSL_03072018_2018_07_15_09_36_59.xlsx) already exists.
ERROR:  duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL:  Key (xform_id, filename)=(277414, aewU7nheb6fhvErZH7ZGJS_2018_10_01_12_43_26.xlsx) already exists.
 setval 
--------
 801166
(1 row)

I don't think there's any need to worry about these rows. By definition, if the identical xform-filename pair already exists, the user can already access the export.

Somehow, despite the errors, I get identical counts and maximum PKs from both databases:

kobotoolbox=# select count(*) from viewer_export;
 count  
--------
 171825
(1 row)

kobotoolbox=# select max(id) from viewer_export;
  max   
--------
 801169
(1 row)
kobocat=# select count(*) from viewer_export;
 count  
--------
 171825
(1 row)

kobocat=# select max(id) from viewer_export;
  max   
--------
 801169
(1 row)
jnm commented 4 years ago

Now we must turn to logger_surveytype… which is related to big, bad logger_instance. It looks doubtful that this table is really used for anything:

kobocat$ ag -G py --ignore south_migrations survey_type
onadata/apps/logger/models/__init__.py
3:from onadata.apps.logger.models.survey_type import SurveyType

onadata/apps/logger/models/instance.py
19:from onadata.apps.logger.models.survey_type import SurveyType
118:    survey_type = models.ForeignKey(SurveyType)
225:    def _set_survey_type(self):
226:        self.survey_type, created = \
375:        self._set_survey_type()

onadata/apps/logger/migrations/0001_initial.py
128:            name='survey_type',

onadata/libs/utils/common_tags.py
10:SURVEY_TYPE = u'_survey_type_slug'

The health of this table in the source database is questionable: looking up a slug directly (via the unique index) returns one row, but forcing a sequential scan with concat(slug, '') turns up two, which suggests a corrupt index that has been hiding duplicates:

kobotoolbox=# select id from logger_surveytype where slug='CRF_tuky';
  id   
-------
 73560
(1 row)

kobotoolbox=# select id from logger_surveytype where concat(slug,'') ='CRF_tuky';
  id   
-------
 54836
 73560
(2 rows)
jnm commented 4 years ago

Cleaning up logger_surveytype, again using KoBoCAT's shell_plus:

from __future__ import print_function, unicode_literals
from collections import Counter

sts = list(SurveyType.objects.values_list('pk', 'slug'))
dupes = [x for x, count in Counter([x[1] for x in sts]).items() if count > 1]

counter = 0
for dupe in dupes:
    counter += 1
    # `SurveyType.objects.filter(slug=dupe)` will return only one result!
    pks = sorted([x[0] for x in sts if x[1] == dupe])
    # reassign all related objects to use the first pk, then delete every pk
    # except the first one
    update_count = Instance.objects.filter(survey_type_id__in=pks[1:]).update(
        survey_type_id=pks[0]
    )
    print('({}/{})\t{}\t{}'.format(counter, len(dupes), dupe, update_count))
    # prune the duplicates
    SurveyType.objects.filter(pk__in=pks[1:]).delete()
jnm commented 4 years ago

:warning: Because Postgres' UPDATE rewrites entire rows, mass updating survey_type_id from logger_instance ballooned disk consumption. After expanding the disk volume used by Postgres, the script completed successfully. I'm not going to worry about bloat in the source database (since we plan to discard it), but for future reference: https://www.keithf4.com/checking-for-postgresql-bloat/

jnm commented 4 years ago

Some tables from the original database haven't made it into either the new kobocat or koboform databases:

| Table | Explanation |
| --- | --- |
| celery_taskmeta | Outdated? Current DBs use django_celery_… tables |
| celery_taskmeta_id_seq | Outdated? Current DBs use django_celery_… tables |
| celery_tasksetmeta | Outdated? Current DBs use django_celery_… tables |
| celery_tasksetmeta_id_seq | Outdated? Current DBs use django_celery_… tables |
| djcelery_crontabschedule | Outdated? Current DBs use django_celery_… tables |
| djcelery_crontabschedule_id_seq | Outdated? Current DBs use django_celery_… tables |
| djcelery_intervalschedule | Outdated? Current DBs use django_celery_… tables |
| djcelery_intervalschedule_id_seq | Outdated? Current DBs use django_celery_… tables |
| djcelery_periodictask | Outdated? Current DBs use django_celery_… tables |
| djcelery_periodictask_id_seq | Outdated? Current DBs use django_celery_… tables |
| djcelery_periodictasks | Outdated? Current DBs use django_celery_… tables |
| djcelery_taskstate | Outdated? Current DBs use django_celery_… tables |
| djcelery_taskstate_id_seq | Outdated? Current DBs use django_celery_… tables |
| djcelery_workerstate | Outdated? Current DBs use django_celery_… tables |
| djcelery_workerstate_id_seq | Outdated? Current DBs use django_celery_… tables |
| koboform_surveydraft | Old dkobo table; we should back it up |
| koboform_surveydraft_id_seq | Old dkobo table; we should back it up |
| koboform_surveypreview | Old dkobo table; we should back it up |
| koboform_surveypreview_id_seq | Old dkobo table; we should back it up |
| main_sitewidemessage | Empty table; replaced by hub_sitewidemessage? |
| main_sitewidemessage_id_seq | Empty table; replaced by hub_sitewidemessage? |
| south_migrationhistory | Relic of the days when Django did not have built-in schema migration |
| south_migrationhistory_id_seq | Relic of the days when Django did not have built-in schema migration |
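A list like this can be computed mechanically from table-name lists, e.g. the output of `select tablename from pg_tables where schemaname = 'public'` run against each database. A sketch with toy data:

```python
def leftover_tables(old, kobocat, koboform):
    """Tables present in the old database but in neither new one."""
    return sorted(set(old) - set(kobocat) - set(koboform))


print(leftover_tables(
    ["auth_user", "south_migrationhistory", "logger_xform"],
    ["auth_user", "logger_xform"],
    ["auth_user"],
))
```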

Backup of old dkobo stuff:

root@newpg:/var/lib/postgresql/data# du -hs .
863G    .
root@newpg:/var/lib/postgresql/data# mkdir old_dkobo_surveydraft_surveypreview_tables
root@newpg:/var/lib/postgresql/data# cd old_dkobo_surveydraft_surveypreview_tables
root@newpg:/var/lib/postgresql/data/old_dkobo_surveydraft_surveypreview_tables# pg_dump -U kobo -h 172.17.0.1 --table=koboform_surveydraft --table=koboform_surveypreview kobotoolbox | bzip2 > dump.sql.bz2
Password: 
root@newpg:/var/lib/postgresql/data/old_dkobo_surveydraft_surveypreview_tables# du -h dump.sql.bz2 
397M    dump.sql.bz2
jnm commented 4 years ago

Python utility script to check the counts and maximum PKs of tables in the old and new databases:

import psycopg2
import sys
import threading

class DbThreadThing:
    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)

    def __del__(self):
        self.conn.close()

    def get_table_count_and_max_id(self, table):
        with self.conn.cursor() as cur:
            cur.execute("select count(*), max(id) from " + table)
            self.result = cur.fetchone()

    def start_query(self, table):
        self.thread = threading.Thread(
            target=self.get_table_count_and_max_id, args=(table,)
        )
        self.thread.start()

    def wait_for_result(self):
        self.thread.join()
        return self.result

old = DbThreadThing("host=172.17.0.1 user=kobo password=righto dbname=kobotoolbox")

if sys.argv[1] == "kobocat":
    new = DbThreadThing("user=postgres dbname=kobocat")
elif sys.argv[1] == "koboform":
    new = DbThreadThing("user=postgres dbname=koboform")
else:
    sys.exit("first argument must be 'kobocat' or 'koboform'")

table = sys.argv[2]
old.start_query(table)
new.start_query(table)
old_count, old_max = old.wait_for_result()
new_count, new_max = new.wait_for_result()

# max(id) is NULL (None) for empty tables; treat it as 0 when diffing so
# those tables print "0 … None None" instead of raising a TypeError
max_diff = (new_max or 0) - (old_max or 0)

print(
    " ".join(
        [
            str(x)
            for x in [
                table,
                ' ' * max(0, 50 - len(table)),
                new_count - old_count,
                max_diff,
                new_count,
                old_count,
                new_max,
                old_max,
            ]
        ]
    )
)
jnm commented 4 years ago

KoBoCAT's tables stopped replicating for some reason :confounded:

Results for most KPI tables below. We need to deal with authtoken_token separately, because it has no id column, and I simply didn't check a handful of django_celery_beat_… tables.

:warning: Something is off with taggit_tag:

auth_group_permissions                              0 0 1 1 1 1
auth_permission                                     0 0 280 280 286 286
constance_config                                    0 0 8 8 8 8
django_admin_log                                    0 0 1783 1783 1879 1879
django_content_type                                 0 0 86 86 88 88
django_digest_partialdigest                         0 0 292472 292472 384253 384253
django_digest_usernonce                             0 0 150 150 10692085 10692085
django_migrations                                   0 0 166 166 199 199
external_integrations_corsmodel                     0 0 3 3 3 3
help_inappmessage                                   0 0 3 3 3 3
help_inappmessagefile                               0 0 1 1 1 1
help_inappmessageuserinteractions                   0 0 19014 19014 19198 19198
hook_hook                                           0 0 596 596 785 785
hook_hooklog                                        0 0 712214 712214 716328 716328
hub_configurationfile                               0 0 0 0 None None
hub_sitewidemessage                                 0 0 3 3 8 8
kpi_assetuserpartialpermission                      0 0 1956 1956 19220 19220
kpi_authorizedapplication                           0 0 0 0 None None
kpi_onetimeauthenticationkey                        0 0 0 0 None None
oauth2_provider_accesstoken                         0 0 915 915 34839 34839
oauth2_provider_application                         0 0 57 57 96 96
oauth2_provider_grant                               0 0 176 176 381 381
oauth2_provider_refreshtoken                        0 0 915 915 34839 34839
auth_group                                          0 0 3 3 4 4
auth_user_groups                                    0 0 2 2 2 2
hub_extrauserdetail                                 0 0 135814 135814 136017 136017
hub_formbuilderpreference                           0 0 36728 36728 36864 36864
hub_perusersetting                                  0 0 1 1 1 1
kpi_collection                                      0 0 6457 6457 7221 7221
kpi_taguid                                          0 0 1889 1889 2265 2265
kpi_usercollectionsubscription                      0 0 2779 2779 4038 4038
registration_registrationprofile                    0 0 146089 146089 146597 146597
auth_user                                           0 0 146113 146113 147331 147331
auth_user_user_permissions                          0 0 7415960 7415960 7428499 7428499
kpi_assetfile                                       0 0 313 313 510 510
taggit_tag                                          -2014 0 31 2045 3036 3036
taggit_taggeditem                                   0 0 78328 78328 265517 265517
kpi_asset                                           0 0 593178 593178 2350930 2350930
kpi_exporttask                                      0 0 311251 311251 546123 546123
kpi_importtask                                      0 0 422642 422642 755604 755604
kpi_assetsnapshot                                   0 0 772098 772098 1754232 1754232
kpi_assetversion                                    0 0 3750118 3750118 7837019 7837019
kpi_objectpermission                                0 0 4104601 4104601 48305011 48305011
reversion_revision                                  0 0 49910339 49910339 84127042 84127042
reversion_version                                   0 0 49958377 49958377 84251592 84251592

real    13m7.476s
user    0m2.348s
sys     0m0.448s
jnm commented 4 years ago

Manual, row-by-row copy of taggit_tag:

pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=taggit_tag kobotoolbox | psql -U kobo koboform | grep -v '^INSERT 0 1$'
<various expected duplicates>

Checking again:

python3 count.py koboform taggit_tag
taggit_tag                                          0 0 -1 -1 3036 3036

:ok:

jnm commented 4 years ago

Changed the frontends' config to use the new koboform database, but not the new kobocat database (which hasn't replicated properly).

Dropped the subscription and node on the new database:

koboform=# select pglogical.drop_subscription('kpi_db_subscription');
koboform=# select pglogical.drop_node('kpi_db_subscriber');

Dropped the replication set on the source:

select pglogical.drop_replication_set('kpi');

Mangled the source database so any frontend using it will fail immediately:

kobotoolbox=# select pglogical.drop_replication_set('kpi');
kobotoolbox=# alter table kpi_asset rename to DO_NOT_USE_kpi_asset;
jnm commented 4 years ago

Had to give up on the initial KoBoCAT replication effort. According to pg_stat_replication, kobocat_db_subscription would never get out of the catchup state, even once sent_location, write_location, flush_location, and replay_location were all equal. After sitting for a while, the pglogical process would die, kobocat_db_subscription would disappear from pg_stat_replication, and the process would repeat, with sent_location always jumping back to the same location. This kind of stuff appeared repeatedly in the destination database's log:

LOG:  starting apply for subscription kobocat_db_subscription
LOG:  worker process: pglogical apply 50039343:725963785 (PID 44) exited with exit code 1

I removed all tables except logger_instance from the replication set on the source database and dropped them from the destination; even then, it would not catch up. I gave up, dropped the subscription on the destination, truncated the incomplete logger_instance, and restarted replication for logger_instance alone. If it succeeds, I'll add the other tables.

jnm commented 4 years ago

Woohoo, we're streaming logger_instance again! This took roughly 40 hours.

kobotoolbox=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 10895
usesysid         | 16384
usename          | kobo
application_name | kobocat_db_subscription
client_addr      | 172.18.0.1
client_hostname  | 
client_port      | 35572
backend_start    | 2020-01-30 23:09:37.212477+00
backend_xmin     | 
state            | streaming
sent_location    | B6C/79C06AA0
write_location   | B6C/79C06AA0
flush_location   | B6C/79C06AA0
replay_location  | B6C/79C06AA0
sync_priority    | 0
sync_state       | async
jnm commented 4 years ago

taggit_tag and viewer_export need attention:

api_project                                         0 0 3 3 3 3
api_project_user_stars                              0 0 0 0 None None
api_projectxform                                    0 0 0 0 None None
api_team_projects                                   0 0 0 0 None None
corsheaders_corsmodel                               0 0 0 0 None None
django_admin_log                                    0 0 1783 1783 1879 1879
django_content_type                                 0 0 86 86 88 88
django_migrations                                   0 0 166 166 199 199
django_site                                         0 0 2 2 2 2
oauth2_provider_accesstoken                         0 0 915 915 34839 34839
oauth2_provider_application                         0 0 57 57 96 96
oauth2_provider_grant                               0 0 176 176 381 381
oauth2_provider_refreshtoken                        0 0 915 915 34839 34839
stats_statscount                                    0 0 118896 118896 118896 118896
auth_group                                          0 0 3 3 4 4
auth_group_permissions                              0 0 1 1 1 1
auth_permission                                     0 0 280 280 286 286
auth_user_groups                                    0 0 2 2 2 2
django_digest_partialdigest                         0 0 294197 294197 387352 387352
django_digest_usernonce                             0 0 123 123 10811021 10811021
guardian_groupobjectpermission                      0 0 0 0 None None
logger_ziggyinstance                                0 0 0 0 None None
main_metadata                                       0 0 617539 617539 1170046 1170046
restservice_restservice                             0 0 494 494 1886 1886
taggit_tag                                          -2045 0 0 2045 None 3036
taggit_taggeditem                                   0 0 78328 78328 265517 265517
auth_user                                           0 0 146962 146962 148979 148979
auth_user_user_permissions                          0 0 7446560 7446560 7459099 7459099
logger_note                                         0 0 15096 15096 22395 22395
logger_surveytype                                   0 0 213373 213373 230971 230971
main_userprofile                                    0 0 114244 114244 115040 115040
registration_registrationprofile                    0 0 146091 146091 146599 146599
viewer_columnrename                                 0 0 0 0 None None
guardian_userobjectpermission                       0 0 5682070 5682070 6856957 6856957
logger_attachment                                   0 0 29808595 29808595 30387456 30387456
logger_instancehistory                              0 0 936073 936073 987852 987852
logger_xform                                        0 0 242753 242753 486723 486723
viewer_export                                       -172609 0 647 173256 805236 805236
viewer_instancemodification                         0 0 0 0 None None
logger_instance                                     0 0 80195164 80195164 83078465 83078465
reversion_revision                                  0 0 50550247 50550247 84766950 84766950
reversion_version                                   0 0 50600598 50600598 84893813 84893813
viewer_parsedinstance                               0 0 80197715 80197715 83017168 83017168

This comparison of counts and largest IDs for each table took about 30 minutes to run:

real    28m36.392s
user    0m2.320s
sys     0m0.348s
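For reference, the per-table output lines above can be reproduced with a small formatting helper (a sketch only; the real count.py queries both databases with psycopg2, and the column interpretation here is inferred from the output):

```python
def format_row(table, old_count, old_max, new_count, new_max):
    """Render one line in the style of count.py's output:
    table, count diff, max(id) diff, new count, old count, new max, old max.
    A max(id) of None (empty table) contributes 0 to the difference."""
    max_diff = 0 if (old_max is None or new_max is None) else new_max - old_max
    return "%-50s %s %s %s %s %s %s" % (
        table, new_count - old_count, max_diff,
        new_count, old_count, new_max, old_max,
    )

print(format_row("taggit_tag", 2045, 3036, 2026, 2997))
```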
jnm commented 4 years ago

Manually copied taggit_tag data with:

pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=taggit_tag kobotoolbox | psql -U kobo kobocat | grep -v '^INSERT 0 1$'

…but saw some familiar issues:

ERROR:  duplicate key value violates unique constraint "taggit_tag_name_key"

Counts were slightly off as a result:

# python3 count.py kobocat taggit_tag
taggit_tag                                          -19 -39 2026 2045 2997 3036

Re-ran the cleanup script from https://github.com/kobotoolbox/kobo-docker/issues/265#issuecomment-577534340; now the counts match:

# python3 count.py kobocat taggit_tag
taggit_tag                                          0 0 2026 2026 2997 2997
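The cleanup script itself lives in the linked comment; the general approach, keeping the row with the lowest id for each duplicated name and deleting the rest, can be sketched as (hypothetical toy code, not the actual script):

```python
def rows_to_delete(rows):
    """Given (id, name) rows, return the ids of duplicate rows to delete,
    keeping the lowest id for each name."""
    keep = {}
    for row_id, name in sorted(rows):
        keep.setdefault(name, row_id)  # first (lowest) id wins for each name
    return sorted(row_id for row_id, name in rows if keep[name] != row_id)

# Two tags share the same name; only the later duplicate is flagged for deletion.
print(rows_to_delete([(1, "water"), (2, "water"), (3, "health")]))  # → [2]
```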
jnm commented 4 years ago

Manually copying viewer_export:

pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=viewer_export kobotoolbox | psql -U kobo kobocat | grep -v '^INSERT 0 1$'

…revealed expected duplicates from replication:

ERROR:  duplicate key value violates unique constraint "viewer_export_pkey"

…but also some unexpected unique constraint violations:

ERROR:  duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL:  Key (xform_id, filename)=(162955, <redacted>.xlsx) already exists.

Unsurprisingly, the counts did not match:

root@postgres:/var/lib/postgresql/data/jnm_zomg# python3 count.py kobocat viewer_export
viewer_export                                       -3 0 174519 174522 808994 808994

The source database itself was showing inconsistencies you never want to see: :broken_heart:

kobotoolbox=# select id, xform_id, filename from viewer_export where xform_id=162955 and filename='<redacted>.xlsx';
   id   | xform_id |                    filename                     
--------+----------+-------------------------------------------------
 226812 |   162955 | <redacted>.xlsx
(1 row)

kobotoolbox=# select id, xform_id, filename from viewer_export where xform_id=162955 and concat(filename, '')='<redacted>.xlsx';
   id   | xform_id |                    filename                     
--------+----------+-------------------------------------------------
 226812 |   162955 | <redacted>.xlsx
 226813 |   162955 | <redacted>.xlsx
(2 rows)

The second query wraps filename in concat(), which prevents Postgres from using the unique index and forces a scan of the table itself; the index reporting one row while the heap contains two suggests a corrupt unique index on the source. There were only three of these, so I cleaned them up manually by deleting the duplicate rows. I hadn't worried about them before, but it seemed possible they would interfere with replication. The counts were still off by one after that, so I adapted the count/max(id) Python script to find the missing ID:

import psycopg2
import sys
import threading

class DbThreadThing:
    """Run a query against one database on a background thread, so that the
    old and new databases can be queried concurrently."""

    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)

    def __del__(self):
        self.conn.close()

    def get_all_ids_from_table(self, table):
        # The table name comes from this ad-hoc script's command line and is
        # interpolated directly into the query; it must be trusted input.
        with self.conn.cursor() as cur:
            cur.execute("select id from " + table)
            self.result = cur.fetchall()

    def start_query(self, table):
        self.thread = threading.Thread(
            target=self.get_all_ids_from_table, args=(table,)
        )
        self.thread.start()

    def wait_for_result(self):
        self.thread.join()
        return self.result

old = DbThreadThing("host=172.17.0.1 user=kobo password=righto dbname=kobotoolbox")

if sys.argv[1] == "kobocat":
    new = DbThreadThing("user=postgres dbname=kobocat")
elif sys.argv[1] == "koboform":
    new = DbThreadThing("user=postgres dbname=koboform")

table = sys.argv[2]
old.start_query(table)
new.start_query(table)
old_ids = old.wait_for_result()
new_ids = new.wait_for_result()

def unwrap_ids(ids):
    return [t[0] for t in ids]

print(set(unwrap_ids(old_ids)).symmetric_difference(unwrap_ids(new_ids)))

…and manually added it to the destination database.

jnm commented 4 years ago

Needed to add the KC sequences to the replication set (list obtained from \ds in the new KC database; the commands below were executed on the source database):

select pglogical.replication_set_add_sequence('kobocat', 'api_project_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_project_user_stars_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_projectxform_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_team_projects_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_group_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_group_permissions_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_permission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_groups_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_user_permissions_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'corsheaders_corsmodel_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_admin_log_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_crontabschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_intervalschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_periodictask_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_solarschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_content_type_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_digest_partialdigest_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_digest_usernonce_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_migrations_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_site_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'guardian_groupobjectpermission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'guardian_userobjectpermission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_attachment_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_instance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_instancehistory_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_note_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_surveytype_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_xform_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_ziggyinstance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'main_metadata_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'main_userprofile_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_accesstoken_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_application_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_grant_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_refreshtoken_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'registration_registrationprofile_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'restservice_restservice_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'reversion_revision_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'reversion_version_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'stats_statscount_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'taggit_tag_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'taggit_taggeditem_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_columnrename_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_export_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_instancemodification_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_parsedinstance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'topology_id_seq', synchronize_data := true);
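Rather than hand-writing one statement per sequence, the list above can be generated from the \ds output (a sketch; sequence names here are taken from the statements above):

```python
def add_sequence_sql(replication_set, sequence_names):
    """Build one pglogical replication_set_add_sequence statement per sequence."""
    template = (
        "select pglogical.replication_set_add_sequence("
        "'%s', '%s', synchronize_data := true);"
    )
    return [template % (replication_set, name) for name in sequence_names]

for stmt in add_sequence_sql("kobocat", ["auth_user_id_seq", "logger_xform_id_seq"]):
    print(stmt)
```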
jnm commented 4 years ago

Cut KoBoCAT over to new database successfully: https://github.com/kobotoolbox/kobo-deployments/commit/55d908a5382ddf2da4a123342c01c46c5670f201 https://github.com/kobotoolbox/kobo-deployments/commit/9e68590de84ffde1902ca5c0fd0ad14821e075c0

Dropped the node and subscription from the new database (the warning isn't surprising, given that the source database has been stopped):

kobocat=# select pglogical.drop_subscription('kobocat_db_subscription');
WARNING:  could not drop slot "pgl_kobocat_combined12d34dc_kobocat_2b45540" on provider, you will probably have to drop it manually
 drop_subscription 
-------------------
                 1
(1 row)

kobocat=# select pglogical.drop_node('kobocat_db_subscriber');
 drop_node 
-----------
 t
(1 row)
jnm commented 4 years ago

I'd removed some constraints from the new KoBoCAT database so that bad data in the source wouldn't break replication. I didn't restore the constraints afterwards, however, and people started getting 500 errors when trying to submit, because get_or_create() isn't atomic: without a unique constraint on the slug, two concurrent requests can each fail the get and then both insert, leaving duplicate SurveyType rows. https://github.com/kobotoolbox/kobocat/blob/cdfe325670cc7ba86b87311792309693891750fe/onadata/apps/logger/models/instance.py#L225-L227

MultipleObjectsReturned/{username}/submission
error
get() returned more than one SurveyType -- it returned 2!
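The failure mode can be sketched deterministically (hypothetical toy code, not the Django implementation): each caller checks for an existing row before inserting, and with no unique constraint to stop them, two interleaved callers both insert.

```python
survey_types = []  # stand-in for a table with no unique constraint on slug

def get_phase(slug):
    """The 'get' half of get_or_create: look for an existing row."""
    return next((row for row in survey_types if row == slug), None)

def create_phase(slug):
    """The 'create' half: insert unconditionally."""
    survey_types.append(slug)

# Two concurrent requests interleave: both gets run before either create.
first = get_phase("flat")   # None: no row exists yet
second = get_phase("flat")  # None: still no row
if first is None:
    create_phase("flat")
if second is None:
    create_phase("flat")

print(survey_types.count("flat"))  # → 2: the MultipleObjectsReturned condition
```

A database-level unique constraint closes the race, since the second insert fails instead of creating a duplicate, which is why restoring logger_surveytype_slug_key fixed it.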

Fixed by cleaning up the database with the Python code in https://github.com/kobotoolbox/kobo-docker/issues/265#issuecomment-577833875 and adding the unique constraint:

kobocat=# ALTER TABLE ONLY public.logger_surveytype
    ADD CONSTRAINT logger_surveytype_slug_key UNIQUE (slug);
ALTER TABLE
kobocat=# 

A diff of the pg_dump --schema-only between the new OCHA KoBoCAT database and a fresh one shows that some other constraints also need to be added:

ALTER TABLE ONLY public.logger_instance
    ADD CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id FOREIGN KEY (survey_type_id) REFERENCES public.logger_surveytype(id) DEFERRABLE INITIALLY DEFERRED;

ALTER TABLE ONLY public.logger_instance
    ADD CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES public.auth_user(id) DEFERRABLE INITIALLY DEFERRED;

ALTER TABLE ONLY public.logger_instance
    ADD CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id FOREIGN KEY (xform_id) REFERENCES public.logger_xform(id) DEFERRABLE INITIALLY DEFERRED;
jnm commented 4 years ago

Adding those constraints outright locks the table (ouch), so instead we can add them as NOT VALID, meaning they apply only to new writes, and then run VALIDATE CONSTRAINT later as a job that doesn't block writes.

Adding the three NOT VALID constraints:

kobocat=# \timing
Timing is on.
kobocat=# ALTER TABLE ONLY public.logger_instance
    ADD CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id FOREIGN KEY (survey_type_id) REFERENCES public.logger_surveytype(id) DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE
Time: 64.423 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
    ADD CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES public.auth_user(id) DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE
Time: 1.371 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-#     ADD CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id FOREIGN KEY (xform_id) REFERENCES public.logger_xform(id) DEFERRABLE INITIALLY DEFERRED
kobocat-#     NOT VALID;
ALTER TABLE
Time: 1.907 ms

Validating them, which is slow but does not require blocking writes to the table:

kobocat=# ALTER TABLE ONLY public.logger_instance VALIDATE CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id;
ALTER TABLE
Time: 669835.189 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-#     VALIDATE CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id;
ALTER TABLE
Time: 642934.770 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-#     VALIDATE CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id;
ALTER TABLE
Time: 627803.682 ms

A diff of pg_dump --schema-only between the OCHA KoBoCAT database and a fresh KC database now comes back clean.