kobotoolbox / kobocat

Our (backend) server for providing blank forms to Collect and Enketo and for receiving and storing submissions.
BSD 2-Clause "Simplified" License

Database index for `Instance.uuid` #393

Open jnm opened 6 years ago

jnm commented 6 years ago

@noliveleger, I've worked on https://github.com/kobotoolbox/kobocat/pull/368 and will proceed with merging it. Would you be able to investigate adding an index on the logger_instance.uuid column? On both HHI PROD and OCHA, Postgres is telling me that there's already an index on uuid:

Indexes:
    <snip>
    "odk_logger_instance_uuid_idx" btree (uuid)

...but I can't find anything in the Django code that would cause this index to be created. In fact, looking at my local development machine, the uuid index is not present. Ideally we should:

noliveleger commented 6 years ago

I've created a pull request: #395. This migration can lock tables for a long time if the index does not already exist.

Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished. This could have a severe effect if the system is a live production database. Very large tables can take many hours to be indexed, and even for smaller tables, an index build can lock out writers for periods that are unacceptably long for a production system.

On servers with lots of data and no existing indexes, the indexes can be created manually with CONCURRENTLY before running this migration. Note that CREATE INDEX CONCURRENTLY can't be run inside a transaction. https://www.postgresql.org/docs/9.5/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

noliveleger commented 6 years ago

@jnm I've updated the last migration I created. It now deletes the old indexes from logger_xform and logger_instance and creates new ones. As we've already discussed, this migration can lock the tables for a while: applying it to the OCHA database on my laptop (3.1 GHz i5) took 17 minutes. AWS instances are much more powerful than my laptop, but it can still lock the tables for several minutes. I would go with your suggestion of running the migration with the --fake option and creating the indexes manually.

./manage.py migrate --noinput --fake

The indexes in the DB look like this when the migration is run without the --fake option:

    "logger_instance_uuid_3a47421a9432ac55_like" btree (uuid varchar_pattern_ops)
    "logger_instance_uuid_3a47421a9432ac55_uniq" btree (uuid)
    "logger_xform_uuid_6e2ade741dd6b58b_like" btree (uuid varchar_pattern_ops)
    "logger_xform_uuid_6e2ade741dd6b58b_uniq" btree (uuid)
noliveleger commented 5 years ago

Running the command manually inside the Postgres container does not seem to help on my VM (4 vCPUs, 8 GB RAM, 8 GB swap).

I've tested only logger_instance so far. Since I had already applied the migration, I first deleted the indexes on the logger_instance table manually.

kobotoolbox=# CREATE INDEX CONCURRENTLY logger_instance_uuid_3a47421a9432ac55_uniq
kobotoolbox-#     ON public.logger_instance USING btree
kobotoolbox-#     (uuid COLLATE pg_catalog."default")
kobotoolbox-#     TABLESPACE pg_default;
CREATE INDEX
Time: 2614727.038 ms

kobotoolbox=# CREATE INDEX CONCURRENTLY logger_instance_uuid_3a47421a9432ac55_like
kobotoolbox-#     ON public.logger_instance USING btree
kobotoolbox-#     (uuid COLLATE pg_catalog."default" varchar_pattern_ops)
kobotoolbox-#     TABLESPACE pg_default;
CREATE INDEX
Time: 1686893.948 ms

While these indexes were being created, the web app was very slow (e.g. deploying a form took ~55 s) and service_health reported long Postgres response times (~45 s).

noliveleger commented 5 years ago

Running the same commands without CONCURRENTLY is about 2.5x faster but does lock the tables: I could not submit any data from Enketo while they ran.

kobotoolbox=# CREATE INDEX logger_instance_uuid_3a47421a9432ac55_uniq
kobotoolbox-#     ON public.logger_instance USING btree
kobotoolbox-#     (uuid COLLATE pg_catalog."default")
kobotoolbox-#     TABLESPACE pg_default;
CREATE INDEX
Time: 1039838.028 ms

kobotoolbox=# CREATE INDEX logger_instance_uuid_3a47421a9432ac55_like
kobotoolbox-#     ON public.logger_instance USING btree
kobotoolbox-#     (uuid COLLATE pg_catalog."default" varchar_pattern_ops)
kobotoolbox-#     TABLESPACE pg_default;
CREATE INDEX
Time: 674924.637 ms
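
A quick back-of-the-envelope check of the timings above (the figures are just the `Time:` values reported by psql, in milliseconds), confirming the ~2.5x figure:

```python
# Totals for building both logger_instance indexes, from the psql timings above.
concurrent_ms = 2614727.038 + 1686893.948  # CREATE INDEX CONCURRENTLY runs
blocking_ms = 1039838.028 + 674924.637     # plain CREATE INDEX runs

print(f"CONCURRENTLY total: {concurrent_ms / 60_000:.1f} min")  # ~71.7 min
print(f"blocking total:     {blocking_ms / 60_000:.1f} min")    # ~28.6 min
print(f"slowdown from CONCURRENTLY: ~{concurrent_ms / blocking_ms:.1f}x")  # ~2.5x
```
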
jnm commented 5 years ago

Fake the migration

root@kobocat:/srv/src/kobocat# ./manage.py migrate --fake logger 0011
Your environment is:"onadata.settings.kc_environ"
DEBUG 2019-02-13 16:25:03,759 base 45 140120381163264 Configuring Raven for host: <raven.conf.remote.RemoteConfig object at 0x7f7021ae6f50>
Operations to perform:
  Target specific migration: 0011_add-index-to-instance-uuid_and_xform_uuid, from logger
Running migrations:
  Rendering model states... DONE
  Applying logger.0011_add-index-to-instance-uuid_and_xform_uuid... FAKED
The following content types are stale and need to be deleted:

    auth | user_user_permissions

Any objects related to these content types by a foreign key will also
be deleted. Are you sure you want to delete these content types?
If you're unsure, answer 'no'.

    Type 'yes' to continue, or 'no' to cancel: no
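
For context on what `--fake` does: it records the migration as applied in Django's `django_migrations` bookkeeping table without executing any of its operations. Conceptually it amounts to something like the following (illustrative only; Django does this through the ORM, not with raw SQL):

```sql
-- Mark the migration as applied without running its SQL.
INSERT INTO django_migrations (app, name, applied)
VALUES ('logger', '0011_add-index-to-instance-uuid_and_xform_uuid', now());
```
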

Get Django to print the SQL the migration would've run

root@kobocat:/srv/src/kobocat# ./manage.py sqlmigrate logger 0011
Your environment is:"onadata.settings.kc_environ"
DEBUG 2019-02-13 20:36:37,155 base 8850 139809463703296 Configuring Raven for host: <raven.conf.remote.RemoteConfig object at 0x7f27bd8e7f50>
BEGIN;
DROP INDEX IF EXISTS odk_logger_xform_uuid_idx;
DROP INDEX IF EXISTS odk_logger_instance_uuid_idx;
CREATE INDEX "logger_instance_uuid_3a47421a9432ac55_uniq" ON "logger_instance" ("uuid");
CREATE INDEX "logger_instance_uuid_3a47421a9432ac55_like" ON "logger_instance" ("uuid" varchar_pattern_ops);
CREATE INDEX "logger_xform_uuid_6e2ade741dd6b58b_uniq" ON "logger_xform" ("uuid");
CREATE INDEX "logger_xform_uuid_6e2ade741dd6b58b_like" ON "logger_xform" ("uuid" varchar_pattern_ops);

COMMIT;

Add CONCURRENTLY to the CREATE statements

CREATE INDEX CONCURRENTLY "logger_instance_uuid_3a47421a9432ac55_uniq" ON "logger_instance" ("uuid");
CREATE INDEX CONCURRENTLY "logger_instance_uuid_3a47421a9432ac55_like" ON "logger_instance" ("uuid" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "logger_xform_uuid_6e2ade741dd6b58b_uniq" ON "logger_xform" ("uuid");
CREATE INDEX CONCURRENTLY "logger_xform_uuid_6e2ade741dd6b58b_like" ON "logger_xform" ("uuid" varchar_pattern_ops);

Use renice and ionice to (attempt to) keep the server responsive

Do this from the Docker host, not inside the container (where it will fail). NB: the same Postgres worker process probably won't be reused for all of the CREATE statements. I ran only the first one (and am still waiting for it to finish), so I didn't have to worry about that.

ubuntu@docker-host:~/kobo-docker$ pgrep -f CREATE\ INDEX
12259
ubuntu@docker-host:~/kobo-docker$ sudo renice -n 20 12259
12259 (process ID) old priority 0, new priority 19
ubuntu@docker-host:~/kobo-docker$ ionice -p 12259
none: prio 0
ubuntu@docker-host:~/kobo-docker$ sudo ionice -c 3 -p 12259
ubuntu@docker-host:~/kobo-docker$ ionice -p 12259
idle
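
For reference, what `renice` changes is the kernel's per-process nice value, which Python exposes directly. A tiny demonstration on the current process (unprivileged users can raise the value but not lower it back):

```python
import os

# 0 is the usual default nice value; 19 is the lowest CPU priority on Linux.
# The `renice -n 20` above was clamped to 19 for the same reason.
print("before:", os.getpriority(os.PRIO_PROCESS, 0))
os.setpriority(os.PRIO_PROCESS, 0, 19)
print("after:", os.getpriority(os.PRIO_PROCESS, 0))  # 19
```
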

Finishing up

Q: Does the Postgres worker process stick around after it finishes the CREATE? If so, its CPU and I/O priority should be reset to the defaults. A: If it does stick around, it hasn't been for long enough for me to catch it. Still, it's a good idea to check that the niceness values of all the Postgres processes are correct (or just restart the Postgres container).