Open jnm opened 6 years ago
A pull request has been created. #395 This migration can lock tables for a long period if index does not already exist.
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished. This could have a severe effect if the system is a live production database. Very large tables can take many hours to be indexed, and even for smaller tables, an index build can lock out writers for periods that are unacceptably long for a production system.
Servers with lots of data and no indexes could run the query manually with CONCURRENTLY
before running this migration. It can't be run in a transaction.
https://www.postgresql.org/docs/9.5/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
@jnm I've updated last migration I've created. It nows deletes old indexes from logger_xform and logger_instance and creates new ones.
As we've already discussed, this migration can lock table for a while. Applying it on OCHA db on my laptop (3.1 Ghz i5) took 17 minutes. AWS instances are way more powerful than my laptop but it can lock tables for several minutes either.
I would go with your suggestion about running the migration with --fake
option and creating indexes manually.
./manage.py migrate --noinput --fake
Indexes in DB look like that when running migration without --fake
option:
"logger_instance_uuid_3a47421a9432ac55_like" btree (uuid varchar_pattern_ops)
"logger_instance_uuid_3a47421a9432ac55_uniq" btree (uuid)
"logger_xform_uuid_6e2ade741dd6b58b_like" btree (uuid varchar_pattern_ops)
"logger_xform_uuid_6e2ade741dd6b58b_uniq" btree (uuid)
Running the command manually within postgres
container does not seem to help on my VM. ( 4vCPU - 8 GB RAM, 8 GB swap)
I've tested only logger_instance
so far.
Because I've already applied the migration, I deleted the indexes manually on logger_instance
table.
kobotoolbox=# CREATE INDEX CONCURRENTLY logger_instance_uuid_3a47421a9432ac55_uniq
kobotoolbox-# ON public.logger_instance USING btree
kobotoolbox-# (uuid COLLATE pg_catalog."default")
kobotoolbox-# TABLESPACE pg_default;
CREATE INDEX
Time: 2614727.038 ms
kobotoolbox=# CREATE INDEX CONCURRENTLY logger_instance_uuid_3a47421a9432ac55_like
kobotoolbox-# ON public.logger_instance USING btree
kobotoolbox-# (uuid COLLATE pg_catalog."default" varchar_pattern_ops)
kobotoolbox-# TABLESPACE pg_default;
CREATE INDEX
Time: 1686893.948 ms
During the creation of these indexes, web app was really slow (ie. Deploying a form took ~55s) and service_health
reports long postgres
response (~45s).
Running the same command without CONCURRENTLY
is faster (~2.5x) but does lock tables. I could not submit any data from enketo
kobotoolbox=# CREATE INDEX logger_instance_uuid_3a47421a9432ac55_uniq
ON public.logger_instance USING btree
(uuid COLLATE pg_catalog."default")
TABLESPACE pg_default;
CREATE INDEX
Time: 1039838.028 ms
kobotoolbox=# CREATE INDEX logger_instance_uuid_3a47421a9432ac55_like
kobotoolbox-# ON public.logger_instance USING btree
kobotoolbox-# (uuid COLLATE pg_catalog."default" varchar_pattern_ops)
kobotoolbox-# TABLESPACE pg_default;
CREATE INDEX
Time: 674924.637 ms
root@kobocat:/srv/src/kobocat# ./manage.py migrate --fake logger 0011
Your environment is:"onadata.settings.kc_environ"
DEBUG 2019-02-13 16:25:03,759 base 45 140120381163264 Configuring Raven for host: <raven.conf.remote.RemoteConfig object at 0x7f7021ae6f50>
Operations to perform:
Target specific migration: 0011_add-index-to-instance-uuid_and_xform_uuid, from logger
Running migrations:
Rendering model states... DONE
Applying logger.0011_add-index-to-instance-uuid_and_xform_uuid... FAKED
The following content types are stale and need to be deleted:
auth | user_user_permissions
Any objects related to these content types by a foreign key will also
be deleted. Are you sure you want to delete these content types?
If you're unsure, answer 'no'.
Type 'yes' to continue, or 'no' to cancel: no
root@kobocat:/srv/src/kobocat# ./manage.py sqlmigrate logger 0011
Your environment is:"onadata.settings.kc_environ"
DEBUG 2019-02-13 20:36:37,155 base 8850 139809463703296 Configuring Raven for host: <raven.conf.remote.RemoteConfig object at 0x7f27bd8e7f50>
BEGIN;
DROP INDEX IF EXISTS odk_logger_xform_uuid_idx;
DROP INDEX IF EXISTS odk_logger_instance_uuid_idx;
CREATE INDEX "logger_instance_uuid_3a47421a9432ac55_uniq" ON "logger_instance" ("uuid");
CREATE INDEX "logger_instance_uuid_3a47421a9432ac55_like" ON "logger_instance" ("uuid" varchar_pattern_ops);
CREATE INDEX "logger_xform_uuid_6e2ade741dd6b58b_uniq" ON "logger_xform" ("uuid");
CREATE INDEX "logger_xform_uuid_6e2ade741dd6b58b_like" ON "logger_xform" ("uuid" varchar_pattern_ops);
COMMIT;
CONCURRENTLY
to the CREATE
statementsCREATE INDEX CONCURRENTLY "logger_instance_uuid_3a47421a9432ac55_uniq" ON "logger_instance" ("uuid");
CREATE INDEX CONCURRENTLY "logger_instance_uuid_3a47421a9432ac55_like" ON "logger_instance" ("uuid" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "logger_xform_uuid_6e2ade741dd6b58b_uniq" ON "logger_xform" ("uuid");
CREATE INDEX CONCURRENTLY "logger_xform_uuid_6e2ade741dd6b58b_like" ON "logger_xform" ("uuid" varchar_pattern_ops);
renice
and ionice
to (attempt to) keep the server responsiveDo this from the Docker host, not inside the container (where it will fail). NB: The same process probably won't be reused for all CREATE
statements. I ran only the first one—and am still waiting for it to finish—so I didn't have to worry about that.
ubuntu@docker-host:~/kobo-docker$ pgrep -f CREATE\ INDEX
12259
ubuntu@docker-host:~/kobo-docker$ sudo renice -n 20 12259
12259 (process ID) old priority 0, new priority 19
ubuntu@docker-host:~/kobo-docker$ ionice -p 12259
none: prio 0
ubuntu@docker-host:~/kobo-docker$ sudo ionice -c 3 -p 12259
ubuntu@docker-host:~/kobo-docker$ ionice -p 12259
idle
Q: Does the Postgres worker process stick around after it's finished CREATE
ing? If so, its CPU and IO priority should be reset to the default.
A: If it does stick around, it hasn't been long enough for me to catch it. Still, checking to make sure all the Postgres' processes nice
nesses are correct is a good idea (or just restart the Postgres container).
@noliveleger, I've worked on https://github.com/kobotoolbox/kobocat/pull/368 and will proceed with merging it. Would you be able to investigate having an index on the
logger_instance.uuid
column? On both HHI PROD and OCHA, Postgres is telling me that there's already an indexuuid
:...but I can't find anything in the Django code that would cause this index to be created. In fact, looking at my local development machine, the
uuid
index is not present. Ideally we should:uuid
for theInstance
model (logger_xform
table)TODO
note when complete: https://github.com/kobotoolbox/kobocat/pull/368/files#diff-018f1a90f318bf1748393be886fee2c0R257