Closed by dongbohu 5 years ago
Because the node table is so small (< 50,000 nodes), there will be very little cost to indexing it, right? So if this could even theoretically speed up search, I think we should add it.
@dhimmel: Yes, these indexes can be added/deleted/updated in a few seconds.
I'd agree. Seems like it'd help. As long as it doesn't hurt, I'm for it.
Here is a SQL script that adds the extra indexes. It should be run after the database has been populated.
/* Add extra indexes to speed up searching in "node" and "metanode" tables. */
-- Enable "pg_trgm" extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Create GIN index for substring and trigram searches of "name" field in "node" table
CREATE INDEX name_trgm_idx ON dj_hetmech_app_node USING GIN (name gin_trgm_ops);
-- Create GIN index for prefix search of "identifier" field in "node" table
CREATE INDEX identifier_trgm_idx ON dj_hetmech_app_node USING GIN (identifier gin_trgm_ops);
-- Create default (B-tree) index for exact match of "abbreviation" field in "metanode" table
CREATE INDEX abbr_idx ON dj_hetmech_app_metanode (abbreviation);
It can be run by:
psql -U <username> -h <hostname> -f <this_script> -d <db_name>
I added these indexes to the production database.
@dongbohu is this not something that can be integrated into models.py? All of the other index specifications occur there. It would be great if this could be integrated into the Django workflow somehow, right?
Is this related: https://stackoverflow.com/a/51880653/4651668
It's possible to integrate these indexes in models.py, but usually indexes should be added after the database is populated. See the discussion here:
https://dba.stackexchange.com/questions/66182/index-creation-before-or-after-loading-data
I had a bad experience defining indexes before loading the data.
The main reason for adding the indexes after the database has been populated seems to be the speed of creating them. However, since the Node table is small, that doesn't seem to be an issue here?
My worry is not whether the indexes are created during db population or afterwards. My worry is having their creation be completely separate from the hetmech-backend codebase. Is there an automated way to build these indexes? Should these be part of the populate_database management command?
When the index was generated while the table was constantly being updated, not only was loading slowed down, but the index also became bloated and suboptimal. That's why Postgres also provides the reindex command. (Since the Node table is small, I confess that maybe I am a little paranoid.)
The populate_database management command sounds like a good place to add these indexes.
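A minimal sketch of how the index creation could be folded into the population step, assuming a DB-API style cursor. The helper name `add_search_indexes` and the added `IF NOT EXISTS` clauses are assumptions for illustration and are not part of the hetmech-backend codebase; the statements otherwise mirror the SQL script posted in this thread.

```python
# Hypothetical helper: the statements mirror the SQL script from this thread,
# with IF NOT EXISTS added so that re-running the step is harmless.
INDEX_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS pg_trgm",
    "CREATE INDEX IF NOT EXISTS name_trgm_idx "
    "ON dj_hetmech_app_node USING GIN (name gin_trgm_ops)",
    "CREATE INDEX IF NOT EXISTS identifier_trgm_idx "
    "ON dj_hetmech_app_node USING GIN (identifier gin_trgm_ops)",
    "CREATE INDEX IF NOT EXISTS abbr_idx "
    "ON dj_hetmech_app_metanode (abbreviation)",
]


def add_search_indexes(cursor):
    """Execute each statement on the given cursor, once the tables are populated.

    Returns the number of statements that were executed.
    """
    for statement in INDEX_STATEMENTS:
        cursor.execute(statement)
    return len(INDEX_STATEMENTS)
```

Inside a Django management command, the cursor could come from `django.db.connection.cursor()`, with the helper called as the last step of populate_database so that each index is built once over the fully loaded table.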
@dhimmel and @vincerubinetti: I added a few indexes on the identifier and name fields in the Node table on a clone of the current backend DB. These indexes are supposed to make prefix search on identifier and substring/trigram searches on the name field up to ten times faster (from a few hundred milliseconds to ~10 milliseconds). I am running the optimized DB on a test AWS EC2 instance: http://35.175.113.38/v1/nodes/?search=xxx
Please replace xxx with whatever string you want to search for, compare its performance with the production server: https://search-api.het.io/v1/nodes/?search=xxx and tell me whether you notice any difference. If you do, I will apply these indexes on the production DB. Thanks.
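For a less subjective comparison, the two endpoints could be timed with a short script. This is only a sketch: the function names and the `fetcher` parameter are made up for illustration, and the URLs are the ones quoted in the comment above, which may no longer be reachable.

```python
import statistics
import time
import urllib.parse
import urllib.request

# Endpoints quoted in the comment above; they may no longer be reachable.
TEST_URL = "http://35.175.113.38/v1/nodes/"
PROD_URL = "https://search-api.het.io/v1/nodes/"


def fetch(url):
    """Default fetcher: download the response body (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def median_latency_ms(base_url, term, repeats=5, fetcher=fetch):
    """Median wall-clock time, in milliseconds, of `repeats` search requests."""
    url = base_url + "?" + urllib.parse.urlencode({"search": term})
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetcher(url)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

Calling `median_latency_ms(TEST_URL, "xxx")` and `median_latency_ms(PROD_URL, "xxx")` with the same search string gives two numbers to compare. The timings include the network round trip as well as the query itself, so a difference between hosts is only a rough indicator of the effect of the indexes.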