mbleigh / acts-as-taggable-on

A tagging plugin for Rails applications that allows for custom tagging along dynamic contexts.
http://mbleigh.lighthouseapp.com/projects/10116-acts-as-taggable-on
MIT License
4.97k stars 1.2k forks source link

Cache miss on PostgreSQL when using forced lowercase tags #657

Open denschub opened 9 years ago

denschub commented 9 years ago

Pull request #498 introduced a lowercased tag checking by using database functions. However, this causes a miss of index_tags_on_name on PostgreSQL, which results in a very bad performance. Creating an index on lowercased tag names (CREATE UNIQUE INDEX index_tags_on_lower_name on tags (lower(name));) solves the problem.

Tag loading without LOWER()

# EXPLAIN ANALYZE SELECT "tags".* FROM "tags" WHERE name = 'gesellschaft' OR name = 'mensch' OR name = 'antifa' OR name = 'antira' OR name = 'anarchismus';
                                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tags  (cost=22.16..41.71 rows=5 width=20) (actual time=0.124..0.129 rows=5 loops=1)
   Recheck Cond: (((name)::text = 'gesellschaft'::text) OR ((name)::text = 'mensch'::text) OR ((name)::text = 'antifa'::text) OR ((name)::text = 'antira'::text) OR ((name)::text = 'anarchismus'::text))
   ->  BitmapOr  (cost=22.16..22.16 rows=5 width=0) (actual time=0.116..0.116 rows=0 loops=1)
         ->  Bitmap Index Scan on index_tags_on_name  (cost=0.00..4.43 rows=1 width=0) (actual time=0.043..0.043 rows=1 loops=1)
               Index Cond: ((name)::text = 'gesellschaft'::text)
         ->  Bitmap Index Scan on index_tags_on_name  (cost=0.00..4.43 rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
               Index Cond: ((name)::text = 'mensch'::text)
         ->  Bitmap Index Scan on index_tags_on_name  (cost=0.00..4.43 rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
               Index Cond: ((name)::text = 'antifa'::text)
         ->  Bitmap Index Scan on index_tags_on_name  (cost=0.00..4.43 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=1)
               Index Cond: ((name)::text = 'antira'::text)
         ->  Bitmap Index Scan on index_tags_on_name  (cost=0.00..4.43 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=1)
               Index Cond: ((name)::text = 'anarchismus'::text)
 Total runtime: 0.148 ms
(14 rows)

Tag loading with LOWER()

# EXPLAIN ANALYZE SELECT "tags".* FROM "tags" WHERE (LOWER(name) = LOWER('gesellschaft') OR LOWER(name) = LOWER('mensch') OR LOWER(name) = LOWER('antifa') OR LOWER(name) = LOWER('antira') OR LOWER(name) = LOWER('anarchismus'));
                                                                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tags  (cost=0.00..20896.03 rows=12184 width=20) (actual time=1.188..12205.770 rows=5 loops=1)
   Filter: ((lower((name)::text) = 'gesellschaft'::text) OR (lower((name)::text) = 'mensch'::text) OR (lower((name)::text) = 'antifa'::text) OR (lower((name)::text) = 'antira'::text) OR (lower((name)::text) = 'anarchismus'::text))
   Rows Removed by Filter: 475655
 Total runtime: 12205.788 ms
(4 rows)

Tag loading with LOWER() and added index

# EXPLAIN ANALYZE SELECT "tags".* FROM "tags" WHERE (LOWER(name) = LOWER('gesellschaft') OR LOWER(name) = LOWER('mensch') OR LOWER(name) = LOWER('antifa') OR LOWER(name) = LOWER('antira') OR LOWER(name) = LOWER('anarchismus'));
                                                                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tags  (cost=306.02..4389.24 rows=11773 width=20) (actual time=0.225..0.236 rows=5 loops=1)
   Recheck Cond: ((lower((name)::text) = 'gesellschaft'::text) OR (lower((name)::text) = 'mensch'::text) OR (lower((name)::text) = 'antifa'::text) OR (lower((name)::text) = 'antira'::text) OR (lower((name)::text) = 'anarchismus'::text))
   ->  BitmapOr  (cost=306.02..306.02 rows=11892 width=0) (actual time=0.218..0.218 rows=0 loops=1)
         ->  Bitmap Index Scan on index_tags_on_lower_name  (cost=0.00..58.26 rows=2378 width=0) (actual time=0.067..0.067 rows=1 loops=1)
               Index Cond: (lower((name)::text) = 'gesellschaft'::text)
         ->  Bitmap Index Scan on index_tags_on_lower_name  (cost=0.00..58.26 rows=2378 width=0) (actual time=0.056..0.056 rows=1 loops=1)
               Index Cond: (lower((name)::text) = 'mensch'::text)
         ->  Bitmap Index Scan on index_tags_on_lower_name  (cost=0.00..58.26 rows=2378 width=0) (actual time=0.040..0.040 rows=1 loops=1)
               Index Cond: (lower((name)::text) = 'antifa'::text)
         ->  Bitmap Index Scan on index_tags_on_lower_name  (cost=0.00..58.26 rows=2378 width=0) (actual time=0.026..0.026 rows=1 loops=1)
               Index Cond: (lower((name)::text) = 'antira'::text)
         ->  Bitmap Index Scan on index_tags_on_lower_name  (cost=0.00..58.26 rows=2378 width=0) (actual time=0.027..0.027 rows=1 loops=1)
               Index Cond: (lower((name)::text) = 'anarchismus'::text)
 Total runtime: 0.268 ms
(14 rows)
seuros commented 9 years ago

can you send a PR with this migration ?

denschub commented 9 years ago

Sure, I could simply add a unique index to the lowercased tag names, but this will create issues with setups not using enforced lowercased tag names. What's the best approach to add setup-dependent indexes in your setup?

seuros commented 9 years ago

I think PG will ignore the index if the setup don't use it.

denschub commented 9 years ago

Creating a unique index on non-unique datasets will raise a bunch of errors.

seuros commented 9 years ago

Right, good catch. What about adding a commented migration and we explain the issue ?