mbleigh / acts-as-taggable-on

A tagging plugin for Rails applications that allows for custom tagging along dynamic contexts.
http://mbleigh.lighthouseapp.com/projects/10116-acts-as-taggable-on
MIT License
4.97k stars 1.2k forks source link

Use exists clause over IN Tag.for_context, Tag.for_tenant to improve performance #1087

Closed aovertus closed 2 years ago

aovertus commented 2 years ago

Looking at documentation and articles it seem that IN is only beneficial on really small dataset.

IN:

Returns true if a specified value matches any value in a subquery or a list.

Exists:

Returns true if a subquery contains any rows.

If anyone as deeper understanding of IN over EXISTS. It looks like more recent engine handle those query the same way.

The same strategy exists for https://github.com/mbleigh/acts-as-taggable-on/blob/master/lib/acts_as_taggable_on/taggable/tagged_with_query/any_tags_query.rb#L20-L22

Here is the example we ran into production which generated performance issues

Engine:

MySQL 5.7

Dataset

~ 7_000_000 taggins records ~ 1300 tags

Current implementation

3.0.2 :003 > ActsAsTaggableOn::Tag.for_context('context').to_sql
=> "SELECT DISTINCT tags.* FROM `tags` INNER JOIN `taggings` ON `taggings`.`tag_id` = `tags`.`id` WHERE (taggings.context = 'context')"

Query run in ~ 58.74s

Patch

3.0.2 :004 > ActsAsTaggableOn::Tag.where(ActsAsTaggableOn::Tagging.where(context: 'context').arel.exists).to_sql
# => "SELECT `tags`.* FROM `tags` WHERE EXISTS (SELECT `taggings`.* FROM `taggings` WHERE `taggings`.`context` = 'context')"

Query run in ~ 149ms