westonganger / paper_trail-association_tracking

Plugin for the PaperTrail gem to track and reify associations
MIT License
128 stars 38 forks source link

stop using subquery #18

Closed kei500 closed 4 years ago

kei500 commented 4 years ago

@westonganger

I want to stop using subquery because it sometimes causes high load of MySQL server.

For example, in my environment, Model.reify(has_many: true) executes SQL below:

SELECT `versions`.*
FROM `versions
WHERE `versions`.`id` IN (
  SELECT MIN(version_id)
  FROM `version_associations`
  INNER JOIN `versions` ON `versions`.`id` = `version_associations`.`version_id`
  WHERE (foreign_key_name = 'item_type_id')
  AND (foreign_key_id = 1)
  AND (`version_associations`.`foreign_type` = 'ItemType' OR `version_associations`.`foreign_type` IS NULL)
  AND (versions.item_type = 'Item')
  AND (created_at >= '2020-04-19 18:37:09' OR transaction_id = 2008)
  GROUP BY item_id
);

Explain of this SQL is:

+----+--------------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+
| id | select_type        | table                | type        | possible_keys                                                                     | key                                       | key_len | ref                                                 | rows | Extra                                                               |
+----+--------------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+
|  1 | PRIMARY            | versions             | ALL         | NULL                                                                              | NULL                                      | NULL    | NULL                                                | 1959 | Using where                                                         |
|  2 | DEPENDENT SUBQUERY | version_associations | ref_or_null | index_version_assciations_on_version_id,index_version_associations_on_foreign_key | index_version_associations_on_foreign_key | 2050    | const,const,const                                   |   31 | Using index condition; Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | versions             | eq_ref      | PRIMARY,index_versions_on_transaction_id,index_versions_on_item_type_item_id      | PRIMARY                                   | 4       | *******_development.version_associations.version_id |    1 | Using where                                                         |
+----+--------------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+

As you can see, versions table is full scanned. The reason why index isn't used is a specification of MySQL.

So, I separated this SQL into two parts.

mysql> explain SELECT `versions`.* FROM `versions` WHERE `versions`.`id` IN (SELECT MIN(version_id) FROM `version_associations` INNER JOIN `versions` ON `versions`.`id` = `version_associations`.`version_id` WHERE (foreign_key_name = 'item_type_id') AND (foreign_key_id = 1) AND (`version_associations`.`foreign_type` = 'ItemType' OR `version_associations`.`foreign_type` IS NULL) AND (versions.item_type = 'Item') AND (created_at >= '2020-04-19 18:37:09' OR transaction_id = 2008) GROUP BY item_id);
+----+-------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+
| id | select_type | table                | type        | possible_keys                                                                     | key                                       | key_len | ref                                                 | rows | Extra                                                               |
+----+-------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+
|  1 | SIMPLE      | version_associations | ref_or_null | index_version_assciations_on_version_id,index_version_associations_on_foreign_key | index_version_associations_on_foreign_key | 2050    | const,const,const                                   |   31 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | versions             | eq_ref      | PRIMARY,index_versions_on_transaction_id,index_versions_on_item_type_item_id      | PRIMARY                                   | 4       | *******_development.version_associations.version_id |    1 | Using where                                                         |
+----+-------------+----------------------+-------------+-----------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------------------------------------------+------+---------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> explain SELECT `versions`.* FROM `versions` WHERE `versions`.`id` IN (2008, 2009, 2010, 2011, 2012, 2013);
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | versions | range | PRIMARY       | PRIMARY | 4       | NULL |    6 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
westonganger commented 4 years ago

Merged. Thanks for your contribution,

kfalconer commented 4 years ago

Any ETA on this being released? We are having the same issue in production, and this PR resolved performance issue.

westonganger commented 4 years ago

@kfalconer thanks for your additional confirmation of the improvement.

v2.1.0 is now released which contains this fix.