Closed natematykiewicz closed 4 years ago
I just stumbled over this false positive as well.
Maybe a check needs to be added if the class that pluck
is being called on is an ActiveRecord::Relation
or something else.
we had a situation where this Rubocop rule caused the MySQL query planner to not optimize the query correctly, and overwhelmed the DB.
.pluck(:id)
returns an array of numeric IDs
.select(:id)
returns the instances with an 'id' attribute - which is incorrect in the context we observed
the correct "select" statement in our case would have been .select(:id).map(&:id)
which is not preferable to .pluck(:id)
@koic I'm not convinced that this is a good and general Rubocop rule
@tilo can you provide more details?
.select(:id) returns the instances with an 'id' attribute
This is only true if you've called load
on the relation, which can happen if you directly call load
or any enumerable method such as each
or map
.
If you're just nesting relations, it does a subquery.
Foo.where(bar_id: Bar.active.select(:id))
This does 1 query:
SELECT *
FROM foos
WHERE bar_id IN (
SELECT id FROM bars WHERE active
);
If you use pluck, it does 2 queries:
Foo.where(bar_id: Bar.active.pluck(:id))
1 to grab the ids of all the relevant Bar
s.
SELECT id FROM bars WHERE active;
And 1 to get the Foo
records using those Bar
ids.
SELECT *
FROM foos
WHERE bar_id IN (1, 2, 3);
Also you said:
the correct "select" statement in our case would have been .select(:id).map(&:id) which is not preferable to .pluck(:id)
But .pluck(:id)
would result in exactly the same query plan and SQL as .select(:id).map(&:id)
, just the latter will instantiate a bunch of ActiveRecord objects to get the IDs, where the former will just have an array of IDs.
Both of those would do 2 queries -- 1 to select the ids and 1 to grab your rows.
Can you provide some relevant code?
Perhaps what you're meaning to say is that you wanted .pluck(:id)
because a subquery of .select(:id)
overwhelmed the query planner, but this cop pushed you to use a subquery?
I wanted to use pluck(:id)
, but this cop auto-corrected it to use select(:id)
.
Original PR, using pluck(:id)
resulted in this Query Plan:
> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).pluck(:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (1442298, 2304566, 2304574, 2608158)
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | hugetable | NULL | range | index_hugetable_on_item_type_and_item_id | index_hugetable_on_item_type_and_item_id | 1026 | NULL | 36 | 100.0 | Using index condition |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
uses an index
Modified PR, after using rubocop -A
, using select(:id)
, resulted in this Query Plan: (rule Rails/PluckInWhere
)
> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).select(:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (SELECT `my_models`.`id` FROM `my_models` WHERE `my_models`.`user_id` = 1516073)
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+
| 1 | SIMPLE | my_models | NULL | ref | PRIMARY,index_my_models_on_user_id | index_my_models_on_user_id | 5 | const | 4 | 100.0 | Using index |
| 1 | SIMPLE | hugetable | NULL | ref | index_hugetable_on_item_type_and_item_id | index_hugetable_on_item_type_and_item_id | 1026 | const,prod_db.my_models.id | 5 | 100.0 | NULL |
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+
This added a sub-query, and the MySQL optimizer did no longer use an index on HugeTable! 😱 e.g. NULL HugeTable has around 1.6B rows.. 😱 (yes, we need to get rid of that table)
The select
which caused the sub-query was added by a RuboCop auto-correct for rule Rails/PluckInWhere
> MyModel.where(user_id: @user.id).select(:id)
=> #<ActiveRecord::Relation [#<MyModel id: 1442298>, #<MyModel id: 2304566>, #<MyModel id: 2304574>, #<MyModel id: 2608158>]>
this is returning instances of MyModel, instead of numeric IDs
Perhaps the auto-corrected query with select
should have been: .map(&:id)
:
> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).select(:id).map(&:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (1442298, 2304566, 2304574, 2608158)
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | hugetable | NULL | range | index_hugetable_on_item_type_and_item_id | index_hugetable_on_item_type_and_item_id | 1026 | NULL | 36 | 100.0 | Using index condition |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
and pluck(:id)
is simpler than .select(:id).map(&:id)
so maybe not use this rule..
PM me for more details :)
@natematykiewicz it did not result in the same query plan (see above)
What I was saying is .select(:id).map(&:id)
is the same SQL and Query Plan as .pluck(:id)
, but much less efficient in terms of Ruby object allocations. There's never a time when you want to use .select(:id).map(&:id)
. You should be using either .pluck(:id)
(which does 2 queries) or .select(:id)
(which does 1 query using subselect).
I'm a bit shocked that your subselect query performed so poorly (.select(:id)
). I wouldn't be surprised if huge_tables.item_id
was not the same datatype as my_models.id
; perhaps one is bigint and the other is int, and it's doing a poor job typecasting them mid-query. Or maybe you need to ANALYZE
or VACUUM
the tables at play here.
@koic Thoughts on marking this cop "unsafe", since there's no guarantee that a subquery will actually perform better than 2 queries?
@koic WDYT?
This cop has already been marked as unsafe. OTOH, the doc could be updated. https://docs.rubocop.org/rubocop-rails/cops_rails.html#railspluckinwhere
Thank you!
Rails/PluckInWhere assumes that the
pluck
is being called on an ActiveRecord relation.data
is not an ActiveRecord relation, it's an array of hashes. Thepluck
here is using ActiveSupport's Enumerable#pluck.Expected behavior
Call
pluck
on an array of hashes, and not haveRails/PluckInWhere
report an issue.Actual behavior
Call
pluck
on an array of hashes, and haveRails/PluckInWhere
report "Use select instead of pluck within where query method."Steps to reproduce the problem
RuboCop version
Rubocop 0.88.0, Rubocop-Rails 2.6.0