khiav223577 / deep_pluck

Allow you to pluck attributes from nested associations without loading a bunch of records.
MIT License
460 stars 14 forks source link

Strange results when a model joins multiple tables #29

Closed MasashiYokota closed 5 years ago

MasashiYokota commented 5 years ago

This might be a pluck_all issue ...

User.includes(:articles, :friends).deep_pluck(:name, { articles: :title, friends: :name })

This query shows some strange results. Although I found that it happened when I updated the pluck_all version above 2.0.3 ( this patch might have some problems. ), I have no idea to fix this issue. If you have some idea to fix it, please tell me.

khiav223577 commented 5 years ago

Thanks for your reporting. I'm going to take a look at it. Could you give an example of the expected results and the actual results you get?

khiav223577 commented 5 years ago

Maybe you can fix it by removing redundant includes. (deep_pluck will fire queries automatically without the use of includes.)

User.deep_pluck(:name, { articles: :title, friends: :name })
MasashiYokota commented 5 years ago

Thanks for your reporting. I'm going to take a look at it. Could you give an example of the expected results and the actual results you get?

> User.all
# <User: id: 1, name: 'hoge', ...>
> Article.all
# <Article: id: 1, title: 'title1', user_id: 1 ...>
# <Article: id: 2, title: 'title2', user_id: 1 ...>
> Friend.all
# <Friend: id: 1, title: 'friend1', user_id: 1  ...>

> User.includes(:articles, :friends).deep_pluck(:name, { articles: :title, friends: :name })
# expected => [
#  { 
#    name: 'hoge', 
#    articles: [
#      { title: 'title1' }, 
#      { title: 'title2' } 
#    ],  
#    friends: [
#      { name: 'friend1' }
#    ] 
#  }
#]

# actual => [
# { 
#    name: 'hoge', 
#    articles: [{ title: 'title1'} ], 
#    friends: [{ name: 'friend1' }] 
# }, { 
#    name: 'hoge', 
#    articles: [{ title: 'title1'} ], 
#    friends: [{ name: 'friend1' }]
#  }
# ] 

Although it's a simple example, the problematic query returns results like above. It is strange since it generates multiple results; nevertheless it's expected to return only one result. Do you have any ideas?

Maybe you fix it by removing redundant includes. (deep_pluck will fire queries automatically without the use of includes.)

Thank you for your useful comments. I could fix this issue by removing the includes 😄 But I wonder why that query didn't work by using includes 😓

khiav223577 commented 5 years ago

I think the strange results come from the misusing of includes. In Rails 5, using includes along with pluck will turn includes into left_joins.

Let user(id = 1) has 3 posts You will get one row if you select it by id:

User.where(id: 1).count 
# => 1

You will get three rows if you select it by id and joins posts:

User.where(id: 1).joins(:posts).count 
# => 3

The following examples show that includes + pluck is the same as left_joins + pluck

includes + pluck

User.where(id: 1).includes(:posts).pluck(:id)
# SELECT `users`.`id` FROM `users` LEFT OUTER JOIN `posts` ON `posts`.`user_id` = `users`.`id` WHERE `users`.`id` = 1
# => [1, 1, 1] 

left_joins + pluck

User.where(id: 1).left_joins(:posts).pluck(:id)
# SELECT `users`.`id` FROM `users` LEFT OUTER JOIN `posts` ON `posts`.`user_id` = `users`.`id` WHERE `users`.`id` = 1
# => [1, 1, 1] 

includes + deep_pluck

User.where(id: 1).includes(:posts).deep_pluck(:id, posts: :title)
# SELECT users.id FROM `users` LEFT OUTER JOIN `posts` ON `posts`.`user_id` = `users`.`id` WHERE `users`.`id` = 1
# SELECT posts.user_id, `posts`.`title` FROM `posts` WHERE `posts`.`user_id` = 1
# => [
#  {
#    "id"=>1, 
#    :posts=>[
#      {"title"=>"John's post1"}, 
#      {"title"=>"John's post2"}, 
#      {"title"=>"John's post3"}
#    ]
#  }, 
#  {
#    "id"=>1, 
#    :posts=>[
#      {"title"=>"John's post1"}, 
#      {"title"=>"John's post2"}, 
#      {"title"=>"John's post3"}
#    ]
#  }, 
#  {
#    "id"=>1, 
#    :posts=>[
#      {"title"=>"John's post1"}, 
#      {"title"=>"John's post2"}, 
#      {"title"=>"John's post3"}
#    ]
#  }, 
MasashiYokota commented 5 years ago

Thank you for your comments. But my explanation might be insufficient to reproduce it.

Perhaps, you could get some strange returns by including two or more tables like below:

User.includes(:articles, :friends).deep_pluck(:name, { articles: :title, friends: :name })

In summary, the three conditions to reproduce it are as below:

  1. pluck_all version above 2.0.3
  2. Eager loading multiple tables by using includes or left_joins
  3. Using deep_pluck

Although I could replicate this issue following the conditions in my project, rails c and so on, please tell me if you can't.

khiav223577 commented 5 years ago

You mean that deep_pluck will return multiple results in some situation but it should return only one. I can reproduce it if I didn't misunderstand what you mean. But I wonder if it is kind of feature or not.

pluck

Let see what the native pluck method behaves:

User.where(id: 1).includes(:posts).pluck(:id)
# => [1, 1, 1]

We expect there is only one user with id = 1 But we get 3 results here.

pluck_all

In pluck_all version below 2.0.3, the query result is inconsistent with pluck.

User.where(id: 1).includes(:posts).pluck_all(:id)
# => [{"id"=>1}] 

But I expect it to be consistent with pluck method, which means the following statement should always be true:

users.pluck_all(:id) == users.pluck(:id).map{|s| { "id" => s } }
# => true

So I did some changes in https://github.com/khiav223577/pluck_all/pull/30 and release 2.0.3. After that the behavior of pluck_all changed to:

User.where(id: 1).includes(:posts).pluck_all(:id)
# => [{"id"=>1}, {"id"=>1}, {"id"=>1}]

deep_pluck

The change of the behavior of pluck_all will affect the behavior of deep_pluck in that I would like to make the following statement to be true:

users.pluck_all(:id) == users.deep_pluck(:id)
# => true

It is kind of strange if them behave differently. For example:

users = User.where(id: 1).includes(:posts)

users.pluck_all(:id)
# => [{"id"=>1}, {"id"=>1}, {"id"=>1}]

users.deep_pluck(:id)
# => [{"id"=>1}]

But now, we may get different results even you are using the same version of deep_pluck. Maybe It will be better that I update the dependency of deep_pluck and make it depends on pluck_all ~> 2.0.4. What do you think?

MasashiYokota commented 5 years ago

Sorry for my poor English 😓 And, thank you for getting back to me so quickly and carefully considering my request.

You mean that deep_pluck will return multiple results in some situation but it should return only one. I can reproduce it if I didn't misunderstand what you mean. But I wonder if it is kind of feature or not.

I think you might overlook the below condition I wrote:

  1. Eager loading multiple tables by using includes or left_joins

"Including multiple tables" is one of the most important condition to reproduce it. So, you have to change the code

User.where(id: 1).includes(:posts).pluck_all(:id, { post: :id })

to

User.where(id: 1).includes(:posts, :favorites).pluck_all(:id, { post: :id, favorites: :id })

Please try to include two tables and call deep_pluck. It will return some strange values.

khiav223577 commented 5 years ago

The following is my test results image

Could you give me more details of what does strange values actually mean?

MasashiYokota commented 5 years ago

Thanks. I think you could have reproduced my problem.

And this is the example of the strange values in my dev env. As you can see, deep_pluck returns values too much when the three conditions I wrote are satisfied.

[55] pry(main)> User.where(id: 3).includes(:spot_memos).deep_pluck(:id, spot_memos: :id)
   (1.1ms)  SELECT users.id FROM "users" LEFT OUTER JOIN "spot_memos" ON "spot_memos"."user_id" = "users"."id" WHERE "users"."id" = 3
   (0.7ms)  SELECT spot_memos.user_id, "spot_memos"."id" FROM "spot_memos" WHERE "spot_memos"."user_id" = 3
=> [
  {"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}]}, 
  {"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}]}
]  # => Expected to be `[ {"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}]}]`

[56] pry(main)> User.where(id: 3).includes(:spot_memos).deep_pluck(:id, spot_memos: :id).length
   (1.0ms)  SELECT users.id FROM "users" LEFT OUTER JOIN "spot_memos" ON "spot_memos"."user_id" = "users"."id" WHERE "users"."id" = 3
   (0.6ms)  SELECT spot_memos.user_id, "spot_memos"."id" FROM "spot_memos" WHERE "spot_memos"."user_id" = 3
=> 2 # Expected to be `1`

[57] pry(main)> User.where(id: 3).includes(:spot_memos, :client).deep_pluck(:id, spot_memos: :id, client: :id)
   (1.2ms)  SELECT users.id, users.client_id FROM "users" LEFT OUTER JOIN "spot_memos" ON "spot_memos"."user_id" = "users"."id" LEFT OUTER JOIN "clients" ON "clients"."id" = "users"."client_id" WHERE "users"."id" = 3
   (0.6ms)  SELECT spot_memos.user_id, "spot_memos"."id" FROM "spot_memos" WHERE "spot_memos"."user_id" = 3
   (0.6ms)  SELECT clients.id FROM "clients" WHERE "clients"."id" = 3
=> [
  {"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}], :client=>{"id"=>3}}, 
  {"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}], :client=>{"id"=>3}}
] # => Expected to be `[{"id"=>3, :spot_memos=>[{"id"=>2082}, {"id"=>2081}], :client=>{"id"=>3}}]`

[58] pry(main)> User.where(id: 3).includes(:spot_memos, :client).deep_pluck(:id, spot_memos: :id, client: :id).length
   (4.4ms)  SELECT users.id, users.client_id FROM "users" LEFT OUTER JOIN "spot_memos" ON "spot_memos"."user_id" = "users"."id" LEFT OUTER JOIN "clients" ON "clients"."id" = "users"."client_id" WHERE "users"."id" = 3
   (0.7ms)  SELECT spot_memos.user_id, "spot_memos"."id" FROM "spot_memos" WHERE "spot_memos"."user_id" = 3
   (0.6ms)  SELECT clients.id FROM "clients" WHERE "clients"."id" = 3
=> 2 # Expected to be `1`

And, seeing your results I found my mistakes... I seemed to misunderstand this issue a little. It seems that including multiple tables is not necessary. However, getting child table attributes is important to replicate the issue.

I've rewritten the conditions to reproduce the issue

Could you think of anything that might have caused this issue?

khiav223577 commented 5 years ago

What do you expect to get from the following line?

User.where(id: 3).includes(:spot_memos).pluck(:id)
# => ?
MasashiYokota commented 5 years ago

I expect below:

User.where(id: 3).includes(:spot_memos).pluck(:id)
# => [{ "id" => 3 }]
MasashiYokota commented 5 years ago

Oh, I got it! I've overlooked your comments:

So I did some changes in khiav223577/pluck_all#30 and release 2.0.3. After that the behavior of pluck_all changed to:

User.where(id: 1).includes(:posts).pluck_all(:id)
# => [{"id"=>1}, {"id"=>1}, {"id"=>1}]

So deep_pluck expects that code's result to be below, right?

User.where(id: 3).includes(:spot_memos).pluck(:id)
# => [{ "id" => 3 }, { "id" => 3 }]
khiav223577 commented 5 years ago

Yes. You will get two results in Rails 5. So if we replace pluck with pluck_all, it's expected to get two results, too:

users = User.where(id: 3).includes(:spot_memos)
users.pluck(:id)
# => [3, 3]

users.pluck_all(:id)
# => [{ "id" => 3 }, { "id" => 3 }]
MasashiYokota commented 5 years ago

Thank you for your kind and courteous comments many many times. I have no more problems and questions. I close this issue.