rails / arel

A Relational Algebra
2.06k stars 390 forks source link

Add special treatment for an In node where the right side is [nil, true/false] #528

Closed sjaveed closed 6 years ago

sjaveed commented 6 years ago

This allows us to generate special SQL for a NotIn node which contains just a two element array that consists solely of nil and either true or false.

Translated to ActiveRecord, this means that

  User.where(active: [nil, true])

will generate:

  SELECT * FROM "users" WHERE "users"."active" IS NOT FALSE;

instead of the old:

  SELECT * from "users" WHERE "users"."active" IS NULL OR "users"."active" = true

and

  User.where(active: [nil, false])

will generate:

  SELECT * FROM "users" WHERE "users"."active" IS NOT TRUE;
rails-bot commented 6 years ago

r? @rafaelfranca

(@rails-bot has picked a reviewer for you, use r? to override)

sjaveed commented 6 years ago

Hah I was trying a couple different approaches - looks like I left the wrong one in. Fixed!

sjaveed commented 6 years ago

@rafaelfranca any thoughts on this?

sjaveed commented 6 years ago

@matthewd this is an alternate approach that generates IS TRUE and IS FALSE as shortcuts in the right context. This seems like a lighter approach than #526 before. Thoughts?

rafaelfranca commented 6 years ago
User.where(active: [nil, true])
#=> SELECT * FROM "users" WHERE "users"."active" IS NOT FALSE;

Will not this now return records which values are not NULL, TRUE and FALSE?

This make sense to boolean columns but the code you implemented will also generate that SQL for string columns that will include more than just the records with NULL and TRUE in that column that is what I expect when I write that method.

matthewd commented 6 years ago

I think I just don't see a benefit to special-casing this... it makes the SQL a bit shorter, but as @rafaelfranca points out, there are cases where it could introduce an ambiguity. And in the ones it doesn't, the SQL engine will treat both spellings identically.

sjaveed commented 6 years ago

@rafaelfranca

Will not this now return records which values are not NULL, TRUE and FALSE?

The idea is that for boolean columns it will return values which are NULL or TRUE only

This make sense to boolean columns but the code you implemented will also generate that SQL for string columns that will include more than just the records with NULL and TRUE in that column that is

That makes sense and is not the desired result at all. I'll see if it's possible to ensure this new behavior only for boolean columns.

what I expect when I write that method.

sjaveed commented 6 years ago

@matthewd

I think I just don't see a benefit to special-casing this... it makes the SQL a bit shorter,

I think the biggest benefit here is that it provides a way to generate IS TRUE / IS FALSE syntax without breaking backwards compatibility. It also removes an OR from the generated SQL which seems to always cause subtle problems.

If those have any value for boolean columns, I'll work to ensure this behavior is only triggered for boolean columns if possible, thus removing any issues with string columns. If those benefits aren't valuable, however, I'm willing to drop this.

Could I get a consensus on whether or not to continue?

Thanks!

but as @rafaelfranca points out, there are cases where it could introduce an ambiguity. And in the ones it doesn't, the SQL engine will treat both spellings identically.

matthewd commented 6 years ago

As we've established, though, it just breaks different backwards compatibility.

I'm dubious about the value of the special-use IS TRUE at all, but I'd definitely be interested in adding a Node for its more generic cousin IS [NOT] DISTINCT FROM, if we don't already have one.

Avoiding OR is uninteresting to me: it's generated, so any paren-placement confusion is a bug. As a general observation, we don't try to second-guess the SQL engine to "simplify" the SQL: it'll do a much better job of that for itself.

sjaveed commented 6 years ago

Not sure it breaks any backwards compatibility if it's restricted to boolean columns only. However, I'll close this PR as a no-go and see what I can do about adding a node for IS DISTINCT FROM