rzane / baby_squeel

:pig: An expressive query DSL for Active Record
MIT License
500 stars 49 forks source link

Boolean column reference without operator is only sometimes supported #129

Open owst opened 8 months ago

owst commented 8 months ago

Issue

In some contexts, a boolean column can be referenced without an explicit operator, but in others an exception is raised. I'm not sure that this is intentionally allowed in any context, but it would be nice if it were supported for convenience.

Setup

If we have a table:

# The generated SQL assumes SQLite in these examples
create_table :posts, force: true do |t|
  t.integer :view_count
  t.boolean :published
end

then we can query for all published posts with:

# SELECT "posts".* FROM "posts" WHERE "posts"."published" = 1
Post.where.has { published == true }

and can query for only viewed published posts with:

# SELECT "posts".* FROM "posts" WHERE "posts"."view_count" >= 1 AND "posts"."published" = 1
Post.where.has { (view_count >= 1) & (published == true) }

and equivalently, we can swap the order of the conditions:

# SELECT "posts".* FROM "posts" WHERE "posts"."published" = 1 AND "posts"."view_count" >= 1
Post.where.has { (published == true) & (view_count >= 1) }

Issue demonstration

Our inner Rubyist might not like the explicit published == true comparison in the first compound query, so we successfully rewrite to:

# SELECT "posts".* FROM "posts" WHERE "posts"."view_count" >= 1 AND "posts"."published"
Post.where.has { (view_count >= 1) & published }

but if we try the same rewrite on the reordered condition then an exception is raised:

Post.where.has { published & (view_count >= 1) }

# raises NoMethodError: undefined method `and' for #<struct Arel::Attributes::Attribute relation=#<Arel::Table:0x000000011408fee8 @name="posts", @klass=Post(id: integer, view_count: integer, published: boolean), @type_caster=#<ActiveRecord::TypeCaster::Map:0x000000011408fda8 @klass=Post(id: integer, view_count: integer, published: boolean)>, @table_alias=nil>, name="published">

similarly if we try to filter for all published posts without explicit comparison a (different) error is raised:

Post.where.has { published }

# raises ArgumentError: Unsupported argument type: #<struct Arel::Attributes::Attribute relation=#<Arel::Table:0x000000011726c0b0 @name="posts", @klass=Post(id: integer, view_count: integer, published: boolean), @type_caster=#<ActiveRecord::TypeCaster::Map:0x00000001172676c8 @klass=Post(id: integer, view_count: integer, published: boolean)>, @table_alias=nil>, name="published"> (Arel::Attributes::Attribute)

Expected behaviour

Either that a (boolean) column reference without explicit operator is always supported, or always raises an error. So either all of:

Post.where.has { (view_count >= 1) & published }
Post.where.has { published & (view_count >= 1) }
Post.where.has { published }

are supported and operate in the expected way, or all raise an error.

Perhaps a reference without operator to a boolean column should generate an implicit (column == true) node rather than being passed through as a Arel::Attributes::Attribute? A bare reference to a non-boolean column should raise an explicit error.

Reproduction

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", "= 7.1.2"
  gem "baby_squeel", "= 3.0.0"
  gem "sqlite3"
end

require "active_record"
require "minitest/autorun"
require "logger"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.integer :view_count
    t.boolean :published
  end
end

class Post < ActiveRecord::Base
end

class BugTest < Minitest::Test
  def test_association_stuff
    post = Post.create!(view_count: 1, published: true)

    assert_equal Post.where.has { published == true }, [post]
    assert_equal Post.where.has { (view_count >= 1) & (published == true) }, [post]
    assert_equal Post.where.has { (published == true) & (view_count >= 1) }, [post]

    assert_equal Post.where.has { (view_count >= 1) & published }, [post]
    assert_raises(NoMethodError, "undefined method `and'") do
      assert_equal Post.where.has { published & (view_count >= 1) }, [post]
    end
    assert_raises(ArgumentError, "Unsupported argument type") do
      assert_equal Post.where.has { published }, [post]
    end
  end
end
rzane commented 8 months ago

Thank you for the detailed request. I agree that this does feel a bit inconsistent, and it'd be nice to fix it in some way.

What behavior would you expect for a non-boolean column?

Post.where.has { title }

I think what we're seeing in your examples is just Arel's behavior. So, for example:

irb> ugly = Author.arel_table[:ugly]
=> #<struct Arel::Attributes::Attribute ...>

irb> > name = Author.arel_table[:name]
=> #<struct Arel::Attributes::Attribute ...>

irb> Author.where(ugly)
ArgumentError: Unsupported argument type: #<struct Arel::Attributes::Attribute ...>

irb> Author.where(ugly.and(name.eq('Rick')))
NoMethodError: undefined method `and' for #<struct Arel::Attributes::Attribute ...>

irb> Author.where(name.eq('Rick').and(ugly))
Author Load (0.1ms)  SELECT "authors".* FROM "authors" WHERE "authors"."name" = 'Rick' AND "authors"."ugly" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
=> []

I haven't really looked into this, but I suspect that it might be a less invasive change to support usage like this (note the question mark):

Author.where.has { ugly? }

I'm open to other solutions, too. Would you be willing to put together a PR? If not, that's totally okay!

owst commented 8 months ago

Thanks for the quick reply @rzane and for the Arel demonstration!

For any of

Post.where.has { title }
Post.where.has { title & ugly}
Post.where.has { ugly & title }

I think I'd expect an error to be raised - perhaps something along the lines of ArgumentError: Non-boolean column :title used in boolean context - did you forget a comparison operator?

I like how baby_squeel makes the ActiveRecord/Arel interface more intuitive to use - therefore I wonder if it's feasible to improve on the underlying Arel behaviour. Do you think it's feasible to:

  1. track the context in which an attribute reference is made
  2. know if that context is a boolean context - OTTOMH, that would be in a WHERE clause, at the top-level, or directly under any & or | (or ! if baby_squeel supports it)
  3. raise an error if the attribute is not a boolean attribute, and otherwise generate the equivalent of (column == true)

I'm happy to work on a PR, if we can decide what the desired change is 😄

rzane commented 8 months ago

I'm not opposed to this as a proposal. I'd be okay with raising an error for non-boolean attributes, though I could also see an argument for:

Post.where.has { title }
# SELECT * FROM posts WHERE title IS NOT NULL

I think we might also want to support:

Author.where.has { !ugly  }

You already hit on this, but this usage might complicate your proposal because ugly would have a different meaning in these two examples:

Author.selecting { ugly }
Author.where.has { ugly }

So, I suggest resolving ugly? to ugly == true in resolver.rb might simplify the implementation. The use of the question mark indicates intent, so you wouldn't need to track the context in which an attribute reference is made. You could just raise here if an attribute is returned.

Here are a few places to look:

rocket-turtle commented 8 months ago

Interesting report.

For me it seems, that this is an AREL feature / bug and was never an intentional feature of baby_squeel.

If I read it correct it would be best to raise an error in BabySqueel::Resolver#resolve if something is resolved to an attribute. (and not an BabySqueel::Attribute?)

@owst Is it possible to "fix" your code and add == true to the boolean values?


So, I suggest resolving ugly? to ugly == true in resolver.rb might simplify the implementation.

I would vote against this. Because normally methods ending with a question mark return true or false. I do not see a benefit of Author.where.has { published? } over Author.where.has { published.eq(true) }

BabySqueel::Attribute - this "wraps" an Arel::Attributes::Attribute with some custom behavior. You could define &, |, and ! here to customize how they behave.

That might not be so easy beacause & is just an alias for and etc. and they should behave the same. https://github.com/rzane/baby_squeel/blob/master/lib/baby_squeel/operators.rb

owst commented 8 months ago

@rzane @rocket-turtle - thanks for the thoughts so far - I've created a draft PR (#131) that explicitly supports "plain references" to attributes, coercing boolean attributes to col == true and raising for non-boolean attributes. Could you take a look and let me know what you think, and I can polish further. Thanks!

rocket-turtle commented 8 months ago

The MR seems to be a good start.

Corrosponding to Author.where.has { ugly & (id == 1) } this should also work: Author.where.has { (ugly).and(id == 1) } etc.


I still don't think the comlexety is worth the gain of writing Author.where.has { published } over Author.where.has { published.eq(true) } or Author.where.has { published == true }.

owst commented 8 months ago

Thanks @rocket-turtle. My main concern is the "only sometimes works" aspect of this issue, rather than particularly allowing plain boolean attribute references at the top-level of a where.has block. That said, I see a handful of options for how to proceed from here:

  1. Do nothing
  2. Keep going with the draft PR to support boolean column attribute references without operator (thus making "sometimes works" become "always works")
  3. Explicitly prevent any reference to an attribute without operator in a where.has clause (thus making "sometimes works" become "never works") - I suspect this is trickier to get right, hence I went with 2. in the draft PR
  4. Consider raising an issue or creating a PR to support Author.where(ugly.and(name.eq('Rick'))) in arel so that baby_squeel can stay as it is and this issue and #130 could be resolved (again "sometimes works" becomes "always works")

Any thoughts/preference between those? FYI I've already changed our code to (attribute == true) so I'm fine with option 1, but I think it would be nice to avoid the slightly subtle issues described above

rocket-turtle commented 8 months ago

If 3 is even more complex then 2 I would prefere 1 maybe with a documentation in the Readme and a Test so we could remove the hint if arel fixes something on there side.

4 would be the best. The Documentation of the bug is realy good and could probably be transfered to AR only. But I don't know if they think its a bug or a misuse of AR.