jeremyevans / sequel

Sequel: The Database Toolkit for Ruby
http://sequel.jeremyevans.net
Other
4.99k stars 1.07k forks source link

Sequel Model crashes with jsonb columns whilst attempting to write bare strings #2028

Closed fdr closed 1 year ago

fdr commented 1 year ago

Complete Description of Issue

Sequel Model and validations has a number of interesting bugs and also a crash when dealing with jsonb columns that contain a bare string.

I haven't been able to minimally reproduce my particular symptom, but I started documenting this bug, which has a number of other effects, including a more interesting crash, when I saw something pretty unusual when trying to do a model.update on unrelated fields on a record that had a JSONB with a bare string in it:

Sequel::ValidationFailed: colname is not a valid sequel::postgres::jsonbobject

I thought the lowercase sequel::postgres::jsonbobject was interesting.

Simplest Possible Self-Contained Example Showing the Bug

I take a tour through several abnormalities here. This can be run via the sequel command.

DB.extension :pg_json
DB["DROP TABLE IF EXISTS auto_validation_bugs"].all
DB["CREATE TABLE auto_validation_bugs (a serial primary key, data jsonb, extra int)"].all

class AutoValidationBug < Sequel::Model
end

# No problem with a hash.
AutoValidationBug.create(data: {"okay" => "yes"})

# No problem with a bare dataset...provided one quotes. I'll be demonstrating
# something interesting with this record later.
risky_key = AutoValidationBug.dataset.insert(data: '"bare string"')

# Things work with Model if wrapping is done explicitly.
AutoValidationBug.create(data: Sequel.pg_jsonb_wrap("now it's good"))

begin
  AutoValidationBug.create(data: "not okay")
rescue Sequel::DatabaseError => ex
  raise unless ex.cause.is_a?(PG::InvalidTextRepresentation)
  puts "oh no"
end

begin
  AutoValidationBug.create(data: '"still not okay"')
rescue Sequel::DatabaseError => ex
  raise unless ex.cause.is_a?(PG::InvalidTextRepresentation)
  puts "oh no"
end

Sequel::Model.plugin :auto_validations

# You can load the bare string records fine, if they were created via
# SQL, or Datasets, or a pg_jsonb_wrap.
avb = AutoValidationBug[risky_key]

begin
  # But update goes badly
  avb.update(extra: 1000)
rescue
  puts "oh no"
end

# Create goes similarly badly.

# Crashes like
# /ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/plugins/auto_validations.rb:255:in `validate': undefined method `empty?' for nil:NilClass (NoMethodError)
#
#           unless skip.include?(:no_null_byte) || (no_null_byte_columns = model.auto_validate_no_null_byte_columns).empty?
#                                                                                                                   ^^^^^^^
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1920:in `block in _valid?'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1047:in `around_validation'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1917:in `_valid?'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1877:in `block in _save_valid?'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1937:in `checked_save_failure'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1877:in `_save_valid?'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1493:in `save'
#   from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:254:in `create'
#   from auto_validation_bug.rb:31:in `<top (required)>'
AutoValidationBug.create(data: "even weirder with auto validation")

Full Backtrace of Exception (if any)

There are several abnormalities. But a crash can be seen at:

/home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/plugins/auto_validations.rb:255:in `validate': undefined method `empty?' for nil:NilClass (NoMethodError)

          unless skip.include?(:no_null_byte) || (no_null_byte_columns = model.auto_validate_no_null_byte_columns).empty?
                                                                                                                  ^^^^^^^
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1920:in `block in _valid?'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1047:in `around_validation'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1917:in `_valid?'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1877:in `block in _save_valid?'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1937:in `checked_save_failure'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1877:in `_save_valid?'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:1493:in `save'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/lib/sequel/model/base.rb:254:in `create'
    from auto_validation_bug.rb:63:in `<top (required)>'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/bin/sequel:266:in `load'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/bin/sequel:266:in `block in <top (required)>'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/bin/sequel:266:in `each'
    from /home/fdr/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sequel-5.67.0/bin/sequel:266:in `<top (required)>'
    from /home/fdr/.asdf/installs/ruby/3.2.2/bin/sequel:25:in `load'
    from /home/fdr/.asdf/installs/ruby/3.2.2/bin/sequel:25:in `<main>'

SQL Log (if any)

No response

Ruby Version

3.2.2

Sequel Version

5.67

jeremyevans commented 1 year ago

Thanks for the report. Usage of non array/object values in json columns probably still has some issues. The error inside auto_validations is definitely unexpected and should be fixed. I'll look into the other issues and see if support can be improved for them.

jeremyevans commented 1 year ago

The exception inside auto_validations is not a bug. Adding plugins to a Sequel::Model class after creating a subclass of the model is not supported, and many of Sequel's plugins will not work correctly if you do that. You must add the plugin to the Sequel::Model class before creating any subclasses of it. I checked and if you move the Sequel::Model.plugin :auto_validations line to the top, or switch it to AutoValidationBug.plugin :auto_validations, that the error goes away. You still get the is not a valid sequel::postgres::jsonbobject exception, but that's a different issue.

I'll keep looking into the other issues.

jeremyevans commented 1 year ago

I've reviewed and there is no bug. Near the top of the pg_json documentation, it states:

If you would like to wrap JSON primitives (numbers, strings, null, true, and false),
you need to use the wrap_json_primitives setter:

  DB.extension :pg_json
  DB.wrap_json_primitives = true

If you set this correctly, other things work as expected.

AutoValidationBug.create(data: "not okay") still fails, because typecasting expects that an input string contains valid JSON, and not okay is invalid JSON. AutoValidationBug.create(data: '"not okay"') would work correctly. There currently isn't an option to typecast an input string as a JSON string, because it would make behavior ambiguous (AutoValidationBug.create(data: '{}') means empty JSON object or a JSON string containing{}?).

In terms of the lowercase in the validation error message, that's not difficult to fix, though it could potentially break backwards compatibility if users are checking for specific messages. Here's a potential patch:

diff --git a/lib/sequel/plugins/validation_helpers.rb b/lib/sequel/plugins/validation_helpers.rb
index dea64a109..7373b2968 100644
--- a/lib/sequel/plugins/validation_helpers.rb
+++ b/lib/sequel/plugins/validation_helpers.rb
@@ -90,7 +90,19 @@ module Sequel
         :no_null_byte=>{:message=>lambda{"contains a null byte"}},
         :numeric=>{:message=>lambda{"is not a number"}},
         :operator=>{:message=>lambda{|operator, rhs| "is not #{operator} #{rhs}"}},
-        :type=>{:message=>lambda{|klass| klass.is_a?(Array) ? "is not a valid #{klass.join(" or ").downcase}" : "is not a valid #{klass.to_s.downcase}"}},
+        :type=>{:message=>lambda do |klass|
+          if klass.is_a?(Array)
+            klass = klass.map do |k|
+              k = k.to_s
+              k = k.downcase unless k.include?('::')
+              k
+            end.join(" or ")
+          else
+            klass = klass.to_s
+            klass = klass.downcase unless klass.include?('::')
+          end
+          "is not a valid #{klass}"
+        end},
         :presence=>{:message=>lambda{"is not present"}},
         :unique=>{:message=>lambda{'is already taken'}}
       }.freeze

What are your thoughts on that?

fdr commented 1 year ago

Applying the option is reasonable. I think it's rather strange that Sequel will read values out and unwrap it into a primitive ruby data type but not be able to put them back in, even when I'm modifying other columns. Without thinking too hard about it, the ambiguous type -- a bare string -- would be nice to be auto-promoted on read to a wrapped value. I can't think of a reason it'd be so great for even save_changes on an unrelated field to crash out so strangely from something that got put in the database somehow (e.g. other applications, or dataset methods where you understandably need to quote the bare string for JSONB)

The lowercased program symbol made me think "unintended composition" a.k.a. bug. I think the fix you propose is good.

jeremyevans commented 1 year ago

Applying the option is reasonable. I think it's rather strange that Sequel will read values out and unwrap it into a primitive ruby data type but not be able to put them back in, even when I'm modifying other columns.

One problem with wrapping basic types is it breaks expected handling for null and false JSON values (as described in the pg_json documentation). Additionally, few users actually benefit from JSON basic types. Most PostgreSQL json/jsonb users would probably be best served with a CHECK constraint that uses json_typeof to enforce a specific JSON type. JSON flexibility is great, but I think few users want to mix JSON objects/arrays in the the same column, let alone primitive JSON types.

Without thinking too hard about it, the ambiguous type -- a bare string -- would be nice to be auto-promoted on read to a wrapped value. I can't think of a reason it'd be so great for even save_changes on an unrelated field to crash out so strangely from something that got put in the database somehow (e.g. other applications, or dataset methods where you understandably need to quote the bare string for JSONB)

I don't think save_changes crashes if you aren't attempting to validate the jsonb column value:

DB.extension :pg_json
class AutoValidationBug < Sequel::Model ; end
id = AutoValidationBug.insert(data: '"not okay"')
AutoValidationBug[id].update(:extra=>100) # works as expected

AutoValidationBug.create(data: '"not okay"') doesn't work because it parses the input string as JSON, but it doesn't wrap it (because wrap_json_primitives is not enabled).

The lowercased program symbol made me think "unintended composition" a.k.a. bug. I think the fix you propose is good.

I though about this a little more and think it might be a better idea to use something more generic for more specialized types. The validation error messages are designed to be viewed by the average users, and a validation error that mentions Sequel::Postgres::JSONBObject does them little good. It's fine for many primitive types (e.g. is not a valid date), since I think those cases are understandable by a user, but I think more specialized types would be better served by a more generic error message such as is not the expected type (or something similar). I'll probably make that change before the next release.

fdr commented 1 year ago

I think that sounds reasonable. I may avoid using primitive types (well, in reality, just String) in the JSONB column, since I don't see good reason to lose other validations via the :auto_validation plugin. "false" does not seem that ambiguous to me, and neither is String (if the bare string is marked by a type or module somehow), but null is the hard one.