jrochkind / attr_json

Serialized json-hash-backed ActiveRecord attributes, super smooth

Using the merge operator to update the jsonb column #65

Closed volkanunsal closed 4 years ago

volkanunsal commented 5 years ago

Hey @jrochkind,

I saw your comment about adding support for the merge operator. I've been thinking about this problem myself. I started using your library recently and quickly ran into the problem you describe in the readme: using optimistic locking everywhere I use attr_json columns isn't practical. We need a different solution.
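To make the problem concrete, here's a minimal sketch (the Project model, its attributes, and the SQL in the comments are just for illustration) of the lost update a plain column write causes, and the jsonb || merge that would avoid it:

class Project < ActiveRecord::Base
  include AttrJson::Record

  # Both attributes live in the default `json_attributes` jsonb container column.
  attr_json :title,  :string
  attr_json :status, :string
end

a = Project.find(1)
b = Project.find(1)

a.title = "New title"
a.save!   # roughly: UPDATE projects SET json_attributes = '{"title":"New title","status":"old"}' ...

b.status = "archived"
b.save!   # rewrites the whole column from b's stale copy, silently discarding a's title change

# With the merge operator, each save would only touch the keys it changed:
#   UPDATE projects SET json_attributes = json_attributes || '{"status":"archived"}'::jsonb ...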

I've scoped out how Rails compiles the update SQL, and I think my solution is 90% there. Here is my rough plan for getting there:


module Arel
  module Visitors
    module PgJson
      private

      def visit_Arel_Nodes_JsonbMerge o, a
        json_infix o, a, '||'
      end

      def json_infix(o, a, opr)
        visit(Nodes::InfixOperation.new(opr, o.left, o.right), a)
      end
    end

    PostgreSQL.send :include, PgJson
  end
end
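(The visitor assumes an Arel::Nodes::JsonbMerge node class, which isn't shown here; roughly, it would just need to be a binary node holding the column and the new value, something like:)

module Arel
  module Nodes
    # Hypothetical node: carries the column (left) and the new jsonb value (right),
    # which the visitor above renders as `left || right`.
    class JsonbMerge < Binary
    end
  end
end
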
module ActiveRecord::Persistence::ClassMethods
  def _update_record(values, constraints) # :nodoc:
    constraints = _substitute_values(constraints).map { |attr, bind| attr.eq(bind) }

    um = arel_table.where(
      constraints.reduce(&:and)
    ).compile_update(_substitute_values(_replace_jsonb_column(values)), primary_key)

    connection.update(um, "#{self} Update")
  end

  def _replace_jsonb_column(values)
    return values unless respond_to?(:attr_json_config)
    # Look up the name of the attr_json column
    container_name = attr_json_config.default_container_attribute
    # Determine if the key is in values hash
    return values unless values.key?(container_name)
    # Get the changes to this column
    change_arr = changes[container_name]
    # Keep only the keys whose values actually changed
    changed_keys = change_arr.last.reject { |k, v| change_arr.first[k] == v }
    # Replace the full container value in the `values` hash with just the changed keys
    values[container_name] = changed_keys
    values
  end
end

What do you think of this plan? It's not very clean, but it might work. Let me know if you'd be interested in a pull request or if it should be in its own repo.

jrochkind commented 5 years ago

Thanks for getting in touch!

Wow, yeah I wish AR made this easier to hook into. I haven't totally wrapped my head around your code, but its complexity seems to confirm my initial analysis: this isn't going to be easy. You've got to monkey-patch four different classes? Horrid.

With so many patches, it seems likely to break in future Rails versions. Although avoiding patches isn't necessarily a guarantee of future-compatibility (even cooperating objects can have their assumptions violated), I wonder if there's a way to reduce the patching -- we do have control over the "column type" of the container column. Would there be a way to add a custom type instead of using the stock jsonb column type, and would that allow a lighter touch on Rails internals?

If you can get it to work and feel it's reliable, I'd definitely be interested in a PR. It should be implemented in a way that is optional, can be toggled on and off, and if you haven't chosen to use it, it should not patch or otherwise touch AR classes. In case such an intervention ends up messing up AR, it should be entirely opt-in. (And should have documentation warning that this is one of the more interventionist parts of attr_json, and is risky for forwards-compat.)

It would also of course need to be very well-tested.

I also wonder if your investigations here give you any ideas for relatively simple, few-lines-of-code enhancements to Rails that would let this work through a more natural Rails API that's less likely to break in the future, instead of patches? If so, maybe we should try PR'ing it to Rails and see what we can do?

But I'm def interested in a PR here. I might want to spend some time with it, totally understanding it and seeing if I can think of improvements (particularly to make fewer assumptions about AR internals / fewer monkey-patches). That might take me a while as I have to fit it in around my regular duties, so it might not get merged in a hurry -- I prefer to take it slow and careful -- but it does seem better to have it here than in another separate gem that has to be coordinated!

jrochkind commented 5 years ago

If you could provide a PR with tests (even of the draft not yet ready code), it would make it easier for me (or someone else) to dive in and understand it -- cause I could use byebug/pry while running the tests (or a test) to interactively understand what the new code is doing.

Without totally understanding the code, some initial thoughts:

  1. It seems like the _update_record override is doing the real work, replacing the SQL that gets generated. Why are the other patches necessary? If _update_record does the right thing, is nothing else needed, and the other stuff is just there to support the implementation of _update_record? If so, could there be a way to write the _update_record implementation that doesn't rely on the other patches, and encapsulates all its functionality?

  2. Can we imagine what feature would need to be in Rails to make this clean, and write the patch like that, so we can PR the same thing to Rails? I'm thinking _update_record needs to be able to see if the underlying column type supports some kind of 'advanced update SQL generation', and then ask it for the SQL, instead of using the default. A custom jsonb-merging type would then implement that. I guess the API for the "generate some custom update SQL" would need to have access to the "prior state from Dirty tracking" in order to support jsonb merge.
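To be concrete, a purely imagined sketch of what that hook could look like -- none of this API exists in Rails, and the class and method names are made up:

# Imagined API, not real Rails: a type declares that it wants to generate its own
# SET expression, and _update_record asks it to, passing in the dirty-tracking state.
class MergingJsonb < ActiveRecord::Type::Json
  def generates_update_sql?
    true
  end

  # arel_attribute: the Arel column node; old_value/new_value: from dirty tracking.
  def update_sql(arel_attribute, old_value, new_value)
    changed = new_value.reject { |k, v| old_value[k] == v }
    Arel::Nodes::InfixOperation.new(
      "||", arel_attribute, Arel.sql("'#{changed.to_json}'::jsonb")
    )
  end
end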

Just some initial thoughts, again without totally understanding your code yet, so they may be off!

volkanunsal commented 5 years ago

It seems like the _update_record override is doing the real work, replacing the SQL that gets generated. Why are the other patches necessary? If _update_record does the right thing, is nothing else needed, and the other stuff is just there to support the implementation of _update_record? If so, could there be a way to write the _update_record implementation that doesn't rely on the other patches, and encapsulates all its functionality?

Yes, there is. We can take the container column from the attributes hash and write raw SQL to perform the update. Here is the simplest implementation I could come up with:

module ActiveRecord::Persistence::ClassMethods
  def _update_record(values, constraints)
    values_without_container_attribute = _remove_jsonb_column(values)
    container_changes = _pluck_jsonb_column_changes(values)

    if container_changes.present?
      container_name = container_changes.keys.first
      container_value = container_changes.values.first
      sql_to_update_container = Arel.sql(%Q{
        #{container_name} = #{container_name} || '#{container_value.to_json}'::jsonb
      })
      _update_record2({ container_name => [sql_to_update_container]}, constraints)
    end
    _update_record2(values_without_container_attribute, constraints)
  end

  # Original _update_record
  def _update_record2(values, constraints)
    constraints = _substitute_values(constraints).map { |attr, bind| attr.eq(bind) }
    um = arel_table.where(
      constraints.reduce(&:and)
    ).compile_update(_substitute_values(values), primary_key)
    connection.update(um, "#{self} Update")
  end

  def _remove_jsonb_column(values)
    return values unless respond_to?(:attr_json_config)
    # Look up the name of the attr_json column
    container_name = attr_json_config.default_container_attribute
    # Determine if the key is in values hash
    return values unless values.key?(container_name)
    # Remove container_name key
    values.delete(container_name)
    # Return the rest
    values
  end

  def _pluck_jsonb_column_changes(values)
    return values unless respond_to?(:attr_json_config)
    # Look up the name of the attr_json column
    container_name = attr_json_config.default_container_attribute
    # Determine if the key is in values hash
    return values unless values.key?(container_name)
    # Get the changes to this column
    change_arr = changes[container_name]
    # Keep only the keys whose values actually changed
    changed_keys = change_arr.last.reject { |k, v| change_arr.first[k] == v }
    # Return a hash of changed values for the container column.
    { container_name => changed_keys }
  end
end

Can we imagine what feature would need to be in Rails to make this clean, and write the patch like that, so we can PR the same thing to Rails? I'm thinking _update_record needs to be able to see if the underlying column type supports some kind of 'advanced update SQL generation', and then ask it for the SQL, instead of using the default. A custom jsonb-merging type would then implement that. I guess the API for the "generate some custom update SQL" would need to have access to the "prior state from Dirty tracking" in order to support jsonb merge.

This could be tricky, but I can definitely see something like that. Rails uses Arel UpdateManager's .set method when it performs the update operation, and that method is only meant to generate an assignment: the left node is the jsonb column, and the right node is the new value of the attribute. If we imagine Rails asking the column definition for the Arel node to put on the right side of the assignment, that would be the cleanest solution. Like you mentioned, the column definition would also need access to the record's Dirty attributes.
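To illustrate (a rough sketch, assuming a projects table with a jsonb misc column), here is what UpdateManager builds today versus the merge assignment we'd want:

table = Arel::Table.new(:projects)

um = Arel::UpdateManager.new
um.table(table)
um.where(table[:id].eq(1))

# What Rails effectively builds today: a plain assignment overwriting the whole column.
um.set([[table[:misc], Arel.sql(%q('{"a":1,"b":2}'))]])

# What we want instead: misc = misc || '{"a":1}'::jsonb
merge = Arel::Nodes::InfixOperation.new(
  "||", table[:misc], Arel.sql(%q('{"a":1}'::jsonb))
)
um.set([[table[:misc], merge]])
# roughly: UPDATE "projects" SET "misc" = "misc" || '{"a":1}'::jsonb WHERE "projects"."id" = 1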

volkanunsal commented 5 years ago

There were some problems in the code above, but this one now works for me:

module ActiveRecord::Persistence::ClassMethods
  def _update_record(values, constraints)
    perform_merge_mod_if_enabled(values, constraints)
    values = remove_container_column(values) if attr_json_merge_mod_enabled?
    _update_record2(values, constraints)
  end

  def _update_record2(values, constraints, with_substitution = true)
    constraints = _substitute_values(constraints).map { |attr, bind| attr.eq(bind) }
    v = with_substitution ? _substitute_values(values) : values
    um = arel_table.where(constraints.reduce(&:and)).compile_update(v, primary_key)
    connection.update(um, "#{self} Update")
  end

  def perform_merge_mod_if_enabled(values, constraints)
    return unless attr_json_merge_mod_enabled?
    container_changes = pluck_container_column_changes(values)
    return unless container_changes.present?

    name, value = container_changes
    sql_to_update_container = Arel.sql(%(
      #{name} = #{name} || '#{value.to_json}'::jsonb
    ))
    values = { name => [sql_to_update_container] }
    _update_record2(values, constraints, false)
  end

  def attr_json_merge_mod_enabled?
    return false unless respond_to?(:attr_json_config)
    attr_json_config.enable_merge_mod
  end

  def remove_container_column(values)
    return values unless respond_to?(:attr_json_config)
    # Look up the name of the attr_json column
    container_name = attr_json_config.default_container_attribute
    # Determine if the key is in values hash
    return values unless values.key?(container_name)
    # Remove container_name key
    values.delete(container_name)
    # Return the rest
    values
  end

  def pluck_container_column_changes(values)
    return [] unless respond_to?(:attr_json_config)
    # Look up the name of the attr_json column
    container_name = attr_json_config.default_container_attribute
    # Determine if the key is in values hash
    return [] unless values.key?(container_name)
    # Get the changes to this column
    change_arr = changes[container_name]
    changed_keys = change_arr.last.reject { |k, v| change_arr.first[k] == v }
    # Return container_name and changed values.
    [container_name, changed_keys]
  end
end
jrochkind commented 5 years ago

I think you are on the right track, and like where you are going with this.

I am interested in working with you on it, and think it can be in the AttrJson gem -- although ideally it might then be extracted to another gem that just provides "jsonb merge update" functionality without even depending on AttrJson at all. Someone might hypothetically want this func even without using AttrJson, and a separate gem might help advocate for getting the feature into Rails, or the Rails API changes necessary to support the feature.

But since AttrJson makes it an especially desirable feature, and I'm interested in it, I'm happy to work on it here for now and am interested in working on it with you -- although I can't promise I'll have the time to do it super quickly myself. "Slow and steady" has been what has resulted in attr_json having, I think, solid, reliable, and unlikely-to-be-broken-by-rails-in-the-future code.

Personally, I'd really advocate you write some tests sooner rather than later. The benefit of having the tests up front is that then we can work on refactoring the code to be simpler, to make fewer assumptions about what AR does, to monkey-patch AR less, and/or to be based on features we could PR to Rails to support this without patching -- and know as we are refactoring that if the tests still pass, our refactor was good.

volkanunsal commented 5 years ago

Copying from PR

It should ideally be somewhat decoupled from AttrJson::Record, ideally the merge feature would let you apply merge-updates to a Jsonb column whether or not you are using AttrJson::Record on the model, yeah?

This would be really neat, but I'm not sure how to achieve it yet. There's a fair amount of complexity in building a generic solution that works for all jsonb columns, and in making it opt-in on top of that. It would require working around a number of Rails conventions and more patching. It's much easier to solve the problem in the specific context of this gem, and even make it opt-in there.

volkanunsal commented 5 years ago

Thanks for the feedback. I'll try to get in some tests for it this week.

jrochkind commented 5 years ago

Yeah, that makes sense. Once we have a working implementation with tests we can look at if we can factor out the AttrJson dependencies!

It's definitely a fair amount of complexity here -- which is why I had given up on it myself. Possibly so much complexity we can't do it in a reliable way with AR, but it's super useful to try, in part to see the simplest way AR could be changed to support it!

It looks like your existing code has very limited AttrJson dependencies though -- just the container attribute? I wonder if this could be detected by a custom MergingJsonbType < ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb type instead?

But the first step is just getting something that works and passes tests, then we can try refactoring!

jrochkind commented 5 years ago

Sorry if I'm saying too much, but this is interesting, and I have Opinions about what will make the code clean and maintainable.

container_name = attr_json_config.default_container_attribute

This isn't quite right: even if the model has a certain default_container_attribute, models can actually contain more than one attr_json container, in different columns.

Here's what I'm thinking -- it may not work out once you get to the actual code, but it's an idea.

Instead of using the built in AR jsonb type, create a new ActiveRecord::Type that descends from the pg jsonb type, and register it with ActiveRecord::Types as say merging_update_jsonb. Then in your migrations, you should be able to declare it as that type.

Then whenever you need to see whether a given column needs custom behavior, you can just check for that type. So instead of return values unless values.key?(container_name), it would be more like values_needing_merging = values.keys.select { |v| ActiveRecord::Type.lookup(v).kind_of?(OurMergingUpdateJsonb) } (still pseudocode).

And same in the places you are doing column.is_jsonb? -- instead check to see if column has a type of this "special type". That makes it opt-in on a column-by-column basis.
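For concreteness, that check might look something like this (still a sketch; OurMergingUpdateJsonb is the hypothetical type class, klass is the model class being updated, and it uses the model's attribute_types registry rather than ActiveRecord::Type.lookup):

# Which of the columns being written have opted in to merge-on-update?
values_needing_merging = values.keys.select do |name|
  klass.attribute_types[name.to_s].is_a?(OurMergingUpdateJsonb)
end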

Just thoughts, might not work out! This stuff is definitely complicated and hard, you have my respect for trying!

volkanunsal commented 5 years ago

Thank you very much for the feedback. No need to apologize, I truly appreciate your Opinions. :)

Instead of using the built in AR jsonb type, create a new ActiveRecord::Type that descends from the pg jsonb type, and register it with ActiveRecord::Types as say merging_update_jsonb. Then in your migrations, you should be able to declare it as that type.

This is something I thought about as well. I suppose it would make the code simpler to create a new type for a mergeable jsonb column than to hack it with a global. I'll try to come up with a proof of concept for it.

jrochkind commented 5 years ago

Yeah, it's also a place that can potentially have some of the custom logic, if it works out -- the custom "update mergeable" Type class. I think it could get us closer to something that would let us have a very small PR to Rails to provide API to make this possible without patching Rails -- but one step at a time!

jrochkind commented 5 years ago

@volkanunsal I remembered the other thing that gave me pause for jsonb-update-merge with attr_json specifically.

I think the path you are going on will work out well -- in a way that's attr_json-independent too -- as long as the jsonb hash is flat, i.e. each key's value is a "primitive": a string or an integer.

But attr_json also supports repeatable attributes, whose values are arrays, as well as nested/structured/compound attributes, whose values are hashes. Those can be in arrays, and can have values that are arrays/hashes/arrays-of-hashes too.

This becomes a "deep merge" problem. The semantics of what you want to do are confusing/complicated. Only replacing the entire top-level value (not a deep merge) is unlikely to be correct enough to protect you from needing optimistic locking still. An actual "deep merge" that does what you want, in the case of arrays, may require nested hashes to have "id" values, so you know which change goes with which existing value. The implementation gets very difficult. For semantics, it may be useful to see how things like mongodb handle it.
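To illustrate why arrays are the sticking point, a small sketch using ActiveSupport's Hash#deep_merge semantics: deep_merge recurses into hashes but simply replaces any non-hash value, so concurrent edits to different array elements still clobber each other:

require "active_support/core_ext/hash/deep_merge"

existing = { "addresses" => [{ "id" => 1, "city" => "NYC" }, { "id" => 2, "city" => "LA" }] }
change   = { "addresses" => [{ "id" => 2, "city" => "SF" }] }

existing.deep_merge(change)
# => { "addresses" => [{ "id" => 2, "city" => "SF" }] }
# The id=1 address is silently dropped; doing better means matching array elements
# by "id", which is exactly the hard part.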

In the "deep merge" case we are essentially trying to re-invent what a normalized rdbms does naturally. It may be a case where it's just gonna be too hard to get that out of jsonb, if you really want to avoid optimistic locking, you should just give up on jsonb and use a normalized rdbms schema.

On the other hand, maybe your solution for "flat" (not deep) merge is good enough to be valuable for many people? Especially if you aren't using "deep" structures at all and your jsonb hash just has primitives as values, it's "flat". Curious if this is your case @volkanunsal ?

volkanunsal commented 5 years ago

Thanks for bringing this up. My use case doesn't require using deeply nested data structures right now, so a flat merge would be good enough for me. But for the sake of anyone who wanted to use deep merge instead, one option would be to use a Postgres function that implements deep merge.

jrochkind commented 5 years ago

Oh that's a nice find, although I don't think that function by itself will do "the right thing" (as if it were a normalized rdbms schema) out of the box. But a good reference.

Yeah, if your use case is just "flat" hashes with primitives then there are probably others with similar use cases, and it will probably be a good solution for some, to offer as an option. Cool!

volkanunsal commented 5 years ago

Ok, if I create a type like this, it could work. Let me know what you think!

class JsonbWithMergeType < ActiveRecord::Type::Json
  def type
    :jsonb_with_merge
  end

  def serialize(value)
    return if value.nil?
    Arel.sql("#{name} || '#{value.to_json}'::jsonb")
  end

  def use_dirty?
    true
  end
end

ActiveRecord::Type.register(:jsonb_with_merge, JsonbWithMergeType)

class DirtyRegistry
  def self.include?(klass, name)
    klass.attribute_types[name].try(:use_dirty?)
  end

  def self.replace(klass, values, changes)
    values.map do |name, value|
      value = changes[name].try(:last) if include?(klass, name)
      [name, value]
    end.to_h
  end
end

# activerecord/lib/active_record/persistence.rb
module ActiveRecord::Persistence::ClassMethods
  def _update_record(values, constraints) # :nodoc:
    constraints = _substitute_values(constraints).map { |attr, bind| attr.eq(bind) }

    # Patch
    values = DirtyRegistry.replace(self.class, values, changes)
    # /Patch

    um = arel_table.where(
      constraints.reduce(&:and)
    ).compile_update(_substitute_values(values), primary_key)

    connection.update(um, "#{self} Update")
  end
end
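For orientation, hypothetical usage might look something like this (an untested sketch; the Project model and misc column are just examples):

class Project < ActiveRecord::Base
  # Opt the jsonb column into the merging type (could also use the registered
  # :jsonb_with_merge symbol).
  attribute :misc, JsonbWithMergeType.new
end

project = Project.find(1)
project.misc = project.misc.merge("a" => 1)
project.save!
# Intended effect: UPDATE "projects" SET "misc" = misc || '{"a":1}'::jsonb WHERE ...
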
jrochkind commented 5 years ago

I'd have to spend more time to totally wrap my head around this -- but is that entire thing a proof of concept, that's all the code you need? That's a lot less code than the first try, with only one patch to AR??? If so, that's very encouraging! Nice!

But I'm not seeing in the code where the actual calculation of the merge is happening; not sure if I'm missing it, or this isn't a total proof of concept.

I like how this no longer relies on AttrJson at all, so we could maybe release it as a totally separate gem -- and also use it to advocate for Rails supporting this out of the box, or supplying API so it can be supported without a monkey patch. I would like to think about what change to Rails might be needed so Rails can support this without a patch, but I'd have to spend more time with the code.

Some nit picks, not relevant to the proof of concept, but just toward getting it to final -- names matter:

I think the next step is still getting some automated tests, which will make it easier for me (or anyone else) to interactively investigate the code and how it works, and also support us in refactoring while keeping functionality working. As well as, of course, being necessary for a reliable release. I'd get this set up with tests before trying any of the refactoring I suggest above (and I could be wrong about that refactoring too, just first-look thoughts).

jrochkind commented 5 years ago

Overall, I think we should get something working -- basing it on a custom Type is great -- with tests, and then we can try refactoring to simplify and reduce the number of lines of code.

volkanunsal commented 5 years ago

Thanks for the feedback. :)

But I'm not seeing in the code where the actual calculation of the merge is happening; not sure if I'm missing it, or this isn't a total proof of concept.

The merge is happening in the serialize method. The stock jsonb attribute serializes with ActiveSupport::JSON.encode(value); instead, I'm emitting SQL that uses the Postgres merge operator ||, concatenating the existing column (referenced by name) with the changed values of the jsonb column, taken from the dirty attributes.

I think I forgot to add the code that removes the keys that haven't changed:

def self.replace(klass, values, changes)
  values.map do |name, value|
    value = changes[name].last.reject { |k, v| changes[name].first[k] == v }  if include?(klass, name)
    [name, value]
  end.to_h
end

This should generate a SQL like this (with the bindings):

=> "UPDATE \"projects\" SET \"misc\" = $1 WHERE SELECT \"projects\".* FROM \"projects\"", [["misc", "misc || '{"a": 1}'::jsonb"], ["updated_at", "2019-06-08 19:24:36.771424"], ["id", 6]]

Good pointers on the naming. I haven't thought about the names much, but you're right that DirtyRegistry doesn't seem right.

I'll create a repo for this as soon as I can.

jrochkind commented 5 years ago

You are welcome to do it as a PR here; we can always extract it later. It lets you get up and running with a test (in multiple Rails versions, even) without having to set up the infrastructure yourself. Tests will help a lot and will verify the code!

volkanunsal commented 5 years ago

Ok, but I'll need your help here. What's the best way to integrate? I see you have a type called ContainerAttribute. Ideally, there would be a global option that sets the superclass of that attribute to AttrJson::Type::JsonbMerger. So you would do something like this to opt in to using the jsonb_with_merge type for your container column.

class Project < ActiveRecord::Base
  include AttrJson::Record

  attr_json_config(enable_jsonb_merge: true)

  attr_json :str, :string
  attr_json :int, :integer
end

Does this sound right to you?

As for the code itself, I cleaned it up based on your feedback. Here it is without the warts:

module AttrJson
  module Type
    SuperKlass = (if Gem.loaded_specs["activerecord"].version.release >= Gem::Version.new('5.2')
      ActiveRecord::Type::Json
    else
      ActiveRecord::Type::Internal::AbstractJson
    end)

    class JsonbMerger < SuperKlass
      module Patch
        def _update_record(values, constraints)
          values = JsonbMerger.replace(self.class, values, changes)
          super(values, constraints)
        end
      end

      def type
        :jsonb_with_merge
      end

      def serialize(value)
        return if value.nil?
        "#{name} || '#{value.to_json}'::jsonb"
      end

      def should_jsonb_merge?
        true
      end

      def self.include?(klass, name)
        klass.attribute_types[name].try(:should_jsonb_merge?)
      end

      def self.replace(klass, values, changes)
        values.map do |name, value|
          if include?(klass, name)
            c = changes[name]
            h = c.last
            unless h.nil? || h.kind_of?(Array)
              value = h.reject { |k, v| c.first[k] == v }
            end
          end
          [name, value]
        end.to_h
      end
    end
  end
end

ActiveSupport.on_load(:active_record) {
  patch = AttrJson::Type::JsonbMerger::Patch
  ActiveRecord::Persistence::ClassMethods.send(:prepend, patch)
}

I'll start a PR.

jrochkind commented 5 years ago

Oh I see your point. I'm not sure. Just to have something to get started with and have tests that pass, the simplest thing that might work is, I guess... I dunno, just make ContainerAttribute have a superclass of your thing, hard-coded for now? Not sure -- whatever you can do to get a demo proof of concept.

jrochkind commented 4 years ago

Not going to deal with this for now, see #68.