jeremyevans / sequel

Sequel: The Database Toolkit for Ruby
http://sequel.jeremyevans.net
Other
4.99k stars 1.07k forks source link

Use deep copy when returning default value #2069

Closed byucesoy closed 1 year ago

byucesoy commented 1 year ago

Previously, default values were returned as a shallow copy, which works fine for most of the cases. However if the returned object is a complex object such as Hash or Array, which can be representation of PostgreSQL's JSONB column for example, shallow copy is not enough. If caller updates the returned object, default value also gets overwritten, thus all subsequent calls use the updated value.

Marshal.dump & Marshal.load is used to generate a deep copy of the object, which might be an overkill, but this seems like a most reliable way of creating a deep copy.

byucesoy commented 1 year ago

Not sure about the use of Marshal.dump & load. Any other way of doing deep copy should work fine.

jeremyevans commented 1 year ago

I am OK with the general idea of making sure the default values are unique and not shared (unless frozen). The issue with this implementation is it causes a slowdown which in the vast majority of cases is not needed.

I think a better approach in this case would be to have the default value be a proc that returns a unique object. That would probably require a change to the pg_json extension instead of the defaults_setter extension.

byucesoy commented 1 year ago

That would fix my particular problem, but I'm afraid this would resurface in the future for another complex type. I didn't try yet, but I suspect pg_range would also have similar problem. There could be others as well. So, I think this needs to be fixed in defaults_setter.

Are you concerned about unnecessary overhead of deep copying simple types? For those, I expect dump/load to be quite fast, still we can have an if else statement to only run deep copy logic for complex types (checking for Array and Hash seems like enough). Or are you concerned about overhead of deep copying complex types? I think correct behavior is deep copying those even though it is slow. Running Model.new and then updating a column is quite common, which breaks next Model.new for complex types.

jeremyevans commented 1 year ago

pg_hstore and pg_row would need similar behavior. pg_range shouldn't, as ranges should be immutable (at least, there isn't a supported API for mutation).

My concern is that defaults_setter already has support for separate values per instance, and that is having the default object respond to call (typically, a proc), and the object is called to get the default value. I don't want to introduce a separate and less efficient way to handle separate values per instance. It's possible the proc used can use Marshal.load (only need to dump on model setup/dataset change, not every call).

byucesoy commented 1 year ago

Just to be clear, my goal is not implementing separate default values per instance. I want to have single default value. The problem is if one instance is updated by caller (which is quite common), it also changes the default value. Thus next calls of Model.new uses updated value not the original default value.

My expectation would be Model.new to always come up with the value given as default. In our case we have a model called Strand. We set the default value like this;

Strand.plugin :defaults_setter, cache: true
Strand.default_values[:stack] = [{}]

Then at some point in the code we do something like this;

s1 = Strand.new
s1.stack << "This shouldn't exist in other Strands"

Then any new calls to Strand.new.stack returns [{}, "This shouldn't exist in other Strands"] instead of the default value [{}].

jeremyevans commented 1 year ago

Just to be clear, my goal is not implementing separate default values per instance. I want to have single default value. The problem is if one instance is updated by caller (which is quite common), it also changes the default value. Thus next calls of Model.new uses updated value not the original default value.

This is what is meant by separate default values per instance (each model instance has its own instance of the default value).

My expectation would be Model.new to always come up with the value given as default. In our case we have a model called Strand. We set the default value like this;

Strand.plugin :defaults_setter, cache: true
Strand.default_values[:stack] = [{}]

Then at some point in the code we do something like this;

s1 = Strand.new
s1.stack << "This shouldn't exist in other Strands"

Then any new calls to Strand.new.stack returns [{}, "This shouldn't exist in other Strands"] instead of the default value [{}].

The way to implement this behavior in Sequel is:

Strand.default_values[:stack] = proc{[{}]}`

We could potentially handle this in the defaults_setter instead of in individual extensions, but the change should be in convert_default_value, to use a proc for Hash/Array (and DelegateClasses of each).

jeremyevans commented 1 year ago

Can you try the following and see if it works for you?

diff --git a/lib/sequel/plugins/defaults_setter.rb b/lib/sequel/plugins/defaults_setter.rb
index b20b47a5f..99bcff8d7 100644
--- a/lib/sequel/plugins/defaults_setter.rb
+++ b/lib/sequel/plugins/defaults_setter.rb
@@ -1,5 +1,7 @@
 # frozen-string-literal: true

+require 'delegate'
+
 module Sequel
   module Plugins
     # The defaults_setter plugin makes the column getter methods return the default
@@ -106,6 +108,18 @@ module Sequel
             lambda{Date.today}
           when Sequel::CURRENT_TIMESTAMP
             lambda{dataset.current_datetime}
+          when Hash, Array
+            v = Marshal.dump(v).freeze
+            lambda{Marshal.load(v)}
+          when Delegator
+            klass = v.class
+            case o = v.__getobj__
+            when Hash, Array
+              v = Marshal.dump(o).freeze
+              lambda{klass.new(Marshal.load(v))}
+            else
+              v
+            end
           else
             v
           end
byucesoy commented 1 year ago

Using proc{[{}]} fixes the problem I'm hitting, but the applying the diff doesn't, which is weird because it passes the test case I added to this PR.

I'll try to find out the reason and report back.

jeremyevans commented 1 year ago

Using proc{[{}]} fixes the problem I'm hitting, but the applying the diff doesn't, which is weird because it passes the test case I added to this PR.

The applying diff only affects cases where the plugin is parsing the default values from the database. If a user is overriding a default value, it is on them to use a callable object that returns a unique value.

byucesoy commented 1 year ago

Got it, thanks for the fix :)