holoviz / param

Param: Make your Python code clearer and more reliable by declaring Parameters
https://param.holoviz.org
BSD 3-Clause "New" or "Revised" License
412 stars 69 forks source link

Shallow-copy mutable containers in slot values #826

Closed jbednar closed 9 months ago

jbednar commented 10 months ago

Fixes https://github.com/holoviz/param/issues/793. Fixes https://github.com/holoviz/param/issues/746.

Adds shallow copying of mutable slot values when instantiating Parameter objects or inheriting values for slots. Shallow copying is only done when per_instance is True (the default); if you don't want such copying you can share both Parameter instances and mutable slot values by setting per_instance=False.

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False)

class B(A):
    p = param.Selector(default=2)

b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
# [1, 2] [1, 2, 3] [1, 2, 3]

(previously A's objects list was also [1, 2, 3]; now fixed to be [1, 2])

There are various issues still, as listed below.

Tests

Needs a test capturing the example above, but I've got to run, so if anyone wants to add that I'd be grateful, or I can do that tomorrow.

Previous handling of watchers

PR https://github.com/holoviz/param/pull/306 added this logic to Parameters.getitem() for shallow-copying watchers:

try:
    # Do not copy watchers on class parameter
    watchers = p.watchers
    p.watchers = {}
    p = copy.copy(p)
except:
    raise
finally:
    p.watchers = {k: list(v) for k, v in watchers.items()}

This PR generalizes such copying to apply to all the slot values besides default, but I was unable to determine why the existing code was so complex. First, why p.watchers = {k: list(v) for k, v in watchers.items()} instead of just p.watchers = copy.copy(watchers.items())? copy.copy is used just a couple of lines before, and when I use it here it seems to work just the same, as I'd expect, so I can't tell why it was a list comprehension that appears to achieve the same as copy.copy.

Second, what's with the try/except/finally? Deleting all that seems to work just fine, and I can't quite imagine a situation when the try block could fail but the watchers are still valid to reconstruct finally. Maybe something in Panel? For now I deleted it as it does not seem to achieve anything.

Third, I've kept the convoluted mechanism for zeroing-out the watchers in the original Parameter object when creating the new one. Simply shallow-copying the watchers after copying the Parameter object leaves an extra set of watchers that causes tests to fail, but I'm not at all sure why it's appropriate to move the watchers to the new Parameter object rather than where they were originally watching. For now I chose to be conservative, but I'm not confident why it was done this way, so if there's a reason, it should have a clear comment added.

What's mutable and when is it copied?

For now, mutable sequences, mappings, and sets are considered mutable, which covers the list and dict cases that I'm aware of us using in slots. Such values are shallow-copied for all slots except default, to ensure that any independent Parameter copies also have independent objects lists, etc. Parameterized objects are also mutable, but I don't know if they are ever used in slots. If they are, we should consider whether they should be shallow-copied, but my guess is that they should not, since it's largely containers that we want shallow-copied, not arbitrary instantiated objects. Maybe is_mutable should be renamed is_mutable_container to convey that?

The reason for not copying the default value is to allow a specific list or dict to be shared across Parameters, which has always been supported. E.g. one could have a few global lists like search_paths, debug_search_paths, etc., whose values are curated and maintained independently of which particular Parameters have been set to those values. We can revisit whether mutable containers should be copied for default too, but if so, it should be a separate PR with its own justification.

I currently check for default only when copying the Parameter object, not during inheritance. It might also be appropriate to skip shallow-copying for for inheritance as well, but I haven't found an example where that would come up. Probably worth doing?

_update_state

After merging this PR, I believe we should be able to remove _update_state as discussed in https://github.com/holoviz/param/issues/807 and https://github.com/holoviz/param/pull/817 . _update_state was needed because we had to wait for param inheritance before adding to the objects lists for Selectors, but now that values get shallow copied, it should be feasible to populate the objects lists (e.g. from the default value) immediately, during the constructor. Doing so should simplify the Selector logic, but I haven't yet tested that this PR's approach will achieve that goal.

My guess is that eliminating _update_state is required, for fixing this remaining bad behavior:

0.

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False, per_instance=True)

class B(A):
    i = param.Integer(8)

b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
#[1, 2, 3] [1, 2, 3] [1, 2, 3]

Here the Parameter object isn't copied into B, and the new value 3 ends up in A's object list, which it should not. I think that may be from _update_state being called on the class's copy of the objects list, not the instance's, since it's called in the metaclass, but I haven't tracked that down.

maximlt commented 10 months ago

What do you think about the examples below? In the first snippet the code accesses the Parameter before setting p.s to 3, to run __getitem__ and shallow-copy the objects. In the second case this is commented out, leading to the objects on the class Parameter being updated as the Parameter doesn't yet exist on the instance. This is similar to the last example you shared @jbednar, without any inheritance though.

1)

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.param.s.objects  # runs `__getitem__`
p.s = 3
assert P.param.s.objects == [1, 2]

2)

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
# p.param.s.objects
p.s = 3
assert P.param.s.objects == [1, 2, 3]

jbednar commented 10 months ago

@maximlt , that's a good point:

image

Maybe we need to populate the Parameter objects more proactively. Any suggestions for where to make that change?

maximlt commented 10 months ago

Any suggestions for where to make that change?

Maybe at the start of Parameters._setup_params which is called from Parameterized.__init__?

maximlt commented 10 months ago

Actually the behavior I reported in my previous comment is not tied to mutable containers.

3.

image
Code ```python import param class P(param.Parameterized): x = param.Number() p = P() P.param.x.bounds = (100, 200) assert p.param.x.bounds == (100, 200) class P(param.Parameterized): x = param.Number() p = P() p.param.x # side-effects there P.param.x.bounds = (100, 200) assert p.param.x.bounds is None ```

Accessing a Parameter on the .param namespace creates a Parameter on the instance. I'm not sure it's really a bug, certainly it's confusing, but really the way to get a Parameter you can control at the class level should be with per_instance=False.

philippjfr commented 10 months ago

My reading of the watchers code above is that it's not fully correct. The problem being that we can't distinguish between watchers that are meant to be on the class parameter and watchers that are meant to be on the instance parameter since there is no difference until the instance parameter is created, i.e.:

param.Parameterized.param.watch(print, 'name', what='allow_None')

looks exactly the same as:

p = param.Parameterized()
p.param.watch(print, 'name', what='allow_None')

In general it seems very rare to have watchers for attributes on class parameters so this hasn't come up, but it's definitely not correct. I think this is another issue we can't easily fix without making instance parameter creation non-lazy.

Second, what's with the try/except/finally? Deleting all that seems to work just fine, and I can't quite imagine a situation when the try block could fail but the watchers are still valid to reconstruct finally. Maybe something in Panel? For now I deleted it as it does not seem to achieve anything.

I do think it's important, at least if we decide not to fix the underlying issue yet. If there is somehow an error while creating the instance parameter (e.g. because some value is not safe to copy) then the watchers end up being lost entirely.

jbednar commented 10 months ago

If there is somehow an error while creating the instance parameter (e.g. because some value is not safe to copy) then the watchers end up being lost entirely.

How would it help to have watchers copied correctly but not whatever slot value was about to be copied after the failing one? This try/except/finally seems like it would just cover up problems, not avoid them -- why would we want watchers copied correctly when other slots aren't? If we wanted to copy all the slots that we could, and put a warning for any that didn't work, wouldn't we do that in a loop where the try/except was per slot, not for the entire copy operation?

philippjfr commented 10 months ago

It's not about the object being copied to but about restoring the class that's being copied from.

jbednar commented 10 months ago

Hmm; that's weird. My understanding of that code is that if copy.copy fails, an exception will be raised, with p remaining the class Parameter object, except that p.watchers has been replaced with a copy of the original dict. Seems like a lot of hoops to be jumping back and forth through to achieve that outcome, i.e. emptying out the watchers and then replacing it with something just like the original.

And then if copy.copy succeeds, no exception will be raised, and p will become the instance's copy of the Parameter, where the instance p.watchers will have a copy of the original dict. But won't the original class Parameter now have an empty watchers dict? How is this working at all when there are multiple instances that need to have the same Parameter instantiated into it? Only the first one gets the watchers?

Unless I'm very confused, it seems like we have two defensible options here: (1) Leave watchers on class Parameter objects, and don't copy them to instances. Instances would need their own watchers, which would be specific to that instance. Watchers on class Parameter objects would be watching the class Parameter object, not instances. (2) Leave watchers on Class Parameter objects, and copy them to all instances. Thus anything that watches the class Parameter will also be watching all the instances.

I am not seeing any argument for what it seems like the current code is doing, i.e. emptying out the watchers from the class Parameter object after instantiation. When is that ever appropriate?

jbednar commented 10 months ago

@maximlt , I've pushed a few more commits to address the issues above:

6ce1ef3 is the key one for addressing the problems above. After that commit, I believe these behaviors are now correct:

0.

import param

class A(param.Parameterized):
    p = param.Selector(objects=[1, 2], check_on_set=False, per_instance=True)

class B(A):
    i = param.Integer(8)

b = B()
b.p = 3

print(A.param.p.objects, B.param.p.objects, b.param.p.objects)
# [1, 2] [1, 2] [1, 2, 3]

(b now gets its own per_instance copy of the objects list)

1.

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.param.s.objects
p.s = 3
print(P.param.s.objects, p.param.s.objects)
#[1, 2] [1, 2, 3]

2.

class P(param.Parameterized):
    s = param.Selector(objects=[1, 2], check_on_set=False)

p = P()
p.s = 3
print(P.param.s.objects, p.param.s.objects)
#[1, 2] [1, 2, 3]

(invoking __getitem__ is no longer required for p to get its own per_instance copy of the s Parameter object)

I'm ok with keeping the behavior you show in 3. above, i.e. that accessing a Parameter on the .param namespace creates a Parameter on the instance. I agree that if you want to control a Parameter at the class level, you can either change the class values before creating any instances, or you can set per_instance=False. If you change the Parameter at the class level after having instantiated some classes, there's no guarantee that the class-level changes will propagate down. As you show, sometimes they do, but really if you want such changes you have to make them on the instance or you have to set per_instance to False!

So far, I haven't messed with _update_state, because changing that wasn't necessary for anything to do with instantiating slots.

So this PR should be ready to review and merge now; it now does what it says: adds shallow-copying for Parameter slot values that happen to be mutable containers, whenever a Parameter object is copied to an instance or inherits its slot values. Parameter objects are still copied only lazily, as before, but now there is one more case where it gets copied, namely when a parameter value is set on the instance (to handle _ensure_value_is_in_objects).

jbednar commented 9 months ago

I assume everyone is happy with the behavior in case 1 (two separate Parameter objects, and two separate objects lists)?

Case 2 is trickier. On the one hand, the current behavior in 2 seems like the only way to define a Parameter object whose settings can easily be inherited across a hierarchy. I.e., not copying the Parameter object between A and B is what lets people mess with the one in A and set things up as they like, while knowing that it will all propagate nicely. Plus, if they do want separate Parameter objects in subclasses, it's easy enough to just put them there, as in 1.

On the other, I do agree that there's some argument for duplicating the Parameter automatically to make case 2 act like case 1, so that they behave independently. On balance I think it's ok that we aren't doing that; seems like we lose more than we gain.

philippjfr commented 9 months ago

I'll be addressing watcher related issues in a separate PR. I also think that on balance the behavior in the example above is defensible. So I'm going to consider this ready and merge.