zopefoundation / persistent

automatic persistence for Python objects
https://pypi.org/project/persistent/
Other
46 stars 28 forks source link

C/Python difference: Setting __class__ activates in C, doesn't activate in Python #155

Closed jamadden closed 3 years ago

jamadden commented 3 years ago

In the C implementation, setting __class__ of a ghost calls _p_activate(), and uses the class methods defined on the original class.

But in the Python implementation, setting __class__ does not call _p_activate(); consequently, the class methods of the new class are used.

This makes a difference when class methods include those used in the pickle protocol, such as __setstate__.

Here's an example. Consider this code:

from persistent import Persistent
from ZODB.DB import DB
from ZODB.DemoStorage import DemoStorage
import transaction

class A(Persistent):

    def __setstate__(self, state):
        print("Setting state in A")
        Persistent.__setstate__(self, state)

class B(Persistent):

    def __setstate__(self, state):
        print("Setting state in B")
        Persistent.__setstate__(self, state)

db = DB(DemoStorage())
transaction.begin()
conn = db.open()
conn.root()['key'] = A()
transaction.commit()

transaction.begin()
conn2 = db.open()
a = conn2.root()['key']
print("Setting the class")
a.__class__ = B
print("Activating the object")
a._p_activate()
transaction.abort()

When we run it with the C implementation, the object is activated by the line a.__class__ = B and uses the A.__setstate__ implementation to do so.

$ python /tmp/t.py
Setting the class
Setting state in A
Activating the object
Done activating

Using the pure-Python version, however, activation doesn't happen until we call _p_activate() explicitly, and then it uses the Other.__setstate__ implementation.

PURE_PYTHON=1 python /tmp/t.py
Setting the class
Activating the object
Setting state in B
Done activating

Ideally, setting the class of a ghost wouldn't activate the object, I think. But since that's the way the C implementation has worked for some time, probably the Python implementation needs to change to activate the object when the class is assigned to.

jamadden commented 3 years ago

Probably closely related: The C implementation sets _p_changed when __class__ is assigned to, but the Python implementation doesn't. Adding print("Object is changed?", a._p_changed) right after the assignment shows this:

$ python /tmp/t.py
Setting the class
Setting state in A
Object is changed? True
...

$ PURE_PYTHON=1 python /tmp/t.py
Setting the class
Object is changed? None
...

This is another reason why I prefer the Python behaviour. But I assume there's reasons the C code does what it does (I haven't checked).

jamadden commented 3 years ago

There don't really seem to be any reasons that the C code does what it does; it could be changed to match Python, but that's a non-starter for backwards compatibility.

I also discovered this same discrepancy applies to __dict__.

It also applies to a few other special attributes: __of__, __setstate__, __del__. I don't think those are actually a problem, though, because as methods, typically found by looking at the type, they are unlikely to be assigned to ghost instance.

mgedmin commented 3 years ago

The C behaviour makes sense to me if you view an object's class as part of its state (and there are some design patterns in Python that (ab)use __class__ this way to implement e.g. state machines).

OTOH, I seem to remember that the way persistent references work is that each object that has a reference to some persistent object stores a tuple of (class name, oid), which makes me question whether it's even a good idea to allow changing __class__ ever?

jamadden commented 3 years ago

The C behaviour makes sense to me if you view an object's class as part of its state (and there are some design patterns in Python that (ab)use class this way to implement e.g. state machines).

True. I did exactly that (state machine based on __class__) once, until I learned that it didn't (consistently) work with ZODB...

OTOH, I seem to remember that the way persistent references work is that each object that has a reference to some persistent object stores a tuple of (class name, oid), which makes me question whether it's even a good idea to allow changing __class__ ever?

… and this is why it doesn't work on ZODB: ZODB stores the __class__ with each persistent reference in order to be able to cheaply create ghost objects with no additional information. Things appear fine until your object gets flushed from the connection's cache and reloaded, and then your change may disappear. (I'm not the only one who has been bitten by this.)

That said, changing the class can still be useful at runtime if you're very careful.