pyeventsourcing / eventsourcing

A library for event sourcing in Python.
https://eventsourcing.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

created_on and modified_on attributes of Aggregates reconstructed from snapshots have inconsistent types #283

Open matthewchao opened 4 days ago

matthewchao commented 4 days ago


While trying to enable snapshotting, I found behavior that might not be consistent with what's written in the documentation. As a simple example, I slightly modified the test code from example "Aggregate 8" to say:

# Construct application object.
school = DogSchool()

# Evolve application state.
dog_id = school.register_dog("Fido")
school.add_trick(dog_id, "roll over")
school.add_trick(dog_id, "play dead")

# Query application state.
dog = school.get_dog(dog_id)
print("type(dog['created_on']): ", type(dog['created_on']))
print("type(dog['modified_on']): ", type(dog['modified_on']))

# Take snapshot.
school.take_snapshot(dog_id, version=3)
dog = school.get_dog(dog_id)
print("type(dog['created_on']): ", type(dog['created_on']))
print("type(dog['modified_on']): ", type(dog['modified_on']))

# Continue with snapshotted aggregate.
school.add_trick(dog_id, "fetch ball")
dog = school.get_dog(dog_id)
print("type(dog['created_on']): ", type(dog['created_on']))
print("type(dog['modified_on']): ", type(dog['modified_on']))

and got the following output:

type(dog['created_on']):  <class 'datetime.datetime'>
type(dog['modified_on']):  <class 'datetime.datetime'>
type(dog['created_on']):  <class 'str'>
type(dog['modified_on']):  <class 'str'>
type(dog['created_on']):  <class 'str'>
type(dog['modified_on']):  <class 'datetime.datetime'>

The documentation says that these attributes should be datetimes.

Using a debugger, I found that these two attributes start off as datetimes, but during reconstruction from the repository they become strings when the state is applied by a Snapshot's mutate method:

        aggregate.__dict__.update(aggregate_state)

However, if a domain event occurs after the snapshot was taken, its mutate method

        aggregate.modified_on = self.timestamp

restores the modified_on to being a datetime, which explains why the third case in my output has two different types.
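The mechanism can be reproduced standalone. Here is a minimal sketch (not the library's actual classes) showing that updating `__dict__` with decoded JSON state leaves whatever type was stored there, while a later direct assignment restores a datetime:

```python
from datetime import datetime, timezone

class Agg:
    # minimal stand-in for the library's Aggregate: created_on is a
    # property that reads the underscore-prefixed attribute
    @property
    def created_on(self):
        return self._created_on

agg = Agg.__new__(Agg)
# snapshot reconstruction: decoded state still holds an ISO string
agg.__dict__.update({"_created_on": "2024-10-17T12:20:45+00:00"})
print(type(agg.created_on))  # <class 'str'>

# a subsequent event's mutate() assigns a real datetime again
agg.__dict__["_modified_on"] = datetime.now(timezone.utc)
```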

To summarize,

Is this intended? And should we be able to rely on the type being datetime?

d-m commented 3 days ago

The aggregate's created_on and modified_on are actually properties that read the values from _created_on and _modified_on, respectively. In example "Aggregate 8", the declarative Pydantic example, the OrJsonTranscoder uses encode to convert the datetime objects to strings for these fields, and decode returns them as strings when the stored event bytes are loaded. Normally a set of registered transcodings converts datetimes to and from ISO strings, etc., but in this example that path raises an error because the expectation is that these fields will be defined on the Pydantic model for the events.
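For reference, the round-trip those transcodings normally perform is just ISO 8601 encoding, sketched here in plain Python; without a registered decode step, the string survives into the aggregate state:

```python
from datetime import datetime, timezone

ts = datetime(2024, 10, 17, 12, 20, 45, tzinfo=timezone.utc)
encoded = ts.isoformat()                   # datetime -> str for storage
decoded = datetime.fromisoformat(encoded)  # str -> datetime on load
assert decoded == ts
```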

The first thing I tried was to add _created_on and _modified_on fields to the SnapshotState model. However, Pydantic uses fields prefixed with _ as private model attributes and these don't get serialized. You can almost get around this using aliases:

class SnapshotState(BaseModel):
    """Pydantic-based class for storing an aggregate's snapshot state."""

    # add leading underscores to avoid conflicting with `created_on` and `modified_on` Aggregate parameters
    created_on_: datetime = Field(alias="_created_on")
    modified_on_: datetime = Field(alias="_modified_on")

    model_config = ConfigDict(extra="allow", populate_by_name=True)
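A quick way to see the alias behavior in isolation (standalone sketch, Pydantic v2):

```python
from datetime import datetime
from pydantic import BaseModel, ConfigDict, Field

class SnapshotState(BaseModel):
    created_on_: datetime = Field(alias="_created_on")
    model_config = ConfigDict(extra="allow", populate_by_name=True)

s = SnapshotState(created_on_=datetime(2024, 1, 1))
print(s.model_dump())               # keys use the field name: created_on_
print(s.model_dump(by_alias=True))  # keys use the alias: _created_on
```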

To serialize the fields of a Pydantic model with the alias values you need to add by_alias=True to model_dump, so the PydanticMapper to_stored_event method needs to be updated to be:

    def to_stored_event(self, domain_event: DomainEventProtocol) -> StoredEvent:
        topic = get_topic(domain_event.__class__)
        event_state = cast(BaseModel, domain_event).model_dump(by_alias=True) # <- Change here
        stored_state = self.transcoder.encode(event_state)
        if self.compressor:
            stored_state = self.compressor.compress(stored_state)
        if self.cipher:
            stored_state = self.cipher.encrypt(stored_state)
        return StoredEvent(
            originator_id=domain_event.originator_id,
            originator_version=domain_event.originator_version,
            topic=topic,
            state=stored_state,
        )
The mutate method of AggregateSnapshot also needs to be overridden to include the by_alias=True line:

    def mutate(self, _: None) -> Aggregate:
        """
        Reconstructs the snapshotted :class:`Aggregate` object.
        """
        cls = cast(Type[Aggregate], resolve_topic(self.topic))
        # aggregate_state = dict(aggregate.__dict__)
        aggregate_state = self.state.model_dump(by_alias=True) # <- Change here
        from_version = aggregate_state.pop("class_version", 1)
        class_version = getattr(cls, "class_version", 1)
        while from_version < class_version:
            upcast_name = f"upcast_v{from_version}_v{from_version + 1}"
            upcast = getattr(cls, upcast_name)
            upcast(aggregate_state)
            from_version += 1

        aggregate_state["_id"] = self.originator_id
        aggregate_state["_version"] = self.originator_version
        aggregate_state["_pending_events"] = []
        aggregate = object.__new__(cls)
        aggregate.__dict__.update(aggregate_state)
        return aggregate

If you don't do this, the _created_on and _modified_on attributes don't get set on the aggregate (they get set as created_on_ and modified_on_, as defined on the model) and you get an AttributeError. However, dumping the model has a side effect: any nested Pydantic BaseModel subclasses, like Trick, also get dumped to dictionaries. Since the aggregate instance is not a BaseModel subclass, it's very difficult to rehydrate these BaseModel instances in a programmatic way. If it were, we could do aggregate(**aggregate_state) and we'd be done.
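The nested-model side effect is easy to demonstrate in isolation (Trick here is a stand-in for the example's model):

```python
from pydantic import BaseModel

class Trick(BaseModel):
    name: str

class DogState(BaseModel):
    tricks: list[Trick]

state = DogState(tricks=[Trick(name="roll over")])
dumped = state.model_dump(by_alias=True)
print(type(dumped["tricks"][0]))  # <class 'dict'> -- no longer a Trick
```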

A few things here:

  1. Why are Aggregate.created_on and Aggregate.modified_on (and Aggregate.version, Aggregate.id, etc.) properties? It seems like they only return their underscore counterparts without modification. If they were regular attributes, this would become much easier; there would be no need for model_dump or Pydantic aliases.
  2. Is there a (relatively) easy way to subclass both pydantic.BaseModel and eventsourcing.Aggregate? I tried this briefly but got a metaclass error.
  3. Is the best course of action to treat _modified_on and _created_on as special, similar to _id, _version, and _pending_events in the mutate method?
  4. Are there other hooks here that I may have missed that would simplify snapshotting with Pydantic models?
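On point 3, one possible shape for that special-casing is a small coercion step after the dump. This is a hypothetical helper (restore_timestamps is not a library function), assuming the timestamps were stringified as ISO 8601:

```python
from datetime import datetime

def restore_timestamps(aggregate_state: dict) -> dict:
    """Hypothetical helper: coerce snapshot timestamps back to datetime."""
    for key in ("_created_on", "_modified_on"):
        value = aggregate_state.get(key)
        if isinstance(value, str):
            aggregate_state[key] = datetime.fromisoformat(value)
    return aggregate_state

state = restore_timestamps({"_created_on": "2024-10-17T12:20:45+00:00"})
print(type(state["_created_on"]))  # <class 'datetime.datetime'>
```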

Thanks!

johnbywater commented 23 hours ago

Thanks for your very detailed and interesting analysis @matthewchao and @d-m. I'm looking into this now....