Alternative pattern for Event upcasting

danieleades commented 1 year ago

I wanted to present an alternative pattern for event upcasting. This relies on serde's ability to deserialize untagged enum representations.

I've used this pattern before for backwards compatibility of configuration files.

The gist is to separate the internal representation of an event from its serialised representation. Its serialised representation is an untagged union of all historical versions of the Event. You then add an infallible conversion from the union to the current version, and let serde do the rest.

use serde::{Deserialize, Serialize};

mod legacy {
    //! Previous versions of the `Event` enum, for backwards compatibility
    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize)]
    pub enum V1 {}

    #[derive(Serialize, Deserialize)]
    pub enum V2 {}
}

// This is version 3 of the 'event'
#[derive(Serialize, Deserialize)]
#[serde(from = "EventRep")]
pub enum Event {}

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum EventRep {
    V1(legacy::V1),
    V2(legacy::V2),
    V3(Event)
}

impl From<EventRep> for Event {
    fn from(value: EventRep) -> Self {
        match value {
            EventRep::V1(_) => todo!(),
            EventRep::V2(_) => todo!(),
            EventRep::V3(event) => event,
        }
    }
}

For fallible conversions, you could also use #[serde(try_from = "EventRep")].
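A fallible conversion might look something like the sketch below. It is std-only so that it stands alone, and the types `EventV1`, `Event`, and `UpcastError` (and their fields) are hypothetical, not part of the framework. serde's `try_from` attribute expects a `TryFrom` impl whose error type implements `Display`, which is why `UpcastError` implements it here.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical legacy event: the quantity was stored as free-form text.
pub struct EventV1 {
    pub quantity: String,
}

/// Hypothetical current event: the quantity is a number.
#[derive(Debug, PartialEq)]
pub struct Event {
    pub quantity: u32,
}

/// Error returned when a legacy event cannot be upcast.
#[derive(Debug, PartialEq)]
pub struct UpcastError(String);

impl fmt::Display for UpcastError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot upcast event: {}", self.0)
    }
}

impl TryFrom<EventV1> for Event {
    type Error = UpcastError;

    fn try_from(old: EventV1) -> Result<Self, Self::Error> {
        // Reject legacy payloads whose quantity is not a valid number.
        let quantity = old
            .quantity
            .trim()
            .parse::<u32>()
            .map_err(|e| UpcastError(format!("bad quantity {:?}: {}", old.quantity, e)))?;
        Ok(Event { quantity })
    }
}
```

With this in place, a bad legacy payload surfaces as a deserialisation error rather than a panic inside the conversion.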

Implementing upcasting this way simplifies the implementation of the framework, and removes the 'stringly' typed upcasting API in favour of a strongly-typed pattern. The downside is possibly more cognitive load on downstream users to implement this themselves and to get it right.

Obviously this is a breaking change, but I'm interested to get your thoughts.

I'd say it's likely to be possible to simplify some of the boilerplate with a derive macro, if such a thing doesn't already exist in the wild.

danieleades commented 1 year ago

This approach would use simple fall-through: serde will attempt to deserialise against each variant in EventRep until one succeeds. That should be ok for most applications, but one obvious optimisation would be to use some kind of semver-aware strategy for deserialisation to select the correct variant to deserialise to. A quick search on crates.io shows a few promising approaches.
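As a rough illustration of what "semver-aware" selection could mean, here is a std-only sketch that picks a schema generation from the major component of a version string. The `Schema` enum and `select_schema` function are hypothetical names, not the API of any crate; the caller would then deserialise the payload into the type matching the returned variant instead of trying each one in turn.

```rust
/// Hypothetical schema generations, one per major version.
#[derive(Debug, PartialEq)]
pub enum Schema {
    V1,
    V2,
    V3,
}

/// Pick the schema from a semver string such as "2.3.4".
/// Only the major component is inspected; minor and patch are ignored.
pub fn select_schema(version: &str) -> Option<Schema> {
    // Take the text before the first '.', and bail out if it is not a number.
    let major: u32 = version.split('.').next()?.trim().parse().ok()?;
    match major {
        1 => Some(Schema::V1),
        2 => Some(Schema::V2),
        3 => Some(Schema::V3),
        // Unknown major versions are reported to the caller rather than guessed.
        _ => None,
    }
}
```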

danieleades commented 1 year ago

Small correction - it's not actually a breaking change. It would however make the update API redundant

danieleades commented 1 year ago

Here's a slightly more involved version that does away with the 'fall-through' deserialisation by using an internally tagged enum representation:

use serde::{Deserialize, Serialize};

mod legacy {
    //! Previous versions of the `Event` enum, for backwards compatibility
    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize)]
    pub struct A {
        pub field1: String,
        pub field2: u32,
    }

    #[derive(Serialize, Deserialize)]
    pub enum V1 {
        A(A),
        B,
    }

    #[derive(Serialize, Deserialize)]
    pub enum V2 {
        A,
        B,
        C,
    }
}

// This is version 3 of the 'event'
#[derive(Clone, Serialize, Deserialize)]
#[serde(from = "EventRep", into = "EventRep")]
pub enum Event {}

#[derive(Serialize, Deserialize)]
#[serde(tag = "version", rename_all = "lowercase")]
enum EventRep {
    V1(legacy::V1),
    V2(legacy::V2),
    V3(Event),
}

impl From<EventRep> for Event {
    fn from(value: EventRep) -> Self {
        match value {
            EventRep::V1(_) => todo!(),
            EventRep::V2(_) => todo!(),
            EventRep::V3(event) => event,
        }
    }
}

impl From<Event> for EventRep {
    fn from(event: Event) -> Self {
        EventRep::V3(event)
    }
}

#[cfg(test)]
mod tests {
    use super::Event;

    #[test]
    fn deserialise_v1() {
        let _event: Event = serde_json::from_str(
            r#"
{
    "version": "v1",
    "A": {
        "field1": "Some String",
        "field2": 12
    }
}"#,
        )
        .unwrap();
    }
}

These patterns have some additional properties which are quite nice:

Once implemented they are totally transparent to the framework. The magic is handled by serde.
Only major versions need to be considered. Minor version changes can be handled using serde field attributes to provide default values.
Separation of the internal representation and the serialised representation allows the two to change independently, which may mean some breaking changes can be avoided entirely.
The pattern can be applied recursively. If a root aggregate event is an enum whose variants correspond to the events of 'child' aggregates, then those child aggregates can version the serialised representation of their own events independently of the parent aggregate (if for some reason you want to do that).

davegarred commented 1 year ago

So you're thinking of using the compiler to understand when to upcast events? I see a couple of potential issues here:

the upcast may not be something that the compiler can identify
initial state information many times needs to be added (setting a new 'country' value to 'USA')
chained upcasters are common (e.g., v1 ==> v3 ==> v8 ==> current)
could add a performance penalty over simple version comparator (granted, this might be negligible with Rust)

danieleades commented 1 year ago

Let me first preface this by saying I don't unequivocally think you should move to this approach, but I think it's interesting to explore. I can answer most of your questions, but injecting context during upcasting is definitely a wrinkle. With that out of the way:

> So you're thinking of using the compiler to understand when to upcast events? I see a couple of potential issues here:

Sort of. The upcasting still happens at runtime, but it's handled transparently by the serde framework in a declarative fashion, instead of in custom imperative code.

> the upcast may not be something that the compiler can identify

Can you give an example of what you mean here?

> initial state information many times needs to be added (setting a new 'country' value to 'USA')

I don't think there's anything stopping you from doing that with the approach I've outlined, assuming for a given case it was always 'USA'.

An example from the code:

        let upcast_function = Box::new(|payload: Value| {
            if let Value::Object(mut object_map) = payload {
                object_map.insert("country".to_string(), "USA".into());
                Value::Object(object_map)
            } else {
                panic!("the event payload is not an object")
            }
        });
        let upcaster = SemanticVersionEventUpcaster::new("EventX", "2.3.4", upcast_function);

This would become:

struct EventV1 {
    zip_code: usize,
    state: String,
}

struct Event {
    zip_code: usize,
    state: String,
    country: String,
}

impl From<EventV1> for Event {
    fn from(event: EventV1) -> Self {
        Self {
            zip_code: event.zip_code,
            state: event.state,
            // Every V1 event predates the field, so the default is fixed
            country: "USA".to_string(),
        }
    }
}

Where it gets tricky is if sometimes it's 'USA' and sometimes you need to inject something else. The current upcasting implementation could track this with internal state, whereas the new one has no internal state. I can't see any examples of this in the code currently by the way, but let's assume for the sake of argument there are cases where you want this.

That too would be solvable with an approach which is similar to what I've implemented in this PR, though with slightly more boilerplate. You could do something like:

struct EventV1 {
    zip_code: usize,
    state: String,
}

struct EventV2 {
    zip_code: usize,
    state: String,
    country: String,
}

/// The serialised representation of the event, supports multiple versions
enum EventRep {
    V1(EventV1),
    V2(EventV2),
}

trait Upcast {
    /// Arbitrary context injected during upcasting
    type Context;
    /// The current version of your event
    type Target;

    fn upcast(self, context: &Self::Context) -> Self::Target;
}

impl Upcast for EventRep {
    type Context = String;
    type Target = EventV2;

    fn upcast(self, context: &Self::Context) -> Self::Target {
        match self {
            // V1 has no country, so take it from the injected context
            Self::V1(event_v1) => EventV2 {
                zip_code: event_v1.zip_code,
                state: event_v1.state,
                country: context.clone(),
            },
            // Already the current version, nothing to do
            Self::V2(event_v2) => event_v2,
        }
    }
}

I guess this is something of a halfway house between the two approaches. You gain the ability to inject arbitrary context, everything is still strongly-typed, and you still don't need to juggle callbacks.

> chained upcasters are common (e.g., v1 ==> v3 ==> v8 ==> current)

Nothing in the From/TryFrom approach stops you from chaining conversions.

Say you have versions 1, 2, and 3. You can implement conversions from 1 -> 2 and 2 -> 3. You can then add a direct conversion from 1 -> 3 which internally delegates to the other two conversions in turn.

There's a helper macro for this actually - https://github.com/bobozaur/transitive
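Written out by hand, the delegation described above might look like the following sketch. The three structs and the default values are hypothetical; the point is only that the direct 1 -> 3 conversion reuses the two single-step conversions rather than duplicating their logic.

```rust
/// Hypothetical event versions; the fields are illustrative only.
pub struct EventV1 {
    pub zip_code: usize,
}

pub struct EventV2 {
    pub zip_code: usize,
    pub state: String,
}

#[derive(Debug, PartialEq)]
pub struct EventV3 {
    pub zip_code: usize,
    pub state: String,
    pub country: String,
}

impl From<EventV1> for EventV2 {
    fn from(event: EventV1) -> Self {
        Self {
            zip_code: event.zip_code,
            // Assumed default for data written before `state` existed.
            state: "unknown".to_string(),
        }
    }
}

impl From<EventV2> for EventV3 {
    fn from(event: EventV2) -> Self {
        Self {
            zip_code: event.zip_code,
            state: event.state,
            // Assumed default for data written before `country` existed.
            country: "USA".to_string(),
        }
    }
}

/// Direct 1 -> 3 conversion that delegates to the two single-step conversions.
impl From<EventV1> for EventV3 {
    fn from(event: EventV1) -> Self {
        EventV3::from(EventV2::from(event))
    }
}
```

Adding a V4 later then only needs one new step (3 -> 4) plus thin delegating impls for the older versions.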

> could add a performance penalty over simple version comparator (granted, this might be negligible with Rust)

I don't think so. I suspect it might actually be a bit faster, since you're not going through an intermediate JSON object. This has the side effect of only employing strongly-typed structures everywhere.

serverlesstechnology / cqrs

Alternative pattern for Event upcasting #46