prooph / micro

Functional prooph for microservices
http://getprooph.org
BSD 3-Clause "New" or "Revised" License

[Question] Should we remove state and snapshots? #34

Closed codeliner closed 7 years ago

codeliner commented 7 years ago

While writing the Event Sourcing part of our walk-through guide I noticed that it is difficult to explain why we pass state to an aggregate function. A few questions came to mind: Where does the state come from (of course I know the answer but it requires a lot of explanation)? Why don't we work with the history instead?

I did some quick research and found this excellent blog post by @mathiasverraes. I read the post a long time ago, but back then my focus was on object-oriented ES and not the FP way.

Mathias uses f(history, command) => events to describe an aggregate function.

Our definition looks a bit different atm: f(state, command) => newState // events

It allows us to work with state snapshots, and the aggregate function can inspect state to verify that the command can be processed. But the aggregate function could also inspect history instead.

If we pass history and only return events our aggregate functions become even more simple. We no longer need the apply method as this would be only part of a projection. The aggregate can iterate history and filter events with required information. As long as the history of a single aggregate is not too big it would work quite well.

codeliner commented 7 years ago

Oh here is the link to the post: http://verraes.net/2014/05/functional-foundation-for-cqrs-event-sourcing/

prolic commented 7 years ago

As long as the history of a single aggregate is not too big it would work quite well.

And when the history is big, or state is complex to recreate, there is a performance problem. We tested that already, and a snapshot every 5 events is required for good performance.

If we pass history and only return events our aggregate functions become even more simple.

The other way around: if we pass history, in most cases you recreate state from it as the first step of that function, which leads to repetition across all the aggregate functions.

Also, we have f(state, events) for projections; having the same signature everywhere makes it easier to remember how to do it.

codeliner commented 7 years ago

We tested it with the full OO stack: prooph/event-sourcing with repository, aggregate translator ... and it was ok for 20 - 30 events. We just defined 5 as a good default because we saw that every event replay takes its time.

Anyway, do you really need to replay the full history every time? Often we only carry information in events that are important for the projections. Some very simple examples:

ChangeEmailAddress -> user aggregate needs to check if email is unique and maybe wants to add the old email to EmailWasChanged event

RemoveItemFromCart -> cart aggregate needs to check if item is in the cart and checkout is not completed yet, but doesn't need all items replayed and also doesn't need shipping information etc. replayed if already entered.

I think there are many more examples. When looking at our own aggregates in the current project they would benefit from having the history available instead of replayed, injected state. In most of the aggregate methods we check status information like: can we move to the next status.

Performance problems could be solved by injecting only the part of the history that is of interest to the current aggregate function.

Let's take the simple user change email example:

//example code, not possible at the moment:
$eventMatcher = new EventMatcher();
$eventMatcher = $eventMatcher->withEventNames(['UserWasRegistered', 'EmailWasChanged']);
$eventMatcher = $eventMatcher->withMetadataMatcher(/* match aggregate id and type */);
$history = $eventStore->load('user_stream', $eventMatcher);

$command = new ChangeEmail($userId, $newEmail);

$emailGuard = $factories['emailGuard']();

// The return type is a plain array of messages; PHP has no Message[] type declaration.
$changeEmail = function($emailGuard, $history, $command): array {
  $oldEmail = '';

  if(!$emailGuard->isUnique($command->email())) {
    return [ChangeEmailAddressFailed::withDuplicateEmail($command->email(), $command->userId())];
  }

  // The last matching event in the history carries the current email address.
  foreach($history as $event) {
    if(in_array($event->messageName(), ['UserWasRegistered', 'EmailWasChanged'])) {
      $oldEmail = $event->email();
    }
  }
  return [EmailWasChanged::with($command->email(), $oldEmail, $command->userId())];
};

prolic commented 7 years ago

AggregateResult

Maybe it's not a bad idea to remove state from the result for two reasons:

1) The Kernel itself only requires the raised events, not the returned state.
2) Returning the state in the HTTP response may show an inconsistent state, because some process managers still need to do their work.

That's why I am okay with removing the aggregate result and simply returning events from the aggregate.

History example VS state example

$oldEmail = '';
foreach ($history as $event) {
    if(in_array($event->messageName(), ['UserWasRegistered', 'EmailWasChanged'])) {
      $oldEmail = $event->email();
    }
}

VS:

$oldEmail = $state['old_email'];

This would be a plus for state, not for events in my opinion. Also state comes very fast from the snapshot store, the events are slower from the event store. Removing the snapshot feature completely seems like a bad idea to me. Can you provide a benchmark with 1000 events replay VS using a snapshot, if you think it's not a big deal?

More reasons for state

Imagine you have a command that only needs to be accepted, based on a lot of conditions coming from state. You could either have something like:

if (in_array($state['foo'], $haystack)
    && $state['bar'] === 'baz'
    && $someService->isOkayWith($state['whatever'])
) {
    // accepted, do something with the command
}

or you can have

$state = [];
foreach ($history as $event) {
    if (in_array($event->messageName(), [
        'someEvent1', 
        'someOtherEvent2',
        /* damn, what else events are important? hard to say! */])
    ) {
        // build some kind of state for me first, so I can check later, after all events are applied
    }
}

// now same code as above?

codeliner commented 7 years ago

This would be a plus for state, not for events in my opinion.

You forget the apply function. Imagine you need to explain to someone why the apply function is needed. I mean, it is crazy: the aggregate writes history but has no access to its own history. You cannot look back within the aggregate; only projections can do that. Instead of the history you have state, but state is something for the read model, not for the aggregate (in theory).

Can you provide a benchmark with 1000 events replay VS using a snapshot, if you think it's not a big deal?

No, I won't. I never said that replaying 1000 events vs. using snapshots makes no difference. I was talking about the normal case, in which one aggregate records up to 30 events, not more. To solve the performance problem when an aggregate records more events, my suggestion is to filter events during EventStore::load, so that only the events required to perform the current step of the process are passed to the aggregate function. Even if this is more complex than dealing with snapshots, it makes the whole concept clearer IMHO, and you could also ask yourself whether you made a design mistake if one aggregate is responsible for so many events.

/* damn, what else events are important? hard to say! */

That is the best argument for state and maybe it outweighs all cons. I just wanted to throw in the question before it is too late. I can live with injecting state but we need to be aware that it is not really the idea of a pure functional event sourcing style.

Maybe it's not a bad idea to remove state from the result for two reasons:

Good arguments. We should return events only

prolic commented 7 years ago

General statement

I am only pretending to know the right answer - clearly I have no clue. That's why I am arguing against it, so we can find out who has the better arguments :-)

Apply-method

The apply-method created some state. The handler function receives the state and can do checks against it, so it can know, which event to emit. Oh, that was pretty easy to explain :)

State as read model only

Let's look back to our OO aggregates - do we have no state there at all? No we have state there! Is it read model? Well, kind of maybe. But the aggregate root is still not bound to a state implementation. You can always modify your apply-method and drop the snapshot-table. You are ready to use any other kind of state at any time. The only source of truth is still the event store.

Now let's look back to FP: You can still change the apply-method and drop the snapshots. The state representation is not something that you cannot change. I would say, the state is an in-memory representation of the aggregate root, that is based on a stream of events. You can alter the state creation at any time, reuse it for the read model (CQRS) or create a custom one.
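
A minimal sketch of that idea, not the prooph/micro API: state is just a left fold of an apply function over the event stream, so it can be rebuilt or reshaped at any time. The event shape (plain arrays with `name` and `payload` keys) and the function name `applyUserEvent` are assumptions made for illustration.

```php
<?php
// Hypothetical event shape: ['name' => string, 'payload' => array].

// Folds one event into the state; unknown events leave the state untouched.
function applyUserEvent(array $state, array $event): array
{
    switch ($event['name']) {
        case 'UserWasRegistered':
            return ['id' => $event['payload']['id'], 'email' => $event['payload']['email']];
        case 'EmailWasChanged':
            return array_merge($state, ['email' => $event['payload']['email']]);
        default:
            return $state;
    }
}

$history = [
    ['name' => 'UserWasRegistered', 'payload' => ['id' => 'u1', 'email' => 'old@example.com']],
    ['name' => 'EmailWasChanged', 'payload' => ['email' => 'new@example.com']],
];

// State is derived data: replaying the same history yields the same state.
$state = array_reduce($history, 'applyUserEvent', []);
```

Changing `applyUserEvent` changes the shape of the state without touching the stored events, which is why the event store stays the only source of truth.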

Big event streams

I have heard that argument really often: if you have a large event stream for an aggregate root, you are doing something wrong. Well, really? Let's look at an example, a User aggregate:

Events are:

UserRegistered
UserChangedEmailAddress
UserChangedPassword

... I see, more than 30 events are unlikely... really?

UserLoggedIn
UserLoggedOut

Now what? Did I do something terribly wrong? Or do I simply have a large event stream, because I record so much information?

History as argument instead of state

When the aggregate root outputs events, does that mean I also need events as input? I am not sure about this. I never came across a scenario where I needed an old event (not state!) to know what to do with the incoming command. I never had the problem, that state alone cannot solve the problem, only the real event stream can do it for me.

Alternate solutions

A

We could have this

function (array $state, Command $command): Message[] as the default function requirement

If someone is able to come up with an example, where state is not enough and the real event stream is required, then.....

function (array $state, Command $command) use (Iterator $history): Message[] is still possible. The history can be fetched by the command handler who has access to the event store (injected dependency).
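
A hedged sketch of alternative A, with commands and events as plain arrays. The "an address may not be reused" rule, the event names, and the factory `makeChangeEmail` are hypothetical; they only illustrate a decision that state alone does not cover. Note that the return type is written as `array`, since PHP has no `Message[]` declaration.

```php
<?php
// Hypothetical factory: the command handler fetches the history and binds it via `use`.
function makeChangeEmail(iterable $history): callable
{
    return function (array $state, array $command) use ($history): array {
        // Decision based on state alone: nothing to do if the address is unchanged.
        if ($state['email'] === $command['email']) {
            return [];
        }
        // Decision that needs history: was this address used before?
        foreach ($history as $event) {
            if ($event['name'] === 'EmailWasChanged'
                && $event['payload']['old_email'] === $command['email']
            ) {
                return [['name' => 'EmailChangeRejected', 'payload' => ['reason' => 'previously used']]];
            }
        }
        return [[
            'name' => 'EmailWasChanged',
            'payload' => ['email' => $command['email'], 'old_email' => $state['email']],
        ]];
    };
}

$history = [
    ['name' => 'UserWasRegistered', 'payload' => ['email' => 'a@example.com']],
    ['name' => 'EmailWasChanged', 'payload' => ['email' => 'b@example.com', 'old_email' => 'a@example.com']],
];
$state = ['email' => 'b@example.com'];

$changeEmail = makeChangeEmail($history);
$rejected = $changeEmail($state, ['email' => 'a@example.com']);
$accepted = $changeEmail($state, ['email' => 'c@example.com']);
```

The history is an array here so that it can be iterated on every call; a one-shot generator would be exhausted after the first command.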

B

Let's go complete nuts, I don't know yet if this is really possible, maybe it's confusing, maybe it's working nicely.

function (array $state, Command $command): Message[]

and

function (Iterator $history, Command $command): Message[]

are BOTH VALID! The micro-kernel could find out, based on the function signature, which way to go. Maybe you can even mix both types within a single aggregate. If that turns out to be a bigger problem, this information can also be stored on the AggregateDefinition, but then you need to decide on one way (at least per aggregate type).
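
One way such signature detection might look, sketched with PHP's reflection API. The dispatcher `invokeAggregateFunction` is hypothetical and not part of the micro kernel; commands are plain arrays to keep the example self-contained.

```php
<?php
// Hypothetical dispatcher: inspects the first parameter's declared type
// and passes either the history or the folded state.
function invokeAggregateFunction(callable $fn, array $state, Iterator $history, array $command): array
{
    $reflection = new ReflectionFunction(Closure::fromCallable($fn));
    $params = $reflection->getParameters();
    $type = $params ? $params[0]->getType() : null;

    // A first parameter typed as Iterator (or a subtype) asks for history.
    if ($type instanceof ReflectionNamedType
        && !$type->isBuiltin()
        && is_a($type->getName(), Iterator::class, true)
    ) {
        return $fn($history, $command);
    }
    // Anything else (including `array` or no type at all) gets the state.
    return $fn($state, $command);
}

$byState = function (array $state, array $command): array {
    return [['name' => 'HandledWithState']];
};
$byHistory = function (Iterator $history, array $command): array {
    return [['name' => 'HandledWithHistory', 'count' => iterator_count($history)]];
};

$state = ['email' => 'a@example.com'];
$history = new ArrayIterator([['name' => 'UserWasRegistered']]);

$r1 = invokeAggregateFunction($byState, $state, $history, []);
$r2 = invokeAggregateFunction($byHistory, $state, $history, []);
```

Reflection on every call has a cost, which is one argument for storing the choice on the aggregate definition instead.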

Something else

Do we really want to have the Command as argument? Isn't that supposed to be for the command handler only?

function login(string $password): Message[]

Here I think I only need the password as a string, for example, not the command itself. The command can stay in the command handler.

function doSomething(): Message[]

Sometimes I don't even need any parameters.

The aggregate functions should require only the arguments that they actually need. Only the command handler works with the command class. We did it this way in the past - why change that now? Is there a real reason for this?

prolic commented 7 years ago

I did some googling, and all I can find is: replay events to recreate state. State is not evil. It's an implementation detail of the aggregate root that can be changed at any time. I could not find anything like what Mathias described, besides his blog post.

prolic commented 7 years ago

Btw, i found some functional aggregate root examples, most of them use objects for the aggregate root, all of them apply events and create state.

codeliner commented 7 years ago

The apply-method created some state. The handler function receives the state and can do checks against it, so it can know, which event to emit. Oh, that was pretty easy to explain :)

Ok, I copy & paste it into the walk-through guide. Let's see if someone without ES knowledge understands it :P I hope you know what I mean. It is easier to explain that an aggregate function raises events and the events are written to a stream. The same stream is passed as history to a follow up aggregate function so that it can inspect the previously raised events to make the next decision.

When the aggregate root outputs events, does that mean I also need events as input?

Not necessarily, but the concept is easier to explain and feels right.

I would say, the state is an in-memory representation of the aggregate root, that is based on a stream of events. You can alter the state creation at any time, reuse it for the read model (CQRS) or create a custom one.

That is a very good explanation. I have nothing to add.

Now what? Did I do something terribly wrong? Or do I simply have a large event stream, because I record so much information?

Not sure. Maybe it would be better to have a session aggregate. Lifecycle begins with log in and ends with log out. But yeah, it is the same problem as with large-cluster-aggregates. You cannot always avoid them.

I never had the problem, that state alone cannot solve the problem, only the real event stream can do it for me.

Well, you can always work around the issue and put something into state to remember a previous event. I have had this situation two or three times already. Access to the history would have solved the problem in a more elegant way, but this idea can help in those cases:

function (array $state, Command $command) use (Iterator $history): Message[]

Do we really want to have the Command as argument? Isn't that supposed to be for the command handler only?

The idea is to skip the command handler. It is boilerplate in most cases. You define the same arguments for the command as you define for the aggregate function. Then you write the command handler to get the values out of the command and pass them to the aggregate. But it is no hard rule (just always use a closure that invokes the aggregate function), just less code to write if you are lazy (like me ;))
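
A small sketch of the closure variant mentioned above, assuming array-shaped commands and state. The `login` function, the event names, and the `password_hash` state key are hypothetical.

```php
<?php
// Hypothetical aggregate function that takes only the value it needs.
function login(array $state, string $password): array
{
    if (!password_verify($password, $state['password_hash'])) {
        return [['name' => 'UserLoginFailed']];
    }
    return [['name' => 'UserLoggedIn']];
}

// The closure plays the role of the command handler: it unpacks the command
// and invokes the aggregate function with plain arguments.
$handleLogin = function (array $state, array $command): array {
    return login($state, $command['password']);
};

$state = ['password_hash' => password_hash('secret', PASSWORD_DEFAULT)];
$events = $handleLogin($state, ['name' => 'Login', 'password' => 'secret']);
```

When the aggregate function accepts the command directly, the closure disappears; when it does not, the closure is the only boilerplate left.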

Conclusion

We should return events from aggregate functions but not the state (Do we still need to call apply in the aggregate method?)

We should keep the current way of injecting state and explain the concept in the docs.

Thx for the discussion @prolic :+1:

gregoryyoung commented 7 years ago

command handler = f(state, command) -> events

all event handling functions are f(state, event) -> state

these are then matched via a function match(state, event) -> state

to get current state: fold history match -> state

now replace

command handler = f(fold history match -> state, command) -> events

as example of why this is better. How would you memoize stuff in your examples ^^^

prolic commented 7 years ago

@gregoryyoung Thanks for your hint.

Unfortunately in PHP we cannot write a function declaration like this:

function handle(fold $history match, $command): array

That's why the fold happens first, like this:

$state = apply($state, $events);
$newEvents = handler($state, $command);

See: https://github.com/prooph/micro/blob/master/src/Kernel.php

Also when we get some state from a snapshot store, we only need to fold the few events newer than the snapshot, so the aggregate root has only state information, not the complete history of events, when a command comes in.
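
As a rough sketch of that snapshot step: start from the snapshot's state and fold only the events recorded after it. The snapshot and event shapes (arrays with a `version` key) and the function names are assumptions for illustration, not the actual Kernel code.

```php
<?php
// Hypothetical shapes: a snapshot is ['version' => int, 'state' => array];
// each event carries the 'version' at which it was recorded.

function applyCartEvent(array $state, array $event): array
{
    if ($event['name'] === 'ItemWasAdded') {
        $state['items'][] = $event['item'];
    }
    return $state;
}

// Starts from the snapshot and folds only the events newer than it.
function loadState(array $snapshot, array $events): array
{
    $newer = array_filter($events, function (array $event) use ($snapshot): bool {
        return $event['version'] > $snapshot['version'];
    });
    return array_reduce($newer, 'applyCartEvent', $snapshot['state']);
}

$snapshot = ['version' => 2, 'state' => ['items' => ['book', 'pen']]];
$events = [
    ['version' => 1, 'name' => 'ItemWasAdded', 'item' => 'book'],
    ['version' => 2, 'name' => 'ItemWasAdded', 'item' => 'pen'],
    ['version' => 3, 'name' => 'ItemWasAdded', 'item' => 'lamp'],
];

$state = loadState($snapshot, $events);
```

In a real setup the event store would presumably be asked only for events after the snapshot version, so the filter here just stands in for that query.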

gregoryyoung commented 7 years ago

basically the same though I would prefer to be more explicit and pass in a function that returns state as there are other interesting things that can be done there compositionally
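
One possible reading of "pass in a function that returns state", sketched in PHP: the handler receives a state provider, so the fold runs only when needed and its result is memoized. The helper `lazyState` and the event names are hypothetical.

```php
<?php
// Hypothetical helper: wraps the fold in a closure that runs at most once.
function lazyState(iterable $history, callable $apply, array $initial = []): callable
{
    $state = null;
    return function () use (&$state, $history, $apply, $initial): array {
        if ($state === null) {
            $state = $initial;
            foreach ($history as $event) {
                $state = $apply($state, $event);
            }
        }
        return $state;
    };
}

// The handler receives a state provider, not the state itself.
function changeEmail(callable $getState, array $command): array
{
    if ($command['email'] === '') {
        return [['name' => 'ChangeEmailFailed']]; // decided without folding anything
    }
    $state = $getState();
    return [['name' => 'EmailWasChanged', 'email' => $command['email'], 'old_email' => $state['email']]];
}

// Count apply calls to make the memoization visible.
$folded = 0;
$apply = function (array $state, array $event) use (&$folded): array {
    $folded++;
    return array_merge($state, $event['payload']);
};

$getState = lazyState([['payload' => ['email' => 'old@example.com']]], $apply);
$failed = changeEmail($getState, ['email' => '']);
$ok = changeEmail($getState, ['email' => 'new@example.com']);
$getState(); // served from the memoized state, no further apply calls
```

Because the provider is an ordinary callable, it can also be composed with other sources of state, for example one that starts from a snapshot.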


prolic commented 7 years ago

Thx! :+1:

codeliner commented 7 years ago

nice, sometimes it is good to just ask in the room ;) Thx @gregoryyoung for joining the discussion. Btw. how did you find the thread? Do you have a "ongoing ES discussion"-detector? :D

gregoryyoung commented 7 years ago

twitter.


codeliner commented 7 years ago

the world is so small ;) oh @gregoryyoung if you are there already: thank you so much for your work. I learned a lot from your papers and talks.

Ocramius commented 7 years ago

Awesome work over here 👍

codeliner commented 7 years ago

thx @Ocramius same for your work here: https://ocramius.github.io/blog/on-aggregates-and-external-context-interactions/#comment-3120095921

:+1: for the discussion with @mathiasverraes

We have a similar problem here. We try to apply some functional ideas but we need to keep in mind that PHP has some limitations that need to be addressed in a way that it is easy to use.