Closed codeliner closed 7 years ago
Oh here is the link to the post: http://verraes.net/2014/05/functional-foundation-for-cqrs-event-sourcing/
As long as the history of a single aggregate is not too big it would work quite well.
And when the history is big or complex to recreate state from, there is a performance problem. We tested that already, and snapshots every 5 events are required for good performance.
If we pass history and only return events our aggregate functions become even more simple.
The other way around: if we pass history, in most cases you recreate state from it in the first step of that function, which leads to repetition all over the aggregate functions.
Also we have f(state, events) for projections; having the same signature everywhere makes it easier to remember the way of doing it.
We tested it with the full OO stack: prooph/event-sourcing with repository, aggregate translator ... and it was ok for 20 - 30 events. We just defined 5 as a good default because we saw that every event replay takes its time.
Anyway, do you really need to replay the full history every time? Often we only carry information in events that are important for the projections. Some very simple examples:
ChangeEmailAddress -> user aggregate needs to check if email is unique and maybe wants to add the old email to the EmailWasChanged event
RemoveItemFromCart -> cart aggregate needs to check if item is in the cart and checkout is not completed yet, but doesn't need all items replayed and also doesn't need shipping information etc. replayed if already entered.
I think there are many more examples. When looking at our own aggregates in the current project they would benefit from having the history available instead of replayed, injected state. In most of the aggregate methods we check status information like: can we move to the next status.
Performance problems could be solved by injecting only the part of the history that is of interest to the current aggregate function.
Let's take the simple user change email example:
//example code, not possible at the moment:
$eventMatcher = new EventMatcher();
$eventMatcher = $eventMatcher->withEventNames(['UserWasRegistered', 'EmailWasChanged']);
$eventMatcher = $eventMatcher->withMetadataMatcher(/* match aggregate id and type */);
$history = $eventStore->load('user_stream', $eventMatcher);
$command = new ChangeEmail($userId, $newEmail);
$emailGuard = $factories['emailGuard']();
// PHP has no Message[] return type, so the closure is declared to return array (of messages).
$changeEmail = function ($emailGuard, $history, $command): array {
    $oldEmail = '';
    if (!$emailGuard->isUnique($command->email())) {
        return [ChangeEmailAddressFailed::withDuplicateEmail($command->email(), $command->userId())];
    }
    // The most recent event that carries an email address wins.
    foreach ($history as $event) {
        if (in_array($event->messageName(), ['UserWasRegistered', 'EmailWasChanged'])) {
            $oldEmail = $event->email();
        }
    }
    return [EmailWasChanged::with($command->email(), $oldEmail, $command->userId())];
};
Maybe it's not a bad idea to remove state from the result for two reasons:
1) The Kernel itself only requires the raised events, not the returned state.
2) Giving the state as http response may show an inconsistent state, because some process managers still need to do their work.
That's why I am okay with removing the aggregate result and simply returning events from the aggregate.
$oldEmail = '';
foreach ($history as $event) {
if(in_array($event->messageName(), ['UserWasRegistered', 'EmailWasChanged'])) {
$oldEmail = $event->email();
}
}
VS:
$oldEmail = $state['old_email'];
This would be a plus for state, not for events in my opinion. Also state comes very fast from the snapshot store, the events are slower from the event store. Removing the snapshot feature completely seems like a bad idea to me. Can you provide a benchmark with 1000 events replay VS using a snapshot, if you think it's not a big deal?
Imagine you have a command that only needs to get accepted, based on a lot of conditions, coming from state. You could either have something like:
if (in_array($state['foo'], $haystack)
    && $state['bar'] === 'baz'
    && $someService->isOkayWith($state['whatever'])
) {
    // accepted, do something with the command
}
or you can have
$state = [];
foreach ($history as $event) {
if (in_array($event->messageName(), [
'someEvent1',
'someOtherEvent2',
/* damn, what else events are important? hard to say! */])
) {
// build some kind of state for me first, so I can check later, after all events are applied
}
}
// now same code as above?
This would be a plus for state, not for events in my opinion.
You forget the apply function. Imagine you need to explain to someone why the apply function is needed.
I mean it is crazy. The aggregate writes history but it has no access to its own history. You cannot look back within the aggregate; only projections can do that. Instead of the history you have state, but state is something for the read model, not for the aggregate (in theory).
Can you provide a benchmark with 1000 events replay VS using a snapshot, if you think it's not a big deal?
No, I won't. I never said that replaying 1000 events VS using snapshots makes no difference. I spoke about the normal case where one aggregate records up to 30 events, not more. To solve the performance problem when an aggregate records more events, my suggestion is that events are filtered during EventStore::load. So only those events which are required to perform the actual step of the process are passed to the aggregate function. Even if this is more complex than dealing with snapshots, it makes the whole concept clearer IMHO, and you could also ask yourself whether you made a design mistake if one aggregate is responsible for so many events.
/* damn, what else events are important? hard to say! */
That is the best argument for state and maybe it outweighs all cons. I just wanted to throw in the question before it is too late. I can live with injecting state but we need to be aware that it is not really the idea of a pure functional event sourcing style.
Maybe it's not a bad idea to remove state from the result for two reasons:
Good arguments. We should return events only
I am only pretending to know the right answer - clearly I have no clue. That's why I am arguing against it, so we can find out who has the better arguments :-)
The apply-method created some state. The handler function receives the state and can do checks against it, so it can know, which event to emit. Oh, that was pretty easy to explain :)
Let's look back to our OO aggregates - do we have no state there at all? No we have state there! Is it read model? Well, kind of maybe. But the aggregate root is still not bound to a state implementation. You can always modify your apply-method and drop the snapshot-table. You are ready to use any other kind of state at any time. The only source of truth is still the event store.
Now let's look back to FP: You can still change the apply-method and drop the snapshots. The state representation is not something that you cannot change. I would say, the state is an in-memory representation of the aggregate root, that is based on a stream of events. You can alter the state creation at any time, reuse it for the read model (CQRS) or create a custom one.
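To make the "state is just a fold over events" point concrete, here is a minimal sketch. It assumes plain arrays as events (a hypothetical shape with `name` and `payload` keys) rather than real prooph messages, and the function names `applyEvent` and `apply` are only illustrative. Swapping the body of `applyEvent` changes the state representation without touching the event stream.

```php
<?php
declare(strict_types=1);

// Hypothetical event shape: ['name' => string, 'payload' => array].
// Plain arrays are used for illustration only, not the prooph message classes.
function applyEvent(array $state, array $event): array
{
    switch ($event['name']) {
        case 'UserWasRegistered':
            return [
                'id' => $event['payload']['id'],
                'email' => $event['payload']['email'],
                'old_email' => null,
            ];
        case 'EmailWasChanged':
            return array_merge($state, [
                'old_email' => $state['email'] ?? null,
                'email' => $event['payload']['email'],
            ]);
        default:
            // Events that carry nothing the aggregate needs leave state untouched.
            return $state;
    }
}

// Left fold of the event stream into state.
function apply(array $state, iterable $events): array
{
    foreach ($events as $event) {
        $state = applyEvent($state, $event);
    }
    return $state;
}

$history = [
    ['name' => 'UserWasRegistered', 'payload' => ['id' => 'u1', 'email' => 'a@example.com']],
    ['name' => 'UserLoggedIn', 'payload' => []],
    ['name' => 'EmailWasChanged', 'payload' => ['email' => 'b@example.com']],
];

$state = apply([], $history);
```

With this sketch the `old_email` lookup from the earlier comparison becomes a plain array access on `$state`, while the event store stays the only source of truth.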
I heard that argument really often: If you do have a large event stream for an aggregate root, you are doing something wrong.
Well really? Let's look at an example, a User-Aggregate:
Events are:
UserRegistered
UserChangedEmailAddress
UserChangedPassword
... I see, more than 30 events are unlikely... really?
UserLoggedIn
UserLoggedOut
Now what? Did I do something terribly wrong? Or do I simply have a large event stream, because I record so much information?
When the aggregate root outputs events, does that mean I also need events as input? I am not sure about this. I never came across a scenario where I needed an old event (not state!) to know what to do with the incoming command. I never had the problem that state alone cannot solve the problem and only the real event stream can do it for me.
We could have this
function (array $state, Command $command): Message[]
as the default function requirement
If someone is able to come up with an example, where state is not enough and the real event stream is required, then.....
function (array $state, Command $command) use (Iterator $history): Message[]
is still possible. The history can be fetched by the command handler who has access to the event store (injected dependency).
Let's go complete nuts, I don't know yet if this is really possible, maybe it's confusing, maybe it's working nicely.
function (array $state, Command $command): Message[]
and
function (Iterator $history, Command $command): Message[]
are BOTH VALID! The micro-kernel could find out based on the function signature which way to go. You can even mix both types within a single aggregate maybe. Even if that would be a bigger problem, this information can also be stored on the AggregateDefinition, but then you need to decide for one way (at least per aggregate type).
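A rough sketch of how such signature detection could look with PHP's reflection API. The helper name `expectsHistory` is made up for illustration and is not part of prooph/micro; commands are plain arrays here to keep the snippet self-contained.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: tells the kernel whether the aggregate function
// wants history (a Traversable) or state (an array) as its first argument.
function expectsHistory(callable $aggregateFunction): bool
{
    $reflection = new ReflectionFunction(Closure::fromCallable($aggregateFunction));
    $parameters = $reflection->getParameters();

    if ($parameters === []) {
        return false;
    }

    $type = $parameters[0]->getType();

    // Built-in types such as array mean "state"; a class or interface type
    // that is a Traversable (Iterator, Generator, ...) means "history".
    return $type instanceof ReflectionNamedType
        && !$type->isBuiltin()
        && is_a($type->getName(), Traversable::class, true);
}

$stateBased = function (array $state, array $command): array {
    return [];
};

$historyBased = function (Iterator $history, array $command): array {
    return [];
};
```

The kernel could call `expectsHistory()` once per aggregate function and then either fold the stream into state first or pass the iterator through unchanged. Storing the decision on the aggregate definition, as mentioned above, avoids the reflection call on every command.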
Do we really want to have the Command as argument? Isn't that supposed to be for the command handler only?
function login(string $password): Message[]
Here I think I only need the password as string, f.e., not the command itself. This can stay in the command handler.
function doSomething(): Message[]
Sometimes I don't even need any parameters.
The aggregate functions should always require the argument, that they require. The command handler is using the command class only. We did it this way in the past - why change that now? Is there a real reason for this?
Doing some google stuff and all I can find is, replay events to recreate state. State is not evil. It's an implementation detail of the aggregate root that can be changed at any time. Could not find something like Matthias described, besides his blogpost.
Btw, I found some functional aggregate root examples; most of them use objects for the aggregate root, and all of them apply events and create state.
The apply-method created some state. The handler function receives the state and can do checks against it, so it can know, which event to emit. Oh, that was pretty easy to explain :)
Ok, I copy & paste it into the walk-through guide. Let's see if someone without ES knowledge understands it :P
I hope you know what I mean. It is easier to explain that an aggregate function raises events and the events are written to a stream. The same stream is passed as history to a follow-up aggregate function so that it can inspect the previously raised events to make the next decision.
When the aggregate root outputs events, does that mean I also need events as input?
Not necessarily, but the concept is easier to explain and feels right.
I would say, the state is an in-memory representation of the aggregate root, that is based on a stream of events. You can alter the state creation at any time, reuse it for the read model (CQRS) or create a custom one.
That is a very good explanation. I have nothing to add.
Now what? Did I do something terribly wrong? Or do I simply have a large event stream, because I record so much information?
Not sure. Maybe it would be better to have a session aggregate. Lifecycle begins with log in and ends with log out. But yeah, it is the same problem as with large-cluster-aggregates. You cannot always avoid them.
I never had the problem that state alone cannot solve the problem and only the real event stream can do it for me.
Well, you can always work around the issue and put something into state to remember a previous event. I have had this situation two or three times already. Access to the history would have solved the problem in a more elegant way, but this idea can help in those cases:
function (array $state, Command $command) use (Iterator $history): Message[]
Do we really want to have the Command as argument? Isn't that supposed to be for the command handler only?
The idea is to skip the command handler. It is boilerplate in most cases. You define the same arguments for the command as you define for the aggregate function. Then you write the command handler to get the values out of the command and pass them to the aggregate. But it is no hard rule (just always use a closure that invokes the aggregate function), just less code to write if you are lazy (like me ;))
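For the case where the aggregate function should only get the values it needs, a thin closure can stand in for a command handler class. This is only a sketch: the event name `UserLoginFailed` and the array-based command and event shapes are assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Aggregate function that asks only for what it needs.
// Event names and array shapes are illustrative assumptions.
function login(array $state, string $password): array
{
    if (!password_verify($password, $state['password_hash'])) {
        return [['name' => 'UserLoginFailed', 'payload' => ['id' => $state['id']]]];
    }

    return [['name' => 'UserLoggedIn', 'payload' => ['id' => $state['id']]]];
}

// Thin closure replacing a dedicated command handler class:
// it unpacks the command and invokes the aggregate function.
$handleLogin = function (array $state, array $command): array {
    return login($state, $command['payload']['password']);
};

$state = ['id' => 'u1', 'password_hash' => password_hash('secret', PASSWORD_DEFAULT)];

$events = $handleLogin($state, ['name' => 'Login', 'payload' => ['password' => 'secret']]);
$failed = $handleLogin($state, ['name' => 'Login', 'payload' => ['password' => 'wrong']]);
```

The aggregate function stays free of the command class, and the kernel still sees the uniform f(state, command) shape.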
We should return events from aggregate functions but not the state (Do we still need to call apply in the aggregate method?)
We should keep the current way of injecting state and explain the concept in the docs.
Thx for the discussion @prolic :+1:
command handler = f(state, command) -> events
all event handling functions are f(state, event) -> state
these are then matched via a function match(state, event) -> state
to get current state: fold history match -> state
now replace
command handler = f(fold history match -> state, command) -> events
as example of why this is better. How would you memoize stuff in your examples ^^^
@gregoryyoung Thanks for your hint.
Unfortunately in PHP we cannot write a function declaration like this:
function handle(fold $history match, $command): array
That's why the fold happens first, like this:
$state = apply($state, $events);
$newEvents = handler($state, $command);
See: https://github.com/prooph/micro/blob/master/src/Kernel.php
Also when we get some state from a snapshot store, we only need to fold the few events newer than the snapshot, so the aggregate root has only state information, not the complete history of events, when a command comes in.
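A small sketch of that flow, assuming in-memory arrays in place of a real snapshot store and event store. The cart events, the `version` key and the closures are made up for the example and do not mirror the actual Kernel.php code.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-ins for a snapshot store and an event stream.
$snapshot = ['version' => 2, 'state' => ['status' => 'open', 'items' => ['book']]];
$stream = [
    ['version' => 1, 'name' => 'CartWasOpened', 'payload' => []],
    ['version' => 2, 'name' => 'ItemWasAdded', 'payload' => ['item' => 'book']],
    ['version' => 3, 'name' => 'ItemWasAdded', 'payload' => ['item' => 'pen']],
];

// Folds events into state.
$apply = function (array $state, array $events): array {
    foreach ($events as $event) {
        if ($event['name'] === 'CartWasOpened') {
            $state = ['status' => 'open', 'items' => []];
        } elseif ($event['name'] === 'ItemWasAdded') {
            $state['items'][] = $event['payload']['item'];
        }
    }
    return $state;
};

// Decides which events to raise, based on state only.
$handler = function (array $state, array $command): array {
    if (!in_array($command['item'], $state['items'], true)) {
        return [];
    }
    return [['name' => 'ItemWasRemoved', 'payload' => ['item' => $command['item']]]];
};

// Only events newer than the snapshot need to be folded.
$newer = array_values(array_filter($stream, function (array $event) use ($snapshot): bool {
    return $event['version'] > $snapshot['version'];
}));

$state = $apply($snapshot['state'], $newer);
$newEvents = $handler($state, ['name' => 'RemoveItemFromCart', 'item' => 'pen']);
```

Here only one of the three recorded events is replayed, and the handler never sees the stream itself.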
basically the same though I would prefer to be more explicit and pass in a function that returns state as there are other interesting things that can be done there compositionally
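One possible reading of "pass in a function that returns state", sketched in PHP: the handler receives a callable instead of a ready-made array, and the fold runs at most once (memoized) and only if the handler asks for state. `lazyState` and the array-based events are assumptions for illustration, not an existing API.

```php
<?php
declare(strict_types=1);

// Wraps "fold history match" in a closure that computes state at most once.
function lazyState(iterable $history, callable $match, array $initial = []): Closure
{
    $cached = null;

    return function () use (&$cached, $history, $match, $initial): array {
        if ($cached === null) {
            $cached = $initial;
            foreach ($history as $event) {
                $cached = $match($cached, $event);
            }
        }
        return $cached;
    };
}

$folds = 0; // counts how often an event is applied, to show the memoization

$match = function (array $state, array $event) use (&$folds): array {
    $folds++;
    if (in_array($event['name'], ['UserWasRegistered', 'EmailWasChanged'], true)) {
        $state['email'] = $event['email'];
    }
    return $state;
};

// The handler asks for state twice, but the history is folded only once.
$handler = function (callable $getState, array $command): array {
    if ($command['email'] === ($getState()['email'] ?? null)) {
        return [];
    }
    $oldEmail = $getState()['email'] ?? '';
    return [['name' => 'EmailWasChanged', 'email' => $command['email'], 'old_email' => $oldEmail]];
};

$history = [
    ['name' => 'UserWasRegistered', 'email' => 'a@example.com'],
    ['name' => 'EmailWasChanged', 'email' => 'b@example.com'],
];

$events = $handler(lazyState($history, $match), ['email' => 'c@example.com']);
```

Because the state source is just a callable, it could equally be backed by a snapshot plus newer events, or composed with other functions, without changing the handler.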
Thx! :+1:
nice, sometimes it is good to just ask in the room ;) Thx @gregoryyoung for joining the discussion. Btw. how did you find the thread? Do you have an "ongoing ES discussion"-detector? :D
twitter.
the world is so small ;) oh @gregoryyoung if you are there already: thank you so much for your work. I learned a lot from your papers and talks.
Awesome work over here 👍
thx @Ocramius same for your work here: https://ocramius.github.io/blog/on-aggregates-and-external-context-interactions/#comment-3120095921
:+1: for the discussion with @mathiasverraes
We have a similar problem here. We try to apply some functional ideas but we need to keep in mind that PHP has some limitations that need to be addressed in a way that it is easy to use.
While writing the Event Sourcing part of our walk-through guide I noticed that it is difficult to explain why we pass state to an aggregate function. A few questions came to mind: Where does the state come from (of course I know the answer but it requires a lot of explanation)? Why don't we work with the history instead?
I did some quick research and found this excellent blog post by @mathiasverraes. I read the post a long time ago, but back then my focus was on object oriented ES and not the fp way.
Mathias uses
f(history, command) => events
to describe an aggregate function. Our definition looks a bit different atm:
f(state, command) => newState // events
It allows us to work with state snapshots and the aggregate function can inspect state to verify that command can be processed. But the aggregate function could also inspect history instead.
If we pass history and only return events our aggregate functions become even more simple. We no longer need the apply method as this would be only part of a projection. The aggregate can iterate history and filter events with required information. As long as the history of a single aggregate is not too big it would work quite well.