spatie / laravel-event-sourcing

The easiest way to get started with event sourcing in Laravel
https://docs.spatie.be/laravel-event-sourcing
MIT License

Duplicate uuid/version events prevent further updates #170

Closed UncertaintyP closed 3 years ago

UncertaintyP commented 3 years ago

First I want to thank you guys for your awesome packages and the hard work you put into them. :heart:

I was trying out this package in local development (Nginx + PHP-FPM, MySQL). After some time this system goes dormant and the next request takes some time to be processed. In this state I (accidentally) made 2 concurrent requests to update the same aggregate. The corresponding controller function:

    public function update(UserUpdateRequest $request, User $user)
    {
        UserAggregateRoot::retrieve($user->id)
            ->updateUser($request->validated() + ['id' => $user->id])
            ->persist();

        return UserResource::make($user->refresh());
    }

This resulted in a duplicate entry in the database, see:

[image: stored_events rows showing the duplicate entry, two events with the same aggregate_uuid and aggregate_version]

Note: The UserAggregateRoot does NOT have protected static bool $allowConcurrency = true;

Now every update request to that specific aggregate (user) results in the following error:

[2020-11-19 17:39:58] local.ERROR: Could not persist aggregate UserAggregateRoot (uuid: 8be7ace5-8c98-3823-b858-11cdff2974e7) because it seems to be changed by another process after it was retrieved in the current process. Expect to persist events after version 2, but version 1 was already persisted. {"userId":"9e33aa4f-6efa-3505-ab49-9d667e5b2c39","exception":"[object] (Spatie\\EventSourcing\\Exceptions\\CouldNotPersistAggregate(code: 0): Could not persist aggregate UserAggregateRoot (uuid: 8be7ace5-8c98-3823-b858-11cdff2974e7) because it seems to be changed by another process after it was retrieved in the current process. Expect to persist events after version 2, but version 1 was already persisted. at base/vendor/spatie/laravel-event-sourcing/src/Exceptions/CouldNotPersistAggregate.php:18)
[stacktrace]
#0 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(189): Spatie\\EventSourcing\\Exceptions\\CouldNotPersistAggregate::unexpectedVersionAlreadyPersisted()
#1 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(90): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->ensureNoOtherEventsHaveBeenPersisted()
#2 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(79): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->persistWithoutApplyingToEventHandlers()
#3 base/Modules/User/Http/Controllers/UserController.php(91): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->persist()
  1. Should this be handled by the package automatically (is this a bug) or should this be prevented in another part of the system?
  2. Should I enforce unique constraints in the database on [aggregate_uuid, aggregate_version]? If so, should this be highlighted in the docs?
  3. FYI: adding allowConcurrency to the root afterwards makes updates work again and inserts a new event with version 3.

Best regards, Marcel

dilab commented 3 years ago

We had the same issue and we had to manually delete one of the duplicated events.

Wonder why this happens too.

dilab commented 3 years ago

@UncertaintyP Turns out this is a design issue. We as developers should design the functions to be idempotent to protect against this. It is a valid exception the aggregate throws.
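
For illustration, one way to make updateUser idempotent is to record an event only when the incoming payload actually changes the aggregate's state. Just a sketch; the UserUpdated event and the attribute tracking are made-up names:

    use Spatie\EventSourcing\AggregateRoots\AggregateRoot;
    use Spatie\EventSourcing\StoredEvents\ShouldBeStored;

    class UserUpdated extends ShouldBeStored
    {
        public array $attributes;

        public function __construct(array $attributes)
        {
            $this->attributes = $attributes;
        }
    }

    class UserAggregateRoot extends AggregateRoot
    {
        private array $currentAttributes = [];

        public function updateUser(array $attributes): self
        {
            // Identical payload: record nothing, so replaying the same request is harmless.
            if ($attributes === $this->currentAttributes) {
                return $this;
            }

            $this->recordThat(new UserUpdated($attributes));

            return $this;
        }

        protected function applyUserUpdated(UserUpdated $event): void
        {
            $this->currentAttributes = $event->attributes;
        }
    }

Note that this only guards against the same payload being applied twice; two genuinely different concurrent updates still race.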

@UncertaintyP would love to discuss your use case. you can reach me via my profile.

UncertaintyP commented 3 years ago

@dilab Sorry I don't see how to contact you.

My use case is pretty simple: I want to update a User model. I was relying on the documentation stating: "If that highest version id differs from the highest one that the aggregate remembers, an exception will be thrown. This can happen when there are two concurrent requests that try to update the same aggregate." Obviously, that protection did not kick in here, and I struggle to understand why, or how to protect against it.

dilab commented 3 years ago

@UncertaintyP Basically we have to guard against this scenario and prevent it from happening. Things like eventual consistency and sagas come into play.

UncertaintyP commented 3 years ago
  1. The docs already say that this is protected by the package, as quoted above (which apparently it is not).
  2. This is an atomic transaction: only a single event has been recorded, so there is nothing to order or manage. I fail to see how this is related to eventual consistency or sagas.
  3. What do you suggest in this case specifically, a record of a single event? Again, if this should be enforced in some way by the user (locking tables, enforcing constraints, ...), I feel the docs should mention it and maybe even provide a solution, as this is the most basic usage of an aggregate.
dilab commented 3 years ago

@freekmurze is it possible to add a locking mechanism to this package to prevent concurrent requests?

booni3 commented 3 years ago

I also had this same issue. My fix, since I was up against it at the time, was to manually delete the event in the stored events table, which unlocked my system. My duplicate event appeared in the events table only, so the aggregate root was out of sync with my projections (although this time it was the aggregate root at fault).

I do have protected static bool $allowConcurrency = true turned on at the moment, although this was a single duplicated event, not two events firing at the same time, so I do not think it is relevant.

[image: stored_events rows showing 4 entries where there should only be 2]

Where you see 4 lines, there should only be 2. Only 2 events fired as reflected in my projections.

dilab commented 3 years ago

@UncertaintyP are you using transaction somewhere in this particular request?

UncertaintyP commented 3 years ago

@dilab No. This is simply recording the event and the projector uses Laravels User::update function. Also no queues. Will probably go with a unique constraint on both columns because I don't need concurrency in this project.

booni3 commented 3 years ago

@UncertaintyP @dilab This is the same in my case too. Concurrent requests are being generated by a standard button click through to a controller. I do have queues running too, but I have isolated certain cases to actions where no queues are involved. I do not really understand how a concurrent request can happen in these cases. I am not using transactions either.

I am wondering whether adding an atomic lock prior to calling any methods on the aggregate root will fix this or whether something is going on past this first call. i.e.

    // Only run the aggregate call if we can acquire the lock; if another
    // request already holds it, get() returns false and the callback is skipped.
    Cache::lock('MyAction')->get(function () {
        InventoryAggregateRoot::retrieve($this->uuid)->allocateOrderItem($this->item)->persist();
    });

I am trying this next.
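
For completeness: Cache::lock(...)->get() simply skips the callback when the lock is already held, so a blocking variant may be closer to what we want here. A sketch only; the lock name and timeouts are arbitrary, and the cache driver needs to support atomic locks (redis, memcached, database, ...):

    use Illuminate\Support\Facades\Cache;

    // Hold the lock for at most 10 seconds, and wait up to 5 seconds to acquire it.
    // A LockTimeoutException is thrown if the lock cannot be acquired in time.
    Cache::lock('aggregate:'.$this->uuid, 10)->block(5, function () {
        InventoryAggregateRoot::retrieve($this->uuid)
            ->allocateOrderItem($this->item)
            ->persist();
    });

Keying the lock on the aggregate uuid instead of the action name also means unrelated aggregates do not block each other.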

booni3 commented 3 years ago

I have found another instance where the aggregate has 2 events under the same aggregate_version, but they were created 20 mins apart. So these are certainly not the same process and an atomic lock will not help.

What could cause the same aggregate version to be used across this length of time?

Same aggregate_uuid, same aggregate_version, created at 21:05 & 21:26 respectively

[image: two stored_events rows with the same aggregate_uuid and aggregate_version, created at 21:05 and 21:26]

erikgaal commented 3 years ago

@booni3 are you using (non-sync) queues?

booni3 commented 3 years ago

@erikgaal I have seen the issue in 2 scenarios:

Same aggregate version, same timestamp: in these cases, one event had the user saved in the metadata and the other event did not. This runs via a Livewire button without queues, but there may be something going on with Livewire that causes the method to be called twice. To rule this out I have added atomic locks to the action class that calls the AR method.

Same aggregate version, different timestamps: this one was likely run within an async job; however, it runs on a queue that only allows a single process at a time (config below), so I am unsure how it could have created a duplicate event, especially this far apart.

'supervisor-allocation' => [
    'connection' => 'redis-long-running',
    'queue' => ['allocation-high', 'allocation'],
    'balance' => 'false', // process queues in order
    'processes' => 1,
    'tries' => 1,
    'timeout' => 300, // Timeout after 5 minutes
]
dilab commented 3 years ago

@booni3 what is the size of the aggregate? I meant how many events are recorded for that particular UUID.

dilab commented 3 years ago

@freekmurze Did some research over the weekend. This package is unable to deal with the race condition: two events with the same aggregate ID and the same version can be saved at nearly the same time. This can be simulated using Apache Bench, for example ab -n 20 -c 10.

One way to solve this is to use a composite primary key consisting of the aggregate ID and the version number; however, this package does not currently enforce a unique version number per aggregate.

So my suggestion is to do:

[image: proposed change adding a unique constraint on aggregate_uuid and aggregate_version]

What do you think? If okay, I will send a PR.
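
In code, the suggestion boils down to something like this migration. Only a sketch against the default stored_events table; the eventual PR may look different:

    use Illuminate\Database\Migrations\Migration;
    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    class AddUniqueConstraintToStoredEventsTable extends Migration
    {
        public function up()
        {
            Schema::table('stored_events', function (Blueprint $table) {
                // Only one row per (aggregate_uuid, aggregate_version): of two
                // concurrent inserts, the database rejects the second instead
                // of silently storing both.
                $table->unique(['aggregate_uuid', 'aggregate_version']);
            });
        }

        public function down()
        {
            Schema::table('stored_events', function (Blueprint $table) {
                $table->dropUnique(['aggregate_uuid', 'aggregate_version']);
            });
        }
    }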

brendt commented 3 years ago

Hi all, sorry for the late reply. I was reading through all your comments, and I think a composite key would indeed be the best solution: let the database deal with the race conditions in order to prevent invalid state. This does mean the developer should still handle those cases if they occur.

@dilab thanks for the thorough investigation, yes feel free to send a PR!

brendt commented 3 years ago

FYI I'm working on a PR myself.

brendt commented 3 years ago

Status update: unfortunately, adding a unique constraint breaks aggregate roots with $allowConcurrency enabled. I wasn't working on this package when $allowConcurrency was added, but it does exactly the opposite of what we want to achieve here, and I believe it's a feature that should be removed in the next major update.

There are two options for dealing with this breaking change:

I prefer option 2, but would like some input.

riasvdv commented 3 years ago

I think the deprecation and optional migration is a good solution for people who need the fix right away.

We can then remove the concurrency functionality in the v5 branch

brendt commented 3 years ago

PR is here: https://github.com/spatie/laravel-event-sourcing/pull/212

robertvansteen commented 3 years ago

@brendt awesome, thanks!

Any plans to add a retry mechanism to this package? Before we switched to this package we had a custom event sourcing implementation that caught the unique constraint violation and retried X times before giving up, allowing you to still have some concurrency but without giving up the integrity of the aggregate.
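
Roughly, the pattern looked like this. A simplified sketch, not this package's API; persistWithRetry, the aggregate name, and the duplicate-key check are placeholders:

    use Illuminate\Database\QueryException;

    function persistWithRetry(string $uuid, callable $mutate, int $attempts = 3): void
    {
        for ($attempt = 1; $attempt <= $attempts; $attempt++) {
            try {
                // Re-retrieve on every attempt so the aggregate sees the events
                // that won the race before we try to persist again.
                $aggregate = UserAggregateRoot::retrieve($uuid);
                $mutate($aggregate);
                $aggregate->persist();

                return;
            } catch (QueryException $exception) {
                // SQLSTATE 23000 covers integrity constraint violations (duplicate keys).
                $isDuplicate = ($exception->errorInfo[0] ?? null) === '23000';

                if (! $isDuplicate || $attempt === $attempts) {
                    throw $exception;
                }
            }
        }
    }

    // Usage, e.g. from a controller:
    // persistWithRetry($user->id, fn (UserAggregateRoot $root) => $root->updateUser($data));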

brendt commented 3 years ago

@rovansteen We could add a function that retries X times, but it sounds to me like that would only solve part of the problem.

My preference would be to have a mechanism that ensures only one persist can be done at a time for specific operations that you know might happen concurrently, and that will automatically refresh the aggregate root as well. We already have an excellent queuing system built into Laravel, so I'd prefer to look in that direction first.
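
For example, a queued job keyed on the aggregate uuid with Laravel's WithoutOverlapping middleware would already give that guarantee per aggregate. A rough sketch; the job is hypothetical and reuses names from the snippets above:

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;
    use Illuminate\Queue\Middleware\WithoutOverlapping;

    class UpdateUserJob implements ShouldQueue
    {
        use Dispatchable, Queueable;

        public string $uuid;
        public array $attributes;

        public function __construct(string $uuid, array $attributes)
        {
            $this->uuid = $uuid;
            $this->attributes = $attributes;
        }

        public function middleware(): array
        {
            // Only one job per aggregate uuid runs at a time; overlapping jobs
            // for the same uuid are released back onto the queue.
            return [new WithoutOverlapping($this->uuid)];
        }

        public function handle(): void
        {
            UserAggregateRoot::retrieve($this->uuid)
                ->updateUser($this->attributes)
                ->persist();
        }
    }

The controller would then dispatch the job instead of touching the aggregate directly.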

brendt commented 3 years ago

As a sidenote: we were already planning on adding a command bus in v5, which would be one way of solving the issue: only allow one concurrent command at a time, running in a job.

dilab commented 3 years ago

@brendt great, any timeline to release this?

dilab commented 3 years ago

@rovansteen we are using the laravel built-in queue retry mechanism to handle the case you have mentioned.

brendt commented 3 years ago

Released as 4.10.0: https://github.com/spatie/laravel-event-sourcing/releases/tag/4.10.0

brendt commented 3 years ago

Here's the followup discussion about better handling concurrency: https://github.com/spatie/laravel-event-sourcing/discussions/214