prooph / pdo-event-store

PDO implementation of ProophEventStore http://getprooph.org
BSD 3-Clause "New" or "Revised" License
111 stars 56 forks source link

Add gap detection to projections #221

Closed codeliner closed 4 years ago

codeliner commented 4 years ago

Fixes https://github.com/prooph/pdo-event-store/issues/189

The PR adds a "gap detection" mechanism to projections. This means, that if the projection detects a gap in the stream position while processing events, it pauses processing for a configurable time, before rereading events from the last known position. Gap detection is disabled by default.

Enabling gap detection for a projection:

$gapDetection = new GapDetection();

$projection = $projectionManager->createProjection('test_projection', [
            PdoEventStoreProjector::OPTION_GAP_DETECTION => $gapDetection,
        ]);

Enabling gap detection for a read model projection with custom configuration:

$gapDetection = new GapDetection(
    // Configure retries in case a gap is detected
    [
        0, // First retry without any sleep time
        10, // Second retry after 10 ms
        30, // Wait another 30 ms before performing a third retry
    ],
    \DateInterval('PT60S') //Set detection window to 60s
);

$projection = $projectionManager->createReadModelProjection('test_projection', [
            PdoEventStoreReadModelProjector::OPTION_GAP_DETECTION => $gapDetection,
        ]);

Detection window

If you set a detection window, gap detection is only performed on events not older than NOW - window If the detection window is set to PT60S, only events created within the last 60 seconds are checked. Be careful when setting Event::createdAt manually (for example during a migration).

codeliner commented 4 years ago

Some more tests + documentation is missing, but I'd like to collect some feedback on the PR. We will run some experiments next week, to check if gap detection solves the issue of skipped events.

coveralls commented 4 years ago

Coverage Status

Coverage increased (+0.3%) to 84.267% when pulling 0c7fc051e4cbc4827b8f539b542f8ffef75ead54 on feature/gap_detection into 0681a637ce1e2db5f8737b07ebe355b60cc78415 on master.

prolic commented 4 years ago

I'm not sure about those predefined timing windows. Some transactions might be long-running and the sleep times are fairly short. From what I saw in the past, those gabs are pretty rare, so even if the projection falls behind for 5 secs, that's still acceptable in most situations, so maybe we should add some longer retry times as well. But without data that's only speculation.

codeliner commented 4 years ago

"normal gaps" can occur very easily. They are caused by concurrency exceptions. That's the reason why I'd like to avoid a long sleep time by default. 50ms are already a very long time for 99% of write operations in a typical event sourced implementation.

fritz-gerneth commented 4 years ago

This is pretty awesome stuff and I'll put this to testing on our side this week too!

codeliner commented 4 years ago

Cool, looking forward to your results @fritz-gerneth We performed first tests yesterday but still had gaps with the default sleep times. So I guess @prolic is right that we need longer defaults. But we're testing Postgres event-store without GET_LOCK. Hopefully we can perform some more tests tomorrow.

GET_LOCK + gap detection should work with short sleep times, right? Because in that case faulty gaps only occur when one aggregate records multiple events with different payload size in one transaction.

fritz-gerneth commented 4 years ago

GET_LOCK + gap detection should work with short sleep times, right? Because in that case faulty gaps only occur when one aggregate records multiple events with different payload size in one transaction.

I think so.

We ran error-prone operations a couple of thousand times today without any issue (default gap detection settings). Error-prone in the magnitude of "once every 1k - 2k saves" though..

codeliner commented 4 years ago

Great news @fritz-gerneth :tada:

We can also confirm that for our case a longer sleep time fixed the issue (No GET_LOCK, only gap detection). We're trying to balance sleep time vs. gaps now.

So I'll definitely continue with the PR and finish it. Sleep times need to be documented anyway, but I think I'd go with @prolic suggestion and increase the default sleep time. If it is too slow for someone, they can simply override the setting and define their own.

prolic commented 4 years ago

I think you can still use the sleep times you have right now, just add more retries with larger intervals. I would be totally fine with a projection hanging 5 secs every now and then, when it means it processes correctly. Way better than handling stuff quick and still having gabs.

ahmed-alaa commented 4 years ago

@codeliner I tried the solution on Postgres with max retry of 100ms as well as windows time of 120s and without GET_LOCK strategy.

So far It works good without missing events within 10,000 events in average of 15-20 events/second.

Please Note: for example with our current implementation to override the PostgresSingleStreamStrategy or MySqlSingleStreamStrategy, this might break any service that does the same because of these changes https://github.com/prooph/pdo-event-store/commit/2b2f943b717487a2a045a168186246102c787b86#diff-38bb66b2f052fb5ca610f76079f40f3f, would be nice if could take that into consideration during the release :)

Thanks alot for the fix!

codeliner commented 4 years ago

thx @ahmed-alaa I started a discussion about BC break for persistence strategies. This will not be included in a 1.12 release: https://github.com/prooph/pdo-event-store/pull/217#issuecomment-592454046

codeliner commented 4 years ago

@prolic I guess you mean this PR in your question, right?

Open Todos

I try to finish it on Sunday.

prolic commented 4 years ago

Perfect, thanks

On Fri, Feb 28, 2020, 18:34 Alexander Miertsch notifications@github.com wrote:

@prolic https://github.com/prolic I guess you mean this PR in your question https://github.com/prooph/pdo-event-store/pull/217#issuecomment-592737725, right? Open Todos

  • more tests
  • Documentation

I try to finish it on Sunday.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/prooph/pdo-event-store/pull/221?email_source=notifications&email_token=AADAJPDIM5FTACPVOXTIN6LRFF7PLA5CNFSM4KZVGKQKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENKHY2I#issuecomment-592739433, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADAJPB6NY26EDQ2ZBTYWATRFF7PLANCNFSM4KZVGKQA .

codeliner commented 4 years ago

@prolic ready to merge

keywinf commented 4 years ago

Very nice work, I personally used a Fibonacci sequence (0, 5, 5, 10, 15, 25, ... until 2s) as GapDetection options. Thanks

webdevilopers commented 3 years ago

Open Todos

* [x]  more tests

* [x]  Documentation

I guess the OPTION_GAP_DETECTION option should be added here: