Open fritz-gerneth opened 3 years ago
Sounds like a valid solution. Can you submit a pull request please?
Will so once I have had this in production for a while to verify.
Sent from Ninehttp://www.9folders.com/
Von: Sascha-Oliver Prolic @.***> Gesendet: Dienstag, 28. September 2021 23:39 An: prooph/pdo-event-store Cc: Fritz Gerneth; Author Betreff: Re: [prooph/pdo-event-store] Calling stop() can cause projections to loose stream positions (#233)
Sounds like a valid solution. Can you submit a pull request please?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/prooph/pdo-event-store/issues/233#issuecomment-929645471, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAJ4DCYOEESZW4PXEWUVQUDUEIY2BANCNFSM5E6AXKBQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
In an additional thought, I think it is also possible to delete/reset/.. the projection before having acquired the lock for it. This can happen if the locking projection (1) is sleeping, status is set to RESETTING, and another instance of the projection (2) is started. Before we acquire the lock in projection2, we reset the projection and set the state to IDLE. When projection 1 wakes up again it checks for state, and does continues to run without knowing it had been resetted.
Projection 1 CLI Projection 2
RUNNING
|
(sleep) RESET
| Start Projection
| |
| Check remote status
| |
| Reset, set remote status to IDLE
| |
| Try to acquire lock (and fail)
| |
(wakeup)
|
Check remote status (IDLE)
|
Continue to run with previous
stream positions
The problematic section is this block here, where we do writing operations before we acquired a lock. I suggest to remove this block without replacement, and move the same block in the inner loop at the top of the loop (so resets/deletes after projection start happen before we start applying events).
I have observed this issue over some time, but had a hard time reproducing it. In rare instances during deployments, projections could loose their stream positions and start from the beginning. This can happen in instances where a pcntl stop handler invokes a projection's
stop()
method before the projection has loaded the stream positions.The problematic section is in the
run()
method, before the actual main loop:Calling
stop()
from the signal handler beforeload()
loaded the stream positions, causes a call topersist(()
. The persist call saves the uninitialized stream positions (an empty array). This issue becomes more frequent / likely the longer it takes the projection to load the stream positions. Putting asleep()
call after theacquireLock()
call in the example above makes it rather easy to reproduce (we have replaced the locking mechanism with metadata locks on a 60 second timeout, hence more likely for us to run into this issue). The current workaround for us seems to be to introduce a new flag$this->streamPositionsLoaded
and have an early return inpersist()
if positions have not been loaded yet.We are running this in testing at the moment to see if this actually fixes the issue. Any other input / feedback / solutions are welcome though.