msemys / esjc

EventStore Java Client
MIT License
108 stars 27 forks source link

Odd behaviour of CatchupSubscription #26

Closed Yspadadden closed 7 years ago

Yspadadden commented 7 years ago

The CatchupSubscription uses a queue for processing live events it receives via a VolatileSubscription internally. If this queue becomes full the received event is dropped and a close subscription "event" is enqueued in the live queue:

https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/CatchUpSubscription.java#L174

Looking at the code I think the intention is, that this event is the last one in the queue and no further events will be added to the live queue or sent to the client via the subscription listener. However, this is not the case. It is possible (although rare) that another event will be received and added to the queue after the drop subscription "event". Also line 180 ensures, that the processing of the queue continues, even when the drop "event" should have stopped this processing.

The observed output looks something like this (given events 0 - 5): 0, 1, 2, 3, closed, 5.

There is nothing in the javadocs for the expected behaviour of onClose(...) on a SubscriptionListener, so it is kind of open, if this is a bug or not. It is certainly unintuitive. Is it a bug?

msemys commented 7 years ago

I'll look into it next week, I'm traveling now. But I think that it is normal behaviour, because onClose, you have subscription reference, and you could get lastProcessedEventNumber() or lastProcessedPosition() and resubscribe from that number/position

On May 24, 2017 18:40, "Mario Eberhard" notifications@github.com wrote:

The CatchupSubscription uses a queue for processing live events it receives via a VolatileSubscription internally. If this queue becomes full the received event is dropped and a close subscription "event" is enqueued in the live queue:

https://github.com/msemys/esjc/blob/master/src/main/ java/com/github/msemys/esjc/CatchUpSubscription.java#L174

Looking at the code I think the intention is, that this event is the last one in the queue and no further events will be added to the live queue or sent to the client via the subscription listener. However, this is not the case. It is possible (although rare) that another event will be received and added to the queue after the drop subscription "event". Also line 180 ensures, that the processing of the queue continues, even when the drop "event" should have stopped this processing.

The observed output looks something like this (given events 0 - 5): 0, 1, 2, 3, closed, 5.

There is nothing in the javadocs for the expected behaviour of onClose(...) on a SubscriptionListener, so it is kind of open, if this is a bug or not. It is certainly unintuitive. Is it a bug?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/msemys/esjc/issues/26, or mute the thread https://github.com/notifications/unsubscribe-auth/AG9f_lv9j_T3WMHX8mvWjecsBgvZIuKDks5r9F1igaJpZM4NlX1K .

Yspadadden commented 7 years ago

Happy traveling.

I looked at lastProcessedPosition and I think it would be wrong as well, depending on when you get it from the subscription. During the onClose call it would probably be correct. Later, after the extraneous events get processed it would still be wrong. In the example above the lastProcessedPosition would become 5. What bothers me most about this is that 4 is never processed at all.

Anyway, we fixed it in our code by ignoring onEvent calls that come after onClose. So if you say this is normal behaviour we can live with that. But I would still like either a fix or a clarification in the javadoc about this.

msemys commented 7 years ago

If subscription still calls CatchUpSubscriptionListener.onEvent() after CatchUpSubscriptionListener.onClose(), that's a bug.

It's really strange, because when DROP_SUBSCRIPTION_EVENT is processed, the internal (volatile) subscription is closed https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/CatchUpSubscription.java#L248, and no more events should appear https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/subscription/AbstractSubscriptionOperation.java#L247.

Do you have a code to reproduce this issue?

Yspadadden commented 7 years ago

Currently I don't have a code sample. The problem occurs in our code which has too many dependencies to easily create a code example.

Anyway, I think the following problem happens:

I think the last line might be the underlying cause of the bug. Why is it there in the first place? Processing should already have been started on line https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/CatchUpSubscription.java#L215 So line 180 seems redundant to me. Maybe there is a border case I cannot see right now which requires this line?

Tell me, if you can agree with this reasoning.

msemys commented 7 years ago

L180 is ok, because after enqueue we have to initiate queue processing (if it is not started or it was ended).

I think that https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/CatchUpSubscription.java#L49 should be marked as volatile, because without it, different threads could see different state of subscription.

If you are able to reproduce it on your side, could you try it?

I'll try to reproduce it on weekend

msemys commented 7 years ago

while esjc uses the same logic as .net client, it is very possible that bugs in .net client exists in esjc as well. I think we have the same issue https://github.com/EventStore/EventStore/issues/929

Yspadadden commented 7 years ago

Marking the subscription reference as volatile wont help, because only the reference and not the fields inside the instance are then volatile. See also here.

I've managed to create a test project with only your library as reference that usually reproduces the error. The test will try to reproduce the error 500 times. In my tests in 3 out of 4 attempts the error occurred somewhere in the first 100 tries.

esjc-tests.zip

Wouldn't the easiest way to solve this be a volatile boolean that is set to true, once the subscriptioni closed event is processed? The boolean would then be used as guard for the event processing. Kind of like the example listener in the uploaded test.

msemys commented 7 years ago

Thanks for sharing the reproducible code!

I'll do some more testing on recent fix, and if everything is ok, I'll release v1.8.1