siddhi-io / distribution

Siddhi streaming runtime and tooling distribution
http://siddhi.io
Apache License 2.0

Events out of order during distributed deployment recovery #817

Open suhothayan opened 4 years ago

suhothayan commented 4 years ago

Description: Events are out of order during distributed deployment recovery (replaying data from NATS Streaming Server).

It would be good to understand why this happens, and to fix it if the fix does not introduce performance issues.

2020-01-03 15:30:29 INFO  LoggerService:42 - {event={name=Cake, amount=380.0}}
2020-01-03 15:30:30 INFO  LoggerService:42 - {event={name=Cake, amount=400.0}}
2020-01-03 15:30:31 INFO  LoggerService:42 - {event={name=Cake, amount=420.0}}
2020-01-03 15:30:31 INFO  LoggerService:42 - {event={name=Cake, amount=440.0}}
2020-01-03 15:30:45 INFO  LoggerService:42 - {event={name=Cake, amount=460.0}}
2020-01-03 15:30:46 INFO  LoggerService:42 - {event={name=Cake, amount=480.0}}
2020-01-03 15:30:48 INFO  LoggerService:42 - {event={name=Cake, amount=500.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=380.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=400.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=440.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=480.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=420.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=500.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=460.0}}
2020-01-03 15:31:51 INFO  LoggerService:42 - {event={name=Cake, amount=520.0}}
2020-01-03 15:32:09 INFO  LoggerService:42 - {event={name=Cake, amount=540.0}}
2020-01-03 15:32:10 INFO  LoggerService:42 - {event={name=Cake, amount=560.0}}
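For a counting query, the logged value should never decrease, which makes "out of order" easy to check. The Python sketch below uses two hypothetical helpers that are not part of Siddhi. One pulls the `amount` field out of `LoggerService` lines like the ones above. The other reports every position where the value drops. On the values logged above, the first drop (500.0 → 380.0) presumably marks the restart from the restored state. The remaining drops are the reordering this issue is about.

```python
import re

# Matches the numeric value in log text such as "{event={name=Cake, amount=380.0}}".
AMOUNT_RE = re.compile(r"amount=([0-9.]+)")


def amounts_from_log(lines):
    """Extract the 'amount' value from each log line that contains one."""
    values = []
    for line in lines:
        match = AMOUNT_RE.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def find_drops(values):
    """Return (index, previous, current) for every place the value decreases."""
    return [
        (i, values[i - 1], values[i])
        for i in range(1, len(values))
        if values[i] < values[i - 1]
    ]


if __name__ == "__main__":
    sample = "2020-01-03 15:30:29 INFO  LoggerService:42 - {event={name=Cake, amount=380.0}}"
    print(amounts_from_log([sample]))  # [380.0]

    # Amounts in the order they were logged in this issue.
    observed = [380.0, 400.0, 420.0, 440.0, 460.0, 480.0, 500.0,   # logged before recovery
                380.0, 400.0, 440.0, 480.0, 420.0, 500.0, 460.0,   # logged during recovery
                520.0, 540.0, 560.0]                               # logged after recovery
    print(find_drops(observed))
    # [(7, 500.0, 380.0), (11, 480.0, 420.0), (13, 500.0, 460.0)]
```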

pcnfernando commented 4 years ago

I tried to reproduce the scenario with a test case, as in https://github.com/siddhi-io/siddhi-io-nats/pull/36/, but the events were received in the correct order at the source. I did, however, hit a bug where an event was duplicated during persisting and restoring. It is fixed in the same PR.

I also tried publishing through a NATS sink instead of NatsClient, and still couldn't reproduce the out-of-order scenario.

I will try this on a distributed deployment and update the thread.
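The cause of the duplicate-on-restore bug is described in the PR, not in this thread. What follows is a hypothetical illustration of one common way such a duplicate can arise. Suppose the persisted state records the sequence number of the last message already applied to the query. On restore, the subscription must then resume one past that number. The helpers below assume 1-based sequence numbers and are illustrative only. They are not siddhi-io-nats code.

```python
def resume_sequence(last_processed_seq):
    """Sequence number at which to resubscribe after restoring a snapshot.

    Hypothetical helper: the snapshot is assumed to store the sequence number
    of the last message already applied to the query state.
    """
    return last_processed_seq + 1


def seen_by_query(log, last_processed_seq, start_seq):
    """Messages the query sees: those applied before the snapshot, then the replay.

    `log` holds messages in order; message N (1-based) is log[N - 1].
    """
    applied = log[:last_processed_seq]   # messages 1..last_processed_seq
    replayed = log[start_seq - 1:]       # messages start_seq..end
    return applied + replayed


if __name__ == "__main__":
    log = ["e1", "e2", "e3", "e4", "e5"]
    snapshot_seq = 3  # e1..e3 were applied before the state was persisted

    # Correct: resume after the last applied message.
    print(seen_by_query(log, snapshot_seq, resume_sequence(snapshot_seq)))
    # ['e1', 'e2', 'e3', 'e4', 'e5']

    # Off by one: resuming *at* the stored sequence replays e3 a second time.
    print(seen_by_query(log, snapshot_seq, snapshot_seq))
    # ['e1', 'e2', 'e3', 'e3', 'e4', 'e5']
```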

suhothayan commented 4 years ago

Steps to reproduce:

1. Set up the distributed deployment with file-based persistence.
2. Deploy a counting query without a window.
3. Send some events and watch the count increase.
4. In one terminal, kill the stateful pod while continuing to publish messages to the stateless pod from another terminal.
5. Observe that the numbers are printed out of order.
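The thread does not establish where the reordering happens. It is possible that replayed messages reach the stateful query, or its sink, out of sequence. If so, one standard remedy is a reorder buffer keyed on a sequence number assigned upstream. An event that arrives early is held until every lower sequence number has been released. The Python sketch below assumes such sequence numbers exist; `ReorderBuffer` is hypothetical and is not a Siddhi or NATS API. The trade-off is that one late event holds back everything behind it. That is the kind of latency and memory overhead the issue asks to avoid. In the usage example, the arrival order mirrors the burst logged at 15:30:55.

```python
class ReorderBuffer:
    """Release events strictly in sequence-number order (hypothetical sketch)."""

    def __init__(self, next_seq=1):
        self.next_seq = next_seq  # lowest sequence number not yet released
        self._pending = {}        # seq -> event, for events that arrived early

    def push(self, seq, event):
        """Accept one event and return the events that can now be released, in order."""
        if seq < self.next_seq or seq in self._pending:
            return []  # already released or already buffered: treat as a duplicate
        self._pending[seq] = event
        released = []
        # Drain every consecutive event starting at next_seq.
        while self.next_seq in self._pending:
            released.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        return released


if __name__ == "__main__":
    reorder = ReorderBuffer()
    arrivals = [(1, "e1"), (2, "e2"), (4, "e4"), (6, "e6"), (3, "e3"), (7, "e7"), (5, "e5")]
    output = []
    for seq, event in arrivals:
        output.extend(reorder.push(seq, event))
    print(output)  # ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
```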


-- S. Suhothayan | Senior Director | WSO2 Inc.