haoch / flink-siddhi

A CEP library to run Siddhi within Apache Flink™ Streaming Application (Not maintained)
Apache License 2.0

Dynamic Logical Partitioning on event and control stream #50

Open pranjal0811 opened 5 years ago

pranjal0811 commented 5 years ago

Hi @haoch

How does the operator state behave under parallelism?

As far as I can see, the class org.apache.flink.streaming.siddhi.operator.AbstractSiddhiOperator uses managedOperatorState.

Is there a way to partition the rules and events by some key and use keyedManagedState instead? I am curious about parallelism and performance. Are there any performance measurements?

For example, I have a Siddhi query that expects 3 FAILURE events followed by 1 SUCCESS event:

    from every e1=firewallStream[name == 'FAILURE']<3> -> e2=firewallStream[name == 'SUCCESS'] within 1 min
    select 'AAAAAAAA-AAAA-AAAA-BBBB-AAAAAAAAAAAA' as ruleId, e1.externalId as eventIds
    insert into outputStream

If I set the Siddhi CEP operator parallelism to 1, the above query generates the correct result, because all events go to one operator instance. But if I increase the operator parallelism above 1, the events are distributed across multiple instances of the operator, and the query fails to generate the result because state is not shared between the parallel instances.
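
To make the failure mode concrete, here is a minimal plain-Java sketch (no Flink or Siddhi involved). It assumes events are handed to instances round-robin, and `Instance` is a hypothetical, heavily simplified stand-in for the pattern above: it counts FAILUREs and reports a match when a SUCCESS arrives after at least 3 of them, ignoring the `within 1 min` window. It is not how Siddhi evaluates the pattern, only an illustration of why unshared per-instance state loses the match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelPatternDemo {
    /** Hypothetical stand-in for one parallel operator instance with its own private state. */
    static final class Instance {
        private int failures = 0;
        private int matches = 0;

        void onEvent(String name) {
            if (name.equals("FAILURE")) {
                failures++;
            } else if (name.equals("SUCCESS")) {
                // Simplified rule: 3 or more FAILUREs seen by THIS instance, then a SUCCESS.
                if (failures >= 3) {
                    matches++;
                }
                failures = 0;
            }
        }
    }

    /** Distributes events round-robin over `parallelism` instances and sums their matches. */
    static int countMatches(List<String> events, int parallelism) {
        List<Instance> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new Instance());
        }
        for (int i = 0; i < events.size(); i++) {
            instances.get(i % parallelism).onEvent(events.get(i));
        }
        int total = 0;
        for (Instance instance : instances) {
            total += instance.matches;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("FAILURE", "FAILURE", "FAILURE", "SUCCESS");
        System.out.println("parallelism 1: " + countMatches(events, 1)); // prints "parallelism 1: 1"
        System.out.println("parallelism 2: " + countMatches(events, 2)); // prints "parallelism 2: 0"
    }
}
```

With two instances, each one sees only part of the FAILURE sequence, so neither reaches the count it needs before the SUCCESS arrives.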

SMART2016 commented 3 years ago

Have you tried using Siddhi partitions in any way? Siddhi has the concept of partitions, and maybe that can be applied here: https://siddhi.io/en/v5.1/docs/query-guide/#partition

Or use Flink-based data partitioning, which routes data to the respective operator task according to the partitioning scheme when parallelism is enabled for an operator.
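
As a rough illustration of the key-based routing idea, the plain-Java sketch below sends every event to an instance chosen from a hash of its key, so all events sharing a key reach the same instance and its local state. The `Event` type, the `sourceIp` key, and `instanceFor` are hypothetical names made up for this example; Flink's own keyed partitioning uses a different hashing scheme internally, so treat this as a simplification of the concept and not as Flink's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class KeyedRoutingDemo {
    /** Hypothetical event: a partitioning key (here a source IP) plus the event name. */
    static final class Event {
        final String sourceIp;
        final String name;

        Event(String sourceIp, String name) {
            this.sourceIp = sourceIp;
            this.name = name;
        }
    }

    /** Picks the target instance from the key alone, so equal keys always map to the same instance. */
    static int instanceFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Records, per key, the set of instances that received at least one event with that key. */
    static Map<String, Set<Integer>> route(List<Event> events, int parallelism) {
        Map<String, Set<Integer>> instancesByKey = new TreeMap<>();
        for (Event event : events) {
            instancesByKey
                .computeIfAbsent(event.sourceIp, k -> new TreeSet<>())
                .add(instanceFor(event.sourceIp, parallelism));
        }
        return instancesByKey;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("10.0.0.1", "FAILURE"),
            new Event("10.0.0.2", "FAILURE"),
            new Event("10.0.0.1", "FAILURE"),
            new Event("10.0.0.1", "FAILURE"),
            new Event("10.0.0.2", "SUCCESS"),
            new Event("10.0.0.1", "SUCCESS"));
        // Each key maps to exactly one instance, whatever the parallelism.
        route(events, 4).forEach((key, instances) ->
            System.out.println(key + " -> instances " + instances));
    }
}
```

The trade-off is that the pattern is then evaluated per key: a sequence whose events carry different keys would still be split across instances.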