pystorm / streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
http://streamparse.readthedocs.io/
Apache License 2.0
1.5k stars 218 forks source link

any way to expire tuples after some time #322

Closed mbande closed 7 years ago

mbande commented 7 years ago

i need tuples in streams to expire if there is no task to process them for a while is there any way to implement this in storm concepts?

Darkless012 commented 7 years ago

Can you please specify more about "expiration" and how do you process tuples?

Apache Storm (and streamparse) works like: Spout emits some new data (tuples) into topology and series of bolts will do their job of transforming the data. There is no space for "expiration".

Maybe you are talking about "failing" (NACKING) the tuple.

One way is to use for example RabbitMQ from which Spout is reading and when there is problem in topology - eg. some Bolt can't process the data, they will fail(NACK) the tuple. From here also the Spout fail the tuple and send it back to RabbitMQ (either with configured deadletter exchange or post into another exchange).

see http://streamparse.readthedocs.io/en/master/api.html#streamparse.Bolt.fail

mbande commented 7 years ago

i'm looking for a way to expire tuples when they are waiting for a bolt to process them. when incoming input tuples rate is higher than bolt processing rate, there is some sort of queue that holds tuples until a bolt take them, right? i want to expire waiting tuples after some time,

Darkless012 commented 7 years ago

I don't know if there is some expiration option based on waiting time. But I think there are 3 ways to deal with this. (Ideally combined together.)

1) There is an option "topology.message.timeout.secs" but it is the time from emitting tuple into topology to NACKing the tuple.

2) Use "topology.max.spout.pending" option. This option limits number of tuples emitted into topology. But you need to find optimal value for this option by yourself. So you don't have too much / too less tuples in topology

3) Use back-pressure mechanism. Which should limit number of tuples in topology based on some metrics (read more about this in another article - sorry for external link but the topic is too vast)

mbande commented 7 years ago

thank you @Darkless012