salesforce / storm-dynamic-spout

A framework for building spouts for Apache Storm and a Kafka based spout for dynamically skipping messages to be processed later.
BSD 3-Clause "New" or "Revised" License

First pass at a permanently failed output stream from the spout. #90

Closed by Crim 6 years ago

Crim commented 6 years ago

Use Case

We'd like the ability to handle tuples that are considered "permanently failed". We define "permanently failed" to mean that the topology attempted to process the tuple at least once and the RetryManager implementation has determined that the tuple should never be retried. When this occurs, the tuple will be emitted un-anchored on a "failed" stream. Bolts within the topology can subscribe to this "failed" stream and do their own error handling.

It should be noted (again) that tuples emitted on the "failed" stream are un-anchored and should be considered unreliable. That is, the spout will not track them, and it will not be notified when tuples emitted down this "failed" stream are acked or failed.
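To make the intended behavior concrete, here is a minimal, self-contained sketch of the fail path described above. It does not use the Storm or storm-dynamic-spout APIs: `RetryPolicy`, `Emit`, and `onFail` are hypothetical stand-ins, and a `null` anchor id is used only to model an un-anchored emit.

```java
import java.util.ArrayList;
import java.util.List;

public class PermanentFailureSketch {
    /** Hypothetical stand-in for the retry decision; not the project's actual RetryManager interface. */
    public interface RetryPolicy {
        boolean shouldRetry(String messageId);
    }

    /** Records one emit. A null anchorId models an un-anchored tuple that the spout does not track. */
    public static final class Emit {
        public final String streamId;
        public final String anchorId;
        public final String payload;

        public Emit(String streamId, String anchorId, String payload) {
            this.streamId = streamId;
            this.anchorId = anchorId;
            this.payload = payload;
        }
    }

    public final List<String> retryQueue = new ArrayList<>();
    public final List<Emit> emitted = new ArrayList<>();

    /** Called when the topology reports that a tuple failed. */
    public void onFail(String messageId, String payload, RetryPolicy policy) {
        if (policy.shouldRetry(messageId)) {
            // Still retryable: queue it for replay; it stays tracked by the spout.
            retryQueue.add(messageId);
            return;
        }
        // Permanently failed: emit un-anchored on the "failed" stream.
        // No ack or fail callback will ever arrive for this emit.
        emitted.add(new Emit("failed", null, payload));
    }

    public static void main(String[] args) {
        PermanentFailureSketch spout = new PermanentFailureSketch();
        spout.onFail("msg-1", "some payload", id -> false);
        Emit e = spout.emitted.get(0);
        System.out.println(e.streamId + " anchored=" + (e.anchorId != null));
    }
}
```

Because the emit is un-anchored, a bolt consuming the "failed" stream in this model cannot trigger a replay; anything it does with the tuple is best effort.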

Notes

One idea was to move the RetryFailedMessageManager out of the VirtualSpouts and have a single instance living within DynamicSpout (instead of one per VSpout). Doing so would make it simpler to identify and emit permanently failed tuples. The trade-off is that this alters how we enforce fairness across VirtualSpouts. If we continue to give failed tuples being replayed priority over new tuples, a single VSpout emitting tons of tuples that end up failing could overwhelm the spout's output.

Instead, I added a flag on the Message object that lets you mark it as permanently failed, and DynamicSpout uses that flag to determine which stream to emit it on. It's not the most elegant solution, so I'm open to other ideas.
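As a rough illustration of the flag-based approach, the sketch below shows a message carrying a permanently-failed flag and a helper that picks the output stream from it. The class shape, the `createPermanentlyFailed` factory, the `selectStream` helper, and the stream name constants are assumptions made for this example and may not match the actual Message or DynamicSpout code in this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FailedStreamSketch {
    // Assumed stream names for illustration only.
    public static final String DEFAULT_STREAM = "default";
    public static final String FAILED_STREAM = "failed";

    /** Hypothetical, simplified message: an id, its values, and the new flag. */
    public static final class Message {
        private final String messageId;
        private final List<Object> values;
        private final boolean permanentlyFailed;

        public Message(String messageId, List<Object> values) {
            this(messageId, values, false);
        }

        private Message(String messageId, List<Object> values, boolean permanentlyFailed) {
            this.messageId = messageId;
            this.values = Collections.unmodifiableList(values);
            this.permanentlyFailed = permanentlyFailed;
        }

        /** Returns a copy of the given message marked as permanently failed. */
        public static Message createPermanentlyFailed(Message original) {
            return new Message(original.messageId, original.values, true);
        }

        public String getMessageId() { return messageId; }
        public List<Object> getValues() { return values; }
        public boolean isPermanentlyFailed() { return permanentlyFailed; }
    }

    /** The spout-side decision: which stream should this message go out on? */
    public static String selectStream(Message message) {
        return message.isPermanentlyFailed() ? FAILED_STREAM : DEFAULT_STREAM;
    }

    public static void main(String[] args) {
        Message fresh = new Message("msg-1", Arrays.<Object>asList("key", "value"));
        Message failed = Message.createPermanentlyFailed(fresh);
        System.out.println(selectStream(fresh));
        System.out.println(selectStream(failed));
    }
}
```

Keeping the flag on the message means each VirtualSpout can keep its own retry manager, which preserves the existing per-VSpout fairness, while the top-level spout only needs to inspect the flag at emit time.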

Thoughts?

Todo

Update changelog & readme