twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars, 267 forks

Addition of aggregation (summers) capability in the spout. #679

NPraneeth closed 8 years ago

NPraneeth commented 8 years ago

When FMMergeableWithSource is enabled, we try to minimize the number of FlatMapNodes. Depending on the tuple fan-out of the flatMap operation in the spout, this may increase the spout's emit count. Also, some topologies are left with no way to do map-side aggregation.


We can eliminate these problems by adding aggregation to the spout. In the absence of a FlatMapNode, this provides some map-side aggregation (localized to a spout), and aggregating before the emit also decreases the emit count.
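
To make the idea concrete, here is a minimal, language-neutral sketch in Python. It is not Summingbird or Storm code. `SpoutSideAggregator`, `word_counts`, and the `max_keys` threshold are hypothetical names invented for this illustration, and the size-based flush policy is an assumption (a real spout would presumably also need to flush on a timer or on shutdown). It compares the number of tuples emitted when a flatMap with fan-out runs in the spout without aggregation against the number emitted when values are combined per key first.

```python
from typing import Callable, Dict, List, Tuple


class SpoutSideAggregator:
    """Hypothetical per-spout buffer that combines values by key before emit."""

    def __init__(self, combine: Callable[[int, int], int], max_keys: int) -> None:
        self._combine = combine      # associative merge function, e.g. addition
        self._max_keys = max_keys    # assumed flush threshold (distinct keys)
        self._buffer: Dict[str, int] = {}

    def put(self, key: str, value: int) -> List[Tuple[str, int]]:
        """Merge one pair; return pairs to emit if the buffer overflowed."""
        if key in self._buffer:
            self._buffer[key] = self._combine(self._buffer[key], value)
        else:
            self._buffer[key] = value
        if len(self._buffer) > self._max_keys:
            return self.flush()
        return []

    def flush(self) -> List[Tuple[str, int]]:
        """Drain the buffer and return everything that should be emitted."""
        out = list(self._buffer.items())
        self._buffer.clear()
        return out


def word_counts(line: str) -> List[Tuple[str, int]]:
    """Stand-in for a flatMap with fan-out: one input, many (key, value) pairs."""
    return [(word, 1) for word in line.split()]


lines = ["a b a", "b a c"]

# Without aggregation, the spout emits every pair the flatMap produces.
raw_emits = [pair for line in lines for pair in word_counts(line)]

# With aggregation, pairs are combined per key and emitted only on flush.
aggregator = SpoutSideAggregator(combine=lambda x, y: x + y, max_keys=100)
aggregated_emits: List[Tuple[str, int]] = []
for line in lines:
    for key, value in word_counts(line):
        aggregated_emits.extend(aggregator.put(key, value))
aggregated_emits.extend(aggregator.flush())

print(len(raw_emits), len(aggregated_emits))
```

In this toy input the six pairs produced by the flatMap collapse to three emitted tuples, one per distinct key. The saving depends entirely on how often keys repeat within one flush window, so a workload with mostly unique keys would see little benefit.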