twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Storm supports multiple output streams, we could use this #678

Open johnynek opened 8 years ago

johnynek commented 8 years ago

Consider the case of:

val s1 = src.map(f1)
val s2 = src.map(f2)

s1.sumByKey(store1)
  .also(s2.sumByKey(store2))

In this case, we will serialize the data from src to a node to run f1 and another to run f2. In fact, storm (and I think heron) support multiple output streams from a node. So we could have a single node run both f1 and f2 and put them on different named outputs. Then the stores subscribe to only one of the two outputs.

This could reduce a storm/heron node in the graph and remove some serializations.

/cc @pankajroark @npraneeth

pankajroark commented 8 years ago

Yes, this will be good to optimize as well. There are a bunch more places where we can eliminate nodes, good to keep track of them via issues. I'll create more.