twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

storm platform sometimes makes nodes with only merges #749

Open johnynek opened 6 years ago

johnynek commented 6 years ago

in some cases, we see storm nodes that only have merges, then it finally fans out.

Something like:

val src1 = ...
val src2 = ...
val merged = src1 ++ src2
val sink1 = merged.sumByKey(store1)
val sink2 = merged.sumByKey(store2)

Things like this have created a node that just has the merge. There is no need to do this. Storm could just subscribe to all the dependencies of the "all-merge" node.

If there is a cheap fix to this, it would be great, because we have some giant topologies that have something like 5 of these.

cc @ianoc

ianoc commented 6 years ago

From https://github.com/twitter/summingbird/blob/develop/summingbird-online/src/main/scala/com/twitter/summingbird/planner/OnlinePlan.scala#L194

this looks like its not supported i think/is the expected behavior. Thats probably the location to fix too

ianoc commented 6 years ago

(i think conceptually what you have requires creating multiple merge nodes, I don't think we generated any new nodes in planning before which is probably what this hits)

johnynek commented 6 years ago

I think we could allow merge nodes to be in multiple physical nodes.

On Fri, Sep 15, 2017 at 09:03 ianoc notifications@github.com wrote:

(i think conceptually what you have requires creating multiple merge nodes, I don't think we generated any new nodes in planning before which is probably what this hits)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/twitter/summingbird/issues/749#issuecomment-329872264, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdpOqBS1iHHgr6QtuwJ-cbz_d4EpWks5sisoQgaJpZM4PZVBo .

-- P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

ianoc commented 6 years ago

Yep I think that's the likely fix here. Some tests/rules will need special casing but it does seem like logically what we want to do. (This applies to flat maps too I suspect when they are interacting with merge nodes. But I think the tests will shake some of this out)

On Sat, Sep 16, 2017 at 10:42 AM P. Oscar Boykin notifications@github.com wrote:

I think we could allow merge nodes to be in multiple physical nodes.

On Fri, Sep 15, 2017 at 09:03 ianoc notifications@github.com wrote:

(i think conceptually what you have requires creating multiple merge nodes, I don't think we generated any new nodes in planning before which is probably what this hits)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/twitter/summingbird/issues/749#issuecomment-329872264>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAEJdpOqBS1iHHgr6QtuwJ-cbz_d4EpWks5sisoQgaJpZM4PZVBo

.

-- P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/twitter/summingbird/issues/749#issuecomment-329983962, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvGdUIot3352KkFnIqPM3kOG2wPjeks5sjAhagaJpZM4PZVBo .

-- Sent from Gmail Mobile