twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks

optionMap not being fused with flatMap in storm #743

Open johnynek opened 6 years ago

johnynek commented 6 years ago

We have a very complex graph in which optionMap and flatMap are not being placed in the same node on Storm.

See this graph: https://www.dropbox.com/s/ld60911fot1f177/optimized_5.png?dl=0

Notice the optionMap nodes being left by themselves.

Any ideas what might be going wrong? This is on 0.10.0-RC1.

cc @ianoc @ttim @non

ianoc commented 6 years ago

I haven't looked at any of this code in 2 years or more, so this could be nonsense, but...

https://github.com/twitter/summingbird/blob/develop/summingbird-online/src/main/scala/com/twitter/summingbird/planner/OnlinePlan.scala#L180

I think this looks like it could cause what you're seeing. The check appears to split a flatMap from anything above it if everything above it can be merged with the source. From a quick read, I don't think it considers whether any of those parents have a fork involved, in which case they won't be merged with the source.

johnynek commented 6 years ago

Yeah, good catch @ianoc. I think you are right. These optionMaps could in principle have been merged up, except that the thing above them has fanOut, so they can't be.

I think that does explain it and hints at an easy repro in a test.
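
To make the suspected behaviour concrete, here is a minimal toy model in Python. It is not the Summingbird planner and does not use any Summingbird API: `Node`, `connect`, `naive_split` and `fanout_aware_split` are hypothetical names invented for illustration, and the assumption that only `source` and `optionMap` nodes count as "mergeable with the source" is a simplification. It only shows how a check that ignores fanOut can decide to split a flatMap away from an optionMap that will never actually be merged upward.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: these kinds are the ones that could be
# merged into the source node.
SOURCE_MERGEABLE = {"source", "optionMap"}


@dataclass(eq=False)  # identity-based equality; nodes reference each other
class Node:
    name: str
    kind: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def connect(parent, child):
    """Add a directed edge parent -> child."""
    parent.children.append(child)
    child.parents.append(parent)


def ancestors(node):
    """Return every node reachable by following parent edges."""
    seen, stack = [], list(node.parents)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.append(n)
            stack.extend(n.parents)
    return seen


def naive_split(flat_map):
    """Split the flatMap from its parents if every ancestor is of a
    source-mergeable kind. Fan-out is ignored."""
    return all(a.kind in SOURCE_MERGEABLE for a in ancestors(flat_map))


def fanout_aware_split(flat_map):
    """Same check, but an ancestor with more than one child is treated
    as not mergeable into the source, so the flatMap is not split off."""
    ups = ancestors(flat_map)
    return (all(a.kind in SOURCE_MERGEABLE for a in ups)
            and all(len(a.children) <= 1 for a in ups))


# One source fanning out into two optionMap -> flatMap branches.
src = Node("src", "source")
opt_a, opt_b = Node("optA", "optionMap"), Node("optB", "optionMap")
fm_a, fm_b = Node("fmA", "flatMap"), Node("fmB", "flatMap")
for parent, child in [(src, opt_a), (src, opt_b), (opt_a, fm_a), (opt_b, fm_b)]:
    connect(parent, child)

print(naive_split(fm_a))         # True: fmA is split off, optA is stranded
print(fanout_aware_split(fm_a))  # False: fmA stays with optA
```

In this model the naive check splits `fmA` from `optA` on the expectation that `optA` will merge into `src`, but `src` has two children, so `optA` ends up alone. A planner test with the same shape (one fanned-out source feeding optionMap then flatMap on each branch) may be enough to reproduce the issue.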