twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

avoid name computation unless there is a cache miss #752

Closed: oscar-stripe closed this 6 years ago

oscar-stripe commented 6 years ago

This is eating a ton of time on our big graphs: we always recompute names, even when we hit the cache. For graphs that fork and join many times, this can make planning in Summingbird take exponential time.

By moving the name computation, we only compute names when they are actually needed, which is only on a cache miss.
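The idea can be illustrated with a minimal sketch. This is not the actual change in `ScaldingPlatform.scala`; `Node`, `Planner`, `namesOf`, and `plan` are hypothetical names made up for illustration, and the cache is a plain `mutable.Map`. It only shows the difference between computing names before the cache lookup and computing them inside the miss branch.

```scala
import scala.collection.mutable

object LazyNamesSketch {
  // Hypothetical stand-in for a graph node; not Summingbird's Producer.
  final case class Node(id: String, parents: List[Node])

  final class Planner(eager: Boolean) {
    var nameCalls = 0 // how many times the costly name walk ran
    private val cache = mutable.Map.empty[String, String]

    // Stand-in for the expensive step: walks every path back to the sources.
    private def namesOf(n: Node): List[String] = {
      nameCalls += 1
      def loop(x: Node): List[String] = x.id :: x.parents.flatMap(loop)
      loop(n).distinct
    }

    def plan(n: Node): String =
      if (eager) {
        val names = namesOf(n) // paid even when the cache already holds n
        cache.getOrElseUpdate(n.id, names.mkString(","))
      } else {
        // getOrElseUpdate takes its second argument by name,
        // so namesOf runs only on a cache miss
        cache.getOrElseUpdate(n.id, namesOf(n).mkString(","))
      }
  }

  // Plans the same fork/join node three times with each strategy and
  // returns (eager name computations, lazy name computations).
  def demo(): (Int, Int) = {
    val src = Node("src", Nil)
    val left = Node("left", List(src))
    val right = Node("right", List(src))
    val join = Node("join", List(left, right))

    val eagerPlanner = new Planner(eager = true)
    val lazyPlanner = new Planner(eager = false)
    (1 to 3).foreach { _ =>
      eagerPlanner.plan(join)
      lazyPlanner.plan(join)
    }
    (eagerPlanner.nameCalls, lazyPlanner.nameCalls)
  }
}
```

In this sketch `LazyNamesSketch.demo()` returns `(3, 1)`: the eager planner walks the graph on every call, while the lazy one walks it once and serves the other two calls from the cache.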

oscar-stripe commented 6 years ago

cc @ttim @ianoc

oscar-stripe commented 6 years ago

cc @non @erik-stripe

codecov-io commented 6 years ago

Codecov Report

Merging #752 into develop will increase coverage by 0.02%. The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop     #752      +/-   ##
===========================================
+ Coverage    72.23%   72.26%   +0.02%     
===========================================
  Files          154      154              
  Lines         3742     3742              
  Branches       209      209              
===========================================
+ Hits          2703     2704       +1     
+ Misses        1039     1038       -1

Impacted Files                                            Coverage Δ
...witter/summingbird/scalding/ScaldingPlatform.scala    75.59% <100%> (ø)
.../main/scala/com/twitter/summingbird/Producer.scala    77.27% <0%> (+1.51%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7c6805c...bbb7517.