twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars, 267 forks

Use dagon for dag rewrites #758

Closed: johnynek closed this 6 years ago

johnynek commented 6 years ago

Scalding 0.18 uses dagon to implement its new backend. We might as well use dagon here too, since it was extracted from summingbird about 6 months or so ago.

We have found a number of performance improvements in the code since then, so it should be a win.

johnynek commented 6 years ago

@ttim @erik-stripe @non ptal

johnynek commented 6 years ago

Note: we ran these rules in production on Storm and significantly reduced the number of Storm nodes we created.

I think you should consider using these to optimize your Storm graphs as well.

non commented 6 years ago

Yeah, I can second this recommendation. 👍

johnynek commented 6 years ago

These tests are in a very bad place.

I have been clicking rebuild to deal with Storm or Hadoop timeouts for about 24 hours now. Each trial takes another 20-60 minutes. I am almost ready to merge without the tests...

We really need to think about improving the platform tests.

codecov-io commented 6 years ago

Codecov Report

Merging #758 into develop will decrease coverage by 1.07%. The diff coverage is 62.82%.


@@             Coverage Diff             @@
##           develop     #758      +/-   ##
===========================================
- Coverage     72.5%   71.43%   -1.08%     
===========================================
  Files          154      151       -3     
  Lines         3783     3606     -177     
  Branches       211      209       -2     
===========================================
- Hits          2743     2576     -167     
+ Misses        1040     1030      -10
Impacted Files                                           Coverage Δ
.../scala/com/twitter/summingbird/memory/Memory.scala    94.73% <ø> (ø) ↑
.../twitter/summingbird/memory/ConcurrentMemory.scala    95.77% <100%> (+0.06%) ↑
...com/twitter/summingbird/planner/DagOptimizer.scala    60% <62.33%> (-14.02%) ↓


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 57312f7...f78f056.