Open johnynek opened 7 years ago
@ttim can you take a look?
Merging #745 into develop will increase coverage by
0.1%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## develop #745 +/- ##
==========================================
+ Coverage 72.23% 72.34% +0.1%
==========================================
Files 154 154
Lines 3742 3742
Branches 209 209
==========================================
+ Hits 2703 2707 +4
+ Misses 1039 1035 -4
Impacted Files | Coverage Δ | |
---|---|---|
...witter/summingbird/scalding/ScaldingPlatform.scala | 76.19% <0%> (+0.59%) |
:arrow_up: |
.../main/scala/com/twitter/summingbird/Producer.scala | 77.27% <0%> (+1.51%) |
:arrow_up: |
...a/com/twitter/summingbird/scalding/LoopState.scala | 75% <0%> (+25%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 0f06d22...39fa04c. Read the comment docs.
okay, this now fails for me (both the fanOut and the idempotency test).
cc @non
actually, I can't get idempotency to fail now... maybe it is just fanOut being broken being the issue, which some of the rules use...
Okay, I don't see a bug actually. It was an issue with the test actually, and an inconsistency between how fanOut was defined in Dependants and ExpressionDag.
This test actually seems to pass for me now. I'll try to find any counter example next week, but so far, I guess everything looks correct still.
okay, idempotency failure:
arg0 = Summer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(MergedProducer(IdentityKeyedProducer(FlatMappedProducer(LeftJoinedProducer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(OptionMappedProducer(IdentityKeyedProducer(FlatMappedProducer(Source(List(-483916215, 1947511647, -1, 2147483647, -1094725117, -1107074050, 1316607071, -894421428, 438074163, -1618870436, 2147483647, -927761159, 1000599281, 1, 2147483647, 1, -941762984, 1, -2147483648, 1134838592, -2147483648, -1360150793, -1774434453, -1968350011, -1999846660, 1, 940697649, -561677987, 1326923701, 734987219, 0, -1731750202, 782845389, 0, 158805678, -1, 1, -560692419, -1, 879491512, 1, 685151277, 333970011, -2147483648, 961509966, -2147483648, -1, 870342926, 2051572436, 2147483647, -178556721, 0, -1028446598, -2147483648, 1, 2147483647, 1, -1915898801, 1834110173, 519019579, 2147483647, 0, 1514822104, -1, -877887704, 2147483647, -224941626, -675798454, 1, 2092027473, -487281773, 638669148, -2147483648, 1, 2147483647, -2147483648, 1, -567098335, -1, 795549827, -1995849024, -577239589, 1867920629, 1, 0, -1839762855, 0, 1650235957, -385664255, 1676297418, 2147483647, -1701772912, -1, -1, 2147483647, 330749634, 1, 2147483647, -2147483648, 1462163691)),org.scalacheck.GenArities$$Lambda$3183/1815517983@3d533ae3)),org.scalacheck.GenArities$$Lambda$3183/1815517983@368308bf)),tjiposzOlkplcu)),tvpwpdyScehGnwcaVjjWvlfuwxatxhdjhozscucpbq)),Map(1122506458 -> -422595330, 985940285 -> 1773535903, -1012102957 -> 577191710, 1865284784 -> 0, 1133011483 -> -1, -1571538327 -> -106754684, -883421086 -> -1843831487, -1186741108 -> -1297188435, 511615026 -> 892838107, 2147483647 -> 0, -1 -> 0, 654842167 -> 2147483647, 766280486 -> 0, 46050994 -> -2147483648, -146388742 -> -1080181663, 1883141387 -> 2147483647, -1170327451 -> 41927314, -1265823639 -> 2038185509, -1531812404 -> -1, -1426771473 -> 0, 107420900 -> 1, -778321474 -> -613855929, 730121132 -> -1824061783, -193486826 -> 0, -1603389086 -> -1, -1663776702 -> 2025651619, 1791509260 -> -1, -249195648 -> 0, 1795996912 -> 470882984, -1635961338 -> 995590274, -1315404816 -> 1, 303482491 -> 1, -1120629917 -> -913091299, 1135859794 -> 0, 1 -> 431843073, -1055837163 -> -428420920, 1920175445 -> -2147483648, -248886982 -> -2003982440, 1518754224 -> -1, 796959542 -> 1, -1658050660 -> -1870329784, 1129081426 -> -1878450200, 992361458 -> -1, -315984592 -> 1918575397, -1137239383 -> 1, -102255667 -> -1, -734906869 -> -611180863, -473252927 -> 2147483647, 886258153 -> 1, -916284450 -> 1, 1559234564 -> -1740554658, -2147483648 -> -1363669838, 0 -> -1)),org.scalacheck.GenArities$$Lambda$3183/1815517983@2bb51e67)),IdentityKeyedProducer(OptionMappedProducer(IdentityKeyedProducer(FlatMappedProducer(Source(List(-483916215, 1947511647, -1, 2147483647, -1094725117, -1107074050, 1316607071, -894421428, 438074163, -1618870436, 2147483647, -927761159, 1000599281, 1, 2147483647, 1, -941762984, 1, -2147483648, 1134838592, -2147483648, -1360150793, -1774434453, -1968350011, -1999846660, 1, 940697649, -561677987, 1326923701, 734987219, 0, -1731750202, 782845389, 0, 158805678, -1, 1, -560692419, -1, 879491512, 1, 685151277, 333970011, -2147483648, 961509966, -2147483648, -1, 870342926, 2051572436, 2147483647, -178556721, 0, -1028446598, -2147483648, 1, 2147483647, 1, -1915898801, 1834110173, 519019579, 2147483647, 0, 1514822104, -1, -877887704, 2147483647, -224941626, -675798454, 1, 2092027473, -487281773, 638669148, -2147483648, 1, 2147483647, -2147483648, 1, -567098335, -1, 795549827, -1995849024, -577239589, 1867920629, 1, 0, -1839762855, 0, 1650235957, -385664255, 1676297418, 2147483647, -1701772912, -1, -1, 2147483647, 330749634, 1, 2147483647, -2147483648, 1462163691)),org.scalacheck.GenArities$$Lambda$3183/1815517983@3d533ae3)),org.scalacheck.GenArities$$Lambda$3183/1815517983@368308bf)))),ncn)),Map(),com.twitter.algebird.IntRing$@4deb204c),
arg1 = com.twitter.summingbird.planner.DagOptimizer$RemoveNames$@7d710ac9.orElse(com.twitter.summingbird.planner.DagOptimizer$RemoveIdentityKeyed$@2a6e2829).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapFusion$@728d18e6).orElse(com.twitter.summingbird.planner.DagOptimizer$OptionMapFusion$@5dba444a).orElse(com.twitter.summingbird.planner.DagOptimizer$OptionToFlatMap$@28e82a76).orElse(com.twitter.summingbird.planner.DagOptimizer$KeyFlatMapToFlatMap$@731c6022).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapKeyFusion$@69cc1c8f).orElse(com.twitter.summingbird.planner.DagOptimizer$ValueFlatMapToFlatMap$@66db9de8).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapValuesFusion$@f09a0df).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatThenOptionFusion$@1f04dc6c).orElse(com.twitter.summingbird.planner.DagOptimizer$DiamondToFlatMap$@16258367).orElse(com.twitter.summingbird.planner.DagOptimizer$MergePullUp$@2c2b7896).orElse(com.twitter.summingbird.planner.DagOptimizer$AlsoPullUp$@6ee63a94)
)
I'll try to reproduce that failure and fix.
This seems to fix the issue.
The bug was with fanOut. It was working in the Expr[_, _]
and Id[_]
space, which is not meaningful to the rules, which operate on the original N[_]
space. The problem is that rewrites can give the same node two different Ids. When that happens, it looks like the fanOut is greater than it is, and this can prevent rules from applying.
If you jump back to the N[_]
space, and then back into the optimizer, you reset the Ids and you may have a chance that the rule would apply then. This seems to be the problem.
Along the way I cleaned things up slightly.
Can you take a look @ttim
had a storm flake. Restarted a 2.12 build.
I'd like to cherry pick this when we merge and publish a version of 0.10.1
which includes this fix. We are using the optimizer and it would be helpful to us.
Also, I'm considering breaking this code out into a standalone library. I copied it into scalding, but it is inconvenient to have to publish these large projects to update this totally generic DAG rewriting tool.
We are using the DagOptimizer at stripe before planning to reduce the size of some online graphs (went from 115 storm bolts or so to 69 in one example).
However, even in the case where we reach 69, there are rules that don't seem to be fully applied and I have not yet found out why.
Anyway, more test coverage never hurts.