twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Add performance tests for flatMap operation in OperationContainer #740

Open ttim opened 7 years ago

ttim commented 7 years ago

This PR adds tests for flatMap / flatMap composition in context of OperationContainer. Tests are implemented using JMH.

Tests are not ideal, but they give us some estimation of costs our abstractions introduce.
Most interesting overheads: IntermediateFlatMap with FlatMapOperation introduces about 1s per 1KK operation overhead. FlatMapOperation composition adds another ~1-2s per 1KK.

codecov-io commented 7 years ago

Codecov Report

Merging #740 into develop will decrease coverage by 2.1%. The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop    #740      +/-   ##
==========================================
- Coverage    72.21%   70.1%   -2.11%     
==========================================
  Files          153     158       +5     
  Lines         3736    3824      +88     
  Branches       204     205       +1     
==========================================
- Hits          2698    2681      -17     
- Misses        1038    1143     +105
Impacted Files Coverage Δ
...d/online/executor/ComposedFlatMapPerformance.scala 0% <0%> (ø)
...er/summingbird/online/executor/SimpleFlatMap.scala 0% <0%> (ø)
.../executor/ComposedFlatMapWithTimePerformance.scala 0% <0%> (ø)
...witter/summingbird/online/executor/TestUtils.scala 0% <0%> (ø)
...d/online/executor/IdentityFlatMapPerformance.scala 0% <0%> (ø)
...ain/scala/com/twitter/summingbird/TestGraphs.scala 77.7% <0%> (-10.83%) :arrow_down:
...ala/com/twitter/summingbird/scalding/Service.scala 73.84% <0%> (-4.62%) :arrow_down:
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fecdfa3...f3e4845. Read the comment docs.

johnynek commented 7 years ago

I'm fine with merging this, but should we add a test that the SimpleFlatMap is identical in performance to the IntermediateFlatMap? Someone might use it as is, and currently it is untested.

Is this a candidate for running in our actual planner, or just a limit to shoot for?

ttim commented 7 years ago

@johnynek According to tests SimpleFlatMap is much more performant than IntermediateFlatMap =) Anyway I moved it to perf module for now (and made it private). Let's see later do we want to use it instead of IntermediateFlatMap in special cases.

johnynek commented 7 years ago

Sorry, I meant identical in function, I realize it is much faster.

ttim commented 7 years ago

@johnynek yeah =) issue is - it's not identical because our FlatMapOperation able to handle leftJoin in a same way as flatMaps, so it covers only simple cases. I guess it's safe to merge, because it's private from external users for now.

johnynek commented 7 years ago

@ttim we are starting to really push on summingbird performance.

Any more work here? Would it make sense to have a video sync sometime soon as we are currently investing in summingbird at Stripe (but also considering if that makes sense or to shift investment somewhat).

If we can share effort, that is a big win. If we each are just going to work privately on our forks maybe each should make their own call.

Thoughts @pankajroark ?

pankajroark commented 7 years ago

@johnynek Great to hear about Summingbird performance push. Happy to set up a video sync. As you've seen we've made some optimizations and we wanted to make more. Unfortunately we have to focus resources on other projects for the time being, but we plan to come back and make more optimizations. It will be great to learn about what you've planned. Let me sync up with you offline to set up a time for video sync.

johnynek commented 7 years ago

@pankajroark sounds good. I replied offline about a meeting time.

CLAassistant commented 4 years ago

CLA assistant check
All committers have signed the CLA.