twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

Increase default value to SummerBatchMultiplier #691

Closed pankajroark closed 7 years ago

pankajroark commented 7 years ago

A value of 1 means that we create the same number of keys as there are downstream bolts. In that scenario collisions are very common: some summer bolts end up getting double or triple the data, and others don't get any data at all. Increasing this value means that more events are sent downstream, which reduces batching, but in practice we haven't seen that make much difference.

In practice, almost every user runs into this skew, discovers this setting, and most of the time sets it to a very high value like 10k. I think 100 is a much better default.
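For illustration only, the skew described above can be reproduced with a rough balls-into-bins simulation. This is a sketch, not Summingbird or Storm code: `simulate_skew` is a hypothetical helper, it assumes each of the `num_bolts * multiplier` keys lands on a uniformly random summer bolt, and it assumes every key carries the same amount of traffic.

```python
import random
from collections import Counter


def simulate_skew(num_bolts, multiplier, seed=0):
    """Assign num_bolts * multiplier keys to bolts uniformly at random.

    Hypothetical model: real hashing and traffic per key will differ.
    Returns (idle_bolts, max_load / mean_load).
    """
    rng = random.Random(seed)
    num_keys = num_bolts * multiplier
    # Count how many keys each bolt receives.
    load = Counter(rng.randrange(num_bolts) for _ in range(num_keys))
    idle_bolts = num_bolts - len(load)  # bolts that received no key at all
    mean_load = num_keys / num_bolts    # equals the multiplier
    return idle_bolts, max(load.values()) / mean_load


if __name__ == "__main__":
    for multiplier in (1, 100):
        idle, ratio = simulate_skew(num_bolts=50, multiplier=multiplier)
        print(f"multiplier={multiplier}: idle bolts={idle}, max/mean load={ratio:.2f}")
```

Under these assumptions, a multiplier of 1 leaves a sizeable share of bolts idle while others receive several keys, and a multiplier of 100 spreads the keys far more evenly.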

johnynek commented 7 years ago

This seems fine to me.

Looks like our flaky tests are biting you.