twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

Summingbird doesn't honor ValueCombinerCacheSize setting. #701

Closed: pankajroark closed this issue 7 years ago

pankajroark commented 7 years ago

The ValueCombinerCacheSize setting is not connected to anything; see https://github.com/twitter/summingbird/blob/8a821295d61801fb313c663a3fd8a5488c6b08fd/summingbird-storm/src/main/scala/com/twitter/summingbird/storm/BuildSummer.scala#L103. As a result, one can neither enable compaction for the async cache nor specify the compaction size. Compaction is really useful for many jobs: keeping data around for, say, 30 seconds can use too much memory, whereas shortening the flush period increases QPS to the KVS. A long flush period with early compaction is a good fit for many use cases.
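To illustrate the trade-off, here is a minimal, language-agnostic sketch in Python of a buffer that compacts early but flushes rarely. It is not Summingbird's or Algebird's implementation; the class name `CompactingBuffer` and its `compaction_size` parameter are hypothetical and only loosely mirror what ValueCombinerCacheSize is presumably meant to control. The assumption is that "compaction" means combining the values queued for one key into a single value without writing to the store.

```python
from functools import reduce
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

V = TypeVar("V")


class CompactingBuffer(Generic[V]):
    """Hypothetical per-key buffer: compacts early, flushes only on request."""

    def __init__(self, combine: Callable[[V, V], V], compaction_size: int) -> None:
        if compaction_size < 2:
            raise ValueError("compaction_size must be at least 2")
        self._combine = combine                  # associative combine (a semigroup plus)
        self._compaction_size = compaction_size  # queued values per key that trigger compaction
        self._pending: Dict[Hashable, List[V]] = {}

    def add(self, key: Hashable, value: V) -> None:
        values = self._pending.setdefault(key, [])
        values.append(value)
        # Early compaction: collapse the queue for this key into one value.
        # Nothing is sent to the store here, so store QPS is unaffected.
        if len(values) >= self._compaction_size:
            self._pending[key] = [reduce(self._combine, values)]

    def buffered_values(self) -> int:
        """Number of values currently held in memory across all keys."""
        return sum(len(values) for values in self._pending.values())

    def flush(self) -> Dict[Hashable, V]:
        """Combine everything per key and hand it to the caller (one write per key)."""
        result = {key: reduce(self._combine, values) for key, values in self._pending.items()}
        self._pending = {}
        return result


# Ten events for one key with compaction_size=4: memory holds fewer than
# four values for that key at any time, yet the store sees a single write.
buf = CompactingBuffer(lambda a, b: a + b, compaction_size=4)
for i in range(1, 11):
    buf.add("a", i)
held_before_flush = buf.buffered_values()
flushed = buf.flush()
```

In this sketch the caller decides when to call `flush()`, for example from a 30-second timer, so the flush period and the memory bound are tuned independently.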

It is actually possible to specify a custom summer instead, but adding counters then becomes problematic. Node names are not available outside of the job, so different counters have to be defined. Those counters don't show up in the automatically generated viz charts, which makes them a hassle. It would be best to connect the ValueCombinerCacheSize setting as it was meant to be.