twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Storm Store type should take `Semigroup[V]` #677

Closed johnynek closed 7 years ago

johnynek commented 8 years ago

Right now, the producer graph has a Semigroup[V] in sumByKey, but actually, the storm platform ignores it.

Scalding does not.

https://github.com/twitter/summingbird/blob/develop/summingbird-online/src/main/scala/com/twitter/summingbird/online/MergeableStoreFactory.scala

A fix for this is to have MergeableStoreFactory really be something like Semigroup[V] => Mergeable[K, V] rather than () => Mergeable[K, V].

We can be somewhat backwards compatible by giving the old methods (but marking them as deprecated) and just ignoring the Semigroup passed in.

Without something like this #562 is pretty dangerous because online is would be really easy to have a mismatch between the store semigroup and the aggregator semigroup (this is actually already pretty dangerous and not clear how we didn't address this before now).

pankajroark commented 8 years ago

I agree, it sounds like a good idea to supply semigroup used during producer graph creation to users for creating the Mergeable. It does reduce risk of users using a different semigroup even though it doesn't eliminate it completely. With proper documentation we can reduce the chances of making a mistake.