Open avibryant opened 10 years ago
Something like this?
trait Accumulator[T] {
def apply(t: T): Unit
def get: T
}
trait Semigroup[T] { sg =>
...
def accumulator(initial: T): Accumulator[T] = new Accumulator[T] {
var item = initial
def apply(t: T) {
item = sg.plus(item, t)
}
def get = item
}
def sumOption(s: Seq[T]): Option[T] = s match {
case head :: tail => {
val acc = accumulator(head)
tail foreach acc
Some(acc.get)
}
case _ => None
}
}
I think we might want ... trait Accumulator[T] { def apply(t: T) : Accumulator[T] def get: T }
to account for the case your just using another data structure and don't need to have it all immutable. returning the mutable this if its mutable is fine then too.
@ianoc: but you could always just use a mutable binding inside the Accumulator for that, like in @jnievelt's default example. I think I like the explicitly mutable API on the trait.
Looked over this, and there are a few related concepts in this space, IMO:
1 above is done concretely in HLL but is abstracted by StatefulSummer. 2 and 3 are done in SketchMap but are also abstract in ArrayBufferedOperation.
I'm thinking it might make sense to a) put HLL#updateInto more into a StatefulSummer context, b) let Semigroup own the summing of buffered elements, via a #bufferedSumOption (default to current #sumOption implementation), c) let Semigroup own defining a specialized #statefulSummer (default to SumAll()(this)), and d) implement Semigroup#sumOption in terms of #statefulSummer.
But I'm a little worried that this over-complicates Semigroup. What do you guys think?
Seems like this is a duplicate of #515 .
HyperLogLog has an updateInto method which updates a mutable buffer representing its current state, instead of doing immutable operations. This allows for an efficient implementation of sumOption. It would be helpful to standardize this so that we could use it from eg SummingCache. see #290
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L241