twitter / algebird

Abstract Algebra for Scala
https://twitter.github.io/algebird
Apache License 2.0
2.29k stars 345 forks source link

implement standard pattern for updateInto #291

Open avibryant opened 10 years ago

avibryant commented 10 years ago

HyperLogLog has an updateInto method which updates a mutable buffer representing its current state, instead of doing immutable operations. This allows for an efficient implementation of sumOption. It would be helpful to standardize this so that we could use it from eg SummingCache. see #290

https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L241

jnievelt commented 10 years ago

Something like this?

trait Accumulator[T] {
  def apply(t: T): Unit
  def get: T
}

trait Semigroup[T] { sg =>
  ...
  def accumulator(initial: T): Accumulator[T] = new Accumulator[T] {
    var item = initial
    def apply(t: T) {
      item = sg.plus(item, t)
    }
    def get = item
  }
  def sumOption(s: Seq[T]): Option[T] = s match {
    case head :: tail => {
      val acc = accumulator(head)
      tail foreach acc
      Some(acc.get)
    }
    case _ => None
  }
}
ianoc commented 10 years ago

I think we might want ... trait Accumulator[T] { def apply(t: T) : Accumulator[T] def get: T }

to account for the case your just using another data structure and don't need to have it all immutable. returning the mutable this if its mutable is fine then too.

avibryant commented 10 years ago

@ianoc: but you could always just use a mutable binding inside the Accumulator for that, like in @jnievelt's default example. I think I like the explicitly mutable API on the trait.

jnievelt commented 10 years ago

Looked over this, and there are a few related concepts in this space, IMO:

  1. Alternate mode of storage for optimizing individual addition (HLL -> Array, and then Array + HLL is faster)
  2. Alternate way of adding a buffered set of elements by #sumOption on sub-elements
  3. Means of buffering a stream of elements to leverage 2 above

1 above is done concretely in HLL but is abstracted by StatefulSummer. 2 and 3 are done in SketchMap but are also abstract in ArrayBufferedOperation.

I'm thinking it might make sense to a) put HLL#updateInto more into a StatefulSummer context, b) let Semigroup own the summing of buffered elements, via a #bufferedSumOption (default to current #sumOption implementation), c) let Semigroup own defining a specialized #statefulSummer (default to SumAll()(this)), and d) implement Semigroup#sumOption in terms of #statefulSummer.

But I'm a little worried that this over-complicates Semigroup. What do you guys think?

sritchie commented 8 years ago

Seems like this is a duplicate of #515 .