Formalization of approximate interface with an exact backing

jnievelt commented 9 years ago

This idea seems to crop up in a few places. Essentially, exact structures can be taken as a special case of our sketch structures that take up less space.

In theory, Monoids could take advantage of this, and maybe we could simplify some logic by having implicit conversions? And perhaps we could simplify some EventuallyMonoids.

Some examples would be:

String or Set[String] as exact BloomFilter
K or Set[K] as exact HLL
K, (K, Long), or TraversableOnce[(K, Long)] as exact CMS
(K, V) or TraversableOnce[(K, V)] as exact SketchMap

johnynek commented 9 years ago

I'd say CMS[K] ~ Map[K, Long] and SketchMap[K, V] ~ Map[K, V]. But how would we use this?

These approximations are generally for datastructures and methods/functions. How do we codify that?

jnievelt commented 9 years ago

Yeah it's not completely clear to me, either.

But it occurs to me that all of these Monoids have/need a 'create' method that does the conversion involved here (more or less). Perhaps we could have types like:

trait Exact[A, E] extends A {
  def exactData: E
}
trait ApproximationMonoid[A, E] extends Monoid[A] {
  def approximate(exactData: E): A
}
trait ExactMonoid[A, E] extends Monoid[Exact[A, E]]

Though this is already problematic due to 'extends A', because these things aren't really traits (SketchMap is a case class, even).

johnynek commented 9 years ago

A minor tweak to your suggestion:

// A approximates E
  // ideally there would be some metric here, like maybe:
  // if we have |approx(f(e1, e2)) - f'(approx(e1), approx(e2))| < eps
  // for some approximating transformation f -> f' and some Metric[A].
trait Approximates[A, E, M[_]] {
  def approximate(e: E): A
  // get the approximate typeclass from the exact one.
  def transform(m: M[E]): M[A]
  def metric: Metric[A]
  def error: Double
}

sealed trait MaybeApprox[+A, +E] extends Any
case class Exact[E](exact: E) extends MaybeApprox[Nothing, A] with AnyVal
case class Approx[A](approx: A) extends MaybeApprox[A, Nothing] with AnyVal

// This is basically a renamed version of the EventuallyMonoid to be more what we are using it for
// with the addition of a new Typeclass: Approximates[A, E]
case class ApproximateMonoid[A, E](implicit exact: Monoid[E], approx: Approx[A, E]) extends Monoid[MaybeApprox[A, E]] ...
  // we can get Monoid[A] from approx and exact.

jnievelt commented 9 years ago

I think the signature of the Monoid would be a bit longer. Something like:

class EventuallyApproxMonoid[A, E](
  makeApprox: E => A,
  p: E => Boolean
)(
  implicit
  exact: Monoid[E],
  approx: Monoid[A]
) extends Monoid[MaybeApprox[A, E]]

sid-kap commented 9 years ago

Something like this could be a great way to clean up our test code. I'm thinking about making something like

trait ApproximateProperty {
  type Params
  type Exact
  type Approximate
  type ExactResult
  type ApproximateResult

  def makeApproximate(p: Params, e: Exact): Approximate
  def exactResult(e: Exact): ExactResult
  def approximateResult(a: Approximate): ApproximateResult
  def claim(e: ExactResult, a: ApproximateResult): Boolean
  def probability(p: Params): Double
}

Example implementation:

CmsProperty extends ApproximateProperty {
  type Params = CMSParams
  type Exact = List[T]
  type Approximate = CountMinSketch[T]
  type ExactResult = Int
  type ApproximateResult = Int

  def makeApproximate(p: CMSParams, exact: List[T]) = {
    val cmsMonoid = CountMinSketchMonoid(params)
    cmsMonoid.sum(exact.map(cmsMonoid.create(_)))
  }
  def exactResult(list: List[T]) = list.getFrequency(x)
  def approximateResult(cms: CountMinSketch[T]) = cms.frequency(x)
  def claim(e: Int, a: Int) = e == a

  // probability that `claim` is true, using eps and delta in the params
  def probability(p: CMSParams) = ??? 
}

This example doesn't check out because I don't know where to get the x in exactResult and approximateResult. Maybe there should be a def getArbitraryTestInput which returns a suitable x which we can feed into exactResult and approximateResult?

Does this seem like a good idea? My goal is to make something similar to scalacheck's Properties but for approximate properties. This could be useful in bounding the probabilities of test failures and such for each test.

sritchie commented 8 years ago

Do we all mind if I close this and merge the discussion into #44 ?

twitter / algebird

Formalization of approximate interface with an exact backing #465