Open jnievelt opened 9 years ago
I'd say CMS[K] ~ Map[K, Long]
and SketchMap[K, V] ~ Map[K, V]
. But how would we use this?
These approximations are generally for datastructures and methods/functions. How do we codify that?
Yeah it's not completely clear to me, either.
But it occurs to me that all of these Monoids have/need a 'create' method that does the conversion involved here (more or less). Perhaps we could have types like:
trait Exact[A, E] extends A {
def exactData: E
}
trait ApproximationMonoid[A, E] extends Monoid[A] {
def approximate(exactData: E): A
}
trait ExactMonoid[A, E] extends Monoid[Exact[A, E]]
Though this is already problematic due to 'extends A', because these things aren't really traits (SketchMap is a case class, even).
A minor tweak to your suggestion:
// A approximates E
// ideally there would be some metric here, like maybe:
// if we have |approx(f(e1, e2)) - f'(approx(e1), approx(e2))| < eps
// for some approximating transformation f -> f' and some Metric[A].
trait Approximates[A, E, M[_]] {
def approximate(e: E): A
// get the approximate typeclass from the exact one.
def transform(m: M[E]): M[A]
def metric: Metric[A]
def error: Double
}
sealed trait MaybeApprox[+A, +E] extends Any
case class Exact[E](exact: E) extends MaybeApprox[Nothing, A] with AnyVal
case class Approx[A](approx: A) extends MaybeApprox[A, Nothing] with AnyVal
// This is basically a renamed version of the EventuallyMonoid to be more what we are using it for
// with the addition of a new Typeclass: Approximates[A, E]
case class ApproximateMonoid[A, E](implicit exact: Monoid[E], approx: Approx[A, E]) extends Monoid[MaybeApprox[A, E]] ...
// we can get Monoid[A] from approx and exact.
I think the signature of the Monoid would be a bit longer. Something like:
class EventuallyApproxMonoid[A, E](
makeApprox: E => A,
p: E => Boolean
)(
implicit
exact: Monoid[E],
approx: Monoid[A]
) extends Monoid[MaybeApprox[A, E]]
Something like this could be a great way to clean up our test code. I'm thinking about making something like
trait ApproximateProperty {
type Params
type Exact
type Approximate
type ExactResult
type ApproximateResult
def makeApproximate(p: Params, e: Exact): Approximate
def exactResult(e: Exact): ExactResult
def approximateResult(a: Approximate): ApproximateResult
def claim(e: ExactResult, a: ApproximateResult): Boolean
def probability(p: Params): Double
}
Example implementation:
CmsProperty extends ApproximateProperty {
type Params = CMSParams
type Exact = List[T]
type Approximate = CountMinSketch[T]
type ExactResult = Int
type ApproximateResult = Int
def makeApproximate(p: CMSParams, exact: List[T]) = {
val cmsMonoid = CountMinSketchMonoid(params)
cmsMonoid.sum(exact.map(cmsMonoid.create(_)))
}
def exactResult(list: List[T]) = list.getFrequency(x)
def approximateResult(cms: CountMinSketch[T]) = cms.frequency(x)
def claim(e: Int, a: Int) = e == a
// probability that `claim` is true, using eps and delta in the params
def probability(p: CMSParams) = ???
}
This example doesn't check out because I don't know where to get the x
in exactResult and approximateResult. Maybe there should be a def getArbitraryTestInput
which returns a suitable x
which we can feed into exactResult
and approximateResult
?
Does this seem like a good idea? My goal is to make something similar to scalacheck's Properties but for approximate properties. This could be useful in bounding the probabilities of test failures and such for each test.
Do we all mind if I close this and merge the discussion into #44 ?
This idea seems to crop up in a few places. Essentially, exact structures can be taken as a special case of our sketch structures that take up less space.
In theory, Monoids could take advantage of this, and maybe we could simplify some logic by having implicit conversions? And perhaps we could simplify some EventuallyMonoids.
Some examples would be:
String
orSet[String]
as exact BloomFilterK
orSet[K]
as exact HLLK
,(K, Long)
, orTraversableOnce[(K, Long)]
as exact CMS(K, V)
orTraversableOnce[(K, V)]
as exact SketchMap