twitter / algebird

Abstract Algebra for Scala
https://twitter.github.io/algebird
Apache License 2.0
2.29k stars 346 forks source link

Self Learning bitmap #404

Open MansurAshraf opened 9 years ago

MansurAshraf commented 9 years ago

I was wondering if someone has seen this paper from google. it is solving the same problem as HLL but is scale invariant and has better memory consumption. Below is the abstract.

Abstract: In this paper, we develop a novel approach, called self-learning bitmap (S-bitmap) that is scale-invariant for cardinalities in a specified range. S-bitmap uses a binary vector whose entries are updated from 0 to 1 by an adaptive sampling process for inferring the unknown cardinality, where the sampling rates are reduced sequentially as more and more entries change from 0 to 1. We prove rigorously that the S- bitmap estimate is not only unbiased but scale-invariant. We demonstrate that to achieve a small RRMSE value of ǫ or less, our approach requires significantly less memory and consumes similar or less operations than state-of-the- art methods for many common practice cardinality scales. Both simulation and experimental studies are reported.

screen shot 2015-01-05 at 12 07 29 pm 1

http://arxiv.org/pdf/1107.1697v1.pdf

Here is an implementation in Java : https://github.com/travisbrady/self-learning-bitmap

avibryant commented 9 years ago

Looks interesting, but I suspect it's not associative (or commutative).

jnievelt commented 9 years ago

There doesn't seem to be any reduction possible (associative, commutative, or otherwise). It only defines:

B2 = B1 + x

not

B3 = B1 + B2

johnynek commented 9 years ago

Should we add things like this as Fold instances? You can't get any parallelism, but at least you can still easily run them on scalding/spark, etc...?

sritchie-stripe commented 7 years ago

Agreed about fold.