Open MansurAshraf opened 9 years ago
Looks interesting, but I suspect it's not associative (or commutative).
There doesn't seem to be any reduction possible (associative, commutative, or otherwise). It only defines:
B2 = B1 + x
not
B3 = B1 + B2
Should we add things like this as Fold
instances? You can't get any parallelism, but at least you can still easily run them on scalding/spark, etc...?
Agreed about fold
.
I was wondering if someone has seen this paper from google. it is solving the same problem as HLL but is scale invariant and has better memory consumption. Below is the abstract.
Abstract: In this paper, we develop a novel approach, called self-learning bitmap (S-bitmap) that is scale-invariant for cardinalities in a specified range. S-bitmap uses a binary vector whose entries are updated from 0 to 1 by an adaptive sampling process for inferring the unknown cardinality, where the sampling rates are reduced sequentially as more and more entries change from 0 to 1. We prove rigorously that the S- bitmap estimate is not only unbiased but scale-invariant. We demonstrate that to achieve a small RRMSE value of ǫ or less, our approach requires significantly less memory and consumes similar or less operations than state-of-the- art methods for many common practice cardinality scales. Both simulation and experimental studies are reported.
http://arxiv.org/pdf/1107.1697v1.pdf
Here is an implementation in Java : https://github.com/travisbrady/self-learning-bitmap