HyperLogLog is an algorithm to approximately count distinct elements.
The HyperLogLog algorithm is able to estimate cardinalities of > 10^9 with a typical error rate of 2%, using 1.5 kB of memory
Google Cloud's data warehouse, BigQuery uses HyperLogLog++ which provides a bunch of improvements like better accuracy for very large cardinalities (> 1 billion).
Article
Notes
HyperLogLog is an algorithm to approximately count distinct elements.
Google Cloud's data warehouse, BigQuery uses HyperLogLog++ which provides a bunch of improvements like better accuracy for very large cardinalities (> 1 billion).
Examples
How many unique users did GitHub have in 2016?
Distinct count on more than 3 billion Reddit comments