iand675 / datasketches-haskell


DataSketches


A Haskell port of a subset of the Apache DataSketches library.

Overview

In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis.

Stochastic streaming algorithms offer a way around this for large datasets. If approximate results are acceptable, a class of specialized algorithms, called streaming algorithms or sketches, can produce results orders of magnitude faster, with mathematically proven error bounds. For interactive queries there may be no other viable alternative, and for real-time analysis, sketches are the only known solution.
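To make the idea concrete, here is a minimal sketch of one classic count-distinct technique, the k-minimum-values (KMV) estimator. It is illustrative only: the names `KMV`, `emptyKMV`, `update`, `estimate`, and `hashString` are hypothetical and are not part of the data-sketches API. The sketch keeps only the k smallest hash values it has seen, so its memory is bounded by k no matter how long the stream is, and it estimates the number of distinct items as (k - 1) divided by the k-th smallest hash normalized to [0, 1). It assumes the `containers` package (shipped with GHC) for `Data.Set`.

```haskell
import Data.Bits (shiftR, xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Set as Set
import Data.Word (Word64)

-- Hypothetical KMV sketch: capacity k and the k smallest distinct hashes seen.
data KMV = KMV !Int !(Set.Set Word64)

-- An empty sketch; k is clamped to at least 2 so the estimator is defined.
emptyKMV :: Int -> KMV
emptyKMV k = KMV (max 2 k) Set.empty

-- SplitMix64 finalizer: scrambles bits so hashes look uniformly distributed.
mix64 :: Word64 -> Word64
mix64 z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
  in z2 `xor` (z2 `shiftR` 31)

-- Illustrative string hash: FNV-1a over code points, then mix64.
hashString :: String -> Word64
hashString = mix64 . foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Offer one hash to the sketch, keeping only the k smallest distinct values.
update :: Word64 -> KMV -> KMV
update h sk@(KMV k s)
  | Set.member h s    = sk                                -- duplicate: no change
  | Set.size s < k    = KMV k (Set.insert h s)            -- still filling up
  | h < Set.findMax s = KMV k (Set.insert h (Set.deleteMax s))
  | otherwise         = sk                                -- larger than all kept

-- Exact count while fewer than k hashes are kept, else (k - 1) / v_k,
-- where v_k is the k-th smallest hash scaled into [0, 1).
estimate :: KMV -> Double
estimate (KMV k s)
  | Set.size s < k = fromIntegral (Set.size s)
  | otherwise      = fromIntegral (k - 1) / (fromIntegral (Set.findMax s) / 2 ^ (64 :: Int))

main :: IO ()
main = do
  -- 200,000 items drawn from 50,000 distinct keys.
  let items  = [show (i `mod` 50000) | i <- [1 .. 200000 :: Int]]
      sketch = foldl' (flip (update . hashString)) (emptyKMV 1024) items
  putStrLn ("exact distinct: 50000, estimated: " ++ show (estimate sketch))
```

With k = 1024 the relative standard error of a KMV estimate is roughly 1/sqrt(k - 2), about 3%, while the sketch never stores more than 1024 hashes. Production sketches refine this basic idea considerably, so treat this only as an illustration of the trade-off described above.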