tvondra / tdigest

PostgreSQL extension for estimating percentiles using t-digest
PostgreSQL License

Functions to compute histograms from the approximated CDF #8

Open tvondra opened 4 years ago

tvondra commented 4 years ago

Another topic discussed with Matt Watson (@sporty81) over e-mail was generating histograms, which are essentially just another way to visualize the CDF approximated by the t-digest. I've hacked together two simple SQL functions in pull request #5 which calculate equi-width and equi-height histograms (there are probably more histogram types, but I think those two are the most common).
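To make the two histogram types concrete, here is a minimal Python sketch of the idea. It is not the SQL from the pull request. It assumes we already have a CDF function and its inverse (a quantile function), which is what a t-digest approximates. The function names `equi_width_histogram` and `equi_height_histogram` are made up for illustration, and the exact CDF of a uniform distribution stands in for the digest.

```python
from typing import Callable, List, Tuple

Bin = Tuple[float, float, float]  # (bin_start, bin_end, fraction_of_data)


def equi_width_histogram(cdf: Callable[[float], float],
                         lo: float, hi: float, nbins: int) -> List[Bin]:
    """Split [lo, hi] into nbins bins of equal width.

    The fraction of data in each bin is the difference of the CDF
    evaluated at the two bin boundaries.
    """
    width = (hi - lo) / nbins
    bins = []
    for i in range(nbins):
        start = lo + i * width
        end = lo + (i + 1) * width
        bins.append((start, end, cdf(end) - cdf(start)))
    return bins


def equi_height_histogram(quantile: Callable[[float], float],
                          nbins: int) -> List[Bin]:
    """Build nbins bins that each hold 1/nbins of the data.

    The bin boundaries are the quantiles 0, 1/nbins, 2/nbins, ..., 1.
    """
    bounds = [quantile(i / nbins) for i in range(nbins + 1)]
    return [(bounds[i], bounds[i + 1], 1.0 / nbins) for i in range(nbins)]


# Stand-in for a t-digest (assumption): exact CDF and quantile
# function of the Uniform(0, 10) distribution.
def uniform_cdf(x: float) -> float:
    return min(max(x / 10.0, 0.0), 1.0)


def uniform_quantile(q: float) -> float:
    return 10.0 * q


if __name__ == "__main__":
    for b in equi_width_histogram(uniform_cdf, 0.0, 10.0, 5):
        print("equi-width ", b)
    for b in equi_height_histogram(uniform_quantile, 4):
        print("equi-height", b)
```

With a real t-digest, the CDF and quantile lookups would be approximate, so any error in the digest carries over directly into the bin fractions (equi-width) or the bin boundaries (equi-height).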

I think the functions are mostly fine, but I wonder how accurate the histograms can be. They are probably good enough in the tails, but t-digests intentionally trade accuracy in the middle of the distribution for accuracy in the tails, so the histograms inherit the same issue.