zero-one-group / geni

A Clojure dataframe library that runs on Spark
Apache License 2.0

Add support for the creation of histograms #307

Open andres-moreno opened 3 years ago

andres-moreno commented 3 years ago

Creating histograms is a very common activity. Geni offers cut, which supports the creation of histograms given bins, an array of bin edges, but the user has to compute these edges manually.

Geni provides qcut to help users determine how wide each bin should be.

It would be helpful to provide a function, (g/histogram :column {:n-bins :bins-vector}), whose options map accepts either key: given :n-bins, it would compute the bins automatically; given :bins-vector, it would compute the histogram from the supplied bins.

The form with just the :n-bins argument is very useful for data analysis and review, while the :bins-vector form addresses the use case where the bins are dictated by business domain needs (e.g., binning populations into age brackets that align with survey methodology).
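
To make the two requested modes concrete, here is a minimal sketch in plain Clojure. It works on an in-memory sequence of numbers and does not touch Geni or Spark. The names equal-width-edges, bin-index and histogram are hypothetical helpers made up for illustration, not existing Geni functions. It also assumes equal-width bins for the :n-bins mode, with each bin half-open except the last one, which includes its upper edge.

```clojure
;; Hypothetical helpers for illustration only; none of these are Geni functions.

(defn equal-width-edges
  "Returns a vector of n-bins + 1 evenly spaced edges spanning [lo, hi]."
  [lo hi n-bins]
  (let [lo    (double lo)
        hi    (double hi)
        width (/ (- hi lo) n-bins)]
    ;; Append hi itself so the last edge is exact despite floating-point rounding.
    (conj (mapv #(+ lo (* % width)) (range n-bins)) hi)))

(defn bin-index
  "Returns the index of the bin containing x, or nil if x lies outside the edges.
  Bins are half-open [a, b), except the last, which also includes its upper edge."
  [edges x]
  (let [n (dec (count edges))]
    (cond
      (or (< x (first edges)) (> x (last edges))) nil
      (== x (last edges)) (dec n)
      :else (count (take-while #(<= % x) (rest edges))))))

(defn histogram
  "Counts values per bin. The options map takes either :n-bins (edges are
  computed from the min and max of values) or :bins-vector (edges are used as given)."
  [values {:keys [n-bins bins-vector]}]
  (let [edges  (or bins-vector
                   (equal-width-edges (apply min values) (apply max values) n-bins))
        counts (frequencies (keep #(bin-index edges %) values))]
    {:edges  edges
     :counts (mapv #(get counts % 0) (range (dec (count edges))))}))

;; Automatic bins from :n-bins
(histogram [1 2 3 4 5 6 7 8 9 10] {:n-bins 3})
;; => {:edges [1.0 4.0 7.0 10.0], :counts [3 3 4]}

;; Domain-driven bins from :bins-vector (values outside the edges are dropped)
(histogram [5 17 18 40 64 65 90 130] {:bins-vector [0 18 65 120]})
;; => {:edges [0 18 65 120], :counts [2 3 2]}
```

A Spark-backed version would presumably compute the column's min and max with an aggregation and then delegate to cut with the derived edges; that part is left out here because it depends on Geni's actual API.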