heliosdrm / GRUtils.jl

Tools for using the GR framework in Julia

Log the bin edges when histogramming on a log scale #87

Closed: ericphanson closed this PR 3 years ago

ericphanson commented 3 years ago

This PR changes histogram to use log-scaled bin edges for making histograms with log-scale axes, using (the simplest version of) @mcabbott's code from https://github.com/JuliaLang/julia/pull/39071 (@mcabbott: is it OK to use your code for this here?).
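For readers unfamiliar with the idea, here is a minimal sketch of log-scaled bin edges: the edges are spaced evenly in log10 space, so each bin covers the same ratio of values. This is not @mcabbott's code from the linked Julia PR, nor GRUtils' implementation. The helper name `log_bin_edges` is hypothetical, and it assumes all data are positive.

```julia
# Hypothetical helper (not GRUtils API): nbins + 1 edges evenly spaced in
# log10 space, covering [minimum(x), maximum(x)]. Requires positive data.
function log_bin_edges(x::AbstractVector{<:Real}, nbins::Integer)
    nbins >= 1 || throw(ArgumentError("need at least one bin"))
    lo, hi = extrema(x)
    lo > 0 || throw(ArgumentError("log-scaled bins need positive data"))
    edges = exp10.(range(log10(lo), log10(hi); length = nbins + 1))
    # Pin the outer edges so floating-point rounding cannot exclude min/max.
    edges[1], edges[end] = lo, hi
    return edges
end

log_bin_edges([1, 10, 100, 1000], 3)  # ≈ [1.0, 10.0, 100.0, 1000.0]
```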

I imagine this is breaking. If that's unacceptable, my first commit only adds an edges keyword argument that lets the user pass custom histogram edges; users could then pass log-scaled edges themselves, which would be a non-breaking way to get this kind of plot.
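To illustrate what passing custom edges amounts to, the sketch below counts values into user-supplied bins. The name `bin_counts` and its convention (bins are half-open `[a, b)` except the last, which also includes its right edge) are assumptions for illustration; they are not the signature or semantics of the edges keyword added in this PR.

```julia
# Hypothetical helper (not GRUtils API): count values of x in each bin
# [edges[i], edges[i+1]); the last bin also includes last(edges).
function bin_counts(x::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    length(edges) >= 2 || throw(ArgumentError("need at least two edges"))
    issorted(edges) || throw(ArgumentError("edges must be sorted"))
    nbins = length(edges) - 1
    counts = zeros(Int, nbins)
    for v in x
        (v < first(edges) || v > last(edges)) && continue  # skip out-of-range values
        i = searchsortedlast(edges, v)   # index of the last edge <= v
        counts[min(i, nbins)] += 1       # v == last(edges) lands in the last bin
    end
    return counts
end

bin_counts([1, 5, 20, 500, 1000], [1, 10, 100, 1000])  # → [2, 1, 2]
```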

Example

With AnalyzeRegistry.jl I counted the lines of source code in every Julia package and wanted to plot them as a histogram.

Without using a log scale, one can tell most packages have relatively few lines of code, and some packages have a ton:

histogram(lines_of_code, xlabel="Lines of Julia source code", ylabel="Number of packages")

[Figure: hist_no_log, histogram with a linear x-axis]

But it's hard to get a good sense of the distribution. Making the x-axis log-scaled with the released version of GRUtils, I get

histogram(lines_of_code, xlog=true, xlabel="Lines of Julia source code", ylabel="Number of packages")

[Figure: hist, log-scaled x-axis with linear bins on released GRUtils]

which is not very nice looking. But with this PR, I get

[Figure: hist_bins_logged, log-scaled x-axis with log-scaled bins from this PR]

which shows a nice distribution and seems to be the most informative plot.
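One way to see why linear bins look poor on a log axis: a bin `[a, b]` occupies `log10(b) - log10(a)` decades of axis, so equal-width linear bins get squeezed toward the right while the first bin stretches across most of the plot. The helper `log_axis_widths` below is a hypothetical illustration, not part of GRUtils, and assumes all edges are positive.

```julia
# Hypothetical helper: how many decades (log10 units) each bin [a, b]
# occupies on a log-scaled axis. Assumes all edges are positive.
log_axis_widths(edges::AbstractVector{<:Real}) = diff(log10.(edges))

linear_edges = collect(range(10, 10_000; length = 4))  # 10, 3340, 6670, 10000
log_edges = [10, 100, 1_000, 10_000]

log_axis_widths(linear_edges)  # ≈ [2.52, 0.30, 0.18]: first bin dominates the axis
log_axis_widths(log_edges)     # ≈ [1.0, 1.0, 1.0]: every bin spans one decade
```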

heliosdrm commented 3 years ago

Thanks. I don't consider it breaking, but rather a bugfix. In any case, the linear binning can still be shown on a log scale if xlog is not passed as an argument to histogram but applied afterwards, e.g.:

histogram(x)
xlog(true)