Plotting normalised histograms

chrisburr commented 6 years ago

Is it possible to plot normalised overlaid histograms using something like:

histogram.overlay("x").marker("y", error=True, normed=True).to(canvas)
histogram.overlay("x").normalize().marker("y", error=True).to(canvas)

jpivarski commented 6 years ago

Probably the first syntax: ...marker("y", error=True, normalized=True).vegascope(). I'm not sure what it would mean to normalize the overlay, unless you mean "normalize everything under this tree." It's probably better to have explicit normalized=True.

jpivarski commented 6 years ago

I've been putting in a lot of quick changes recently, but this one will require a bit of thought. Different types of axes normalize differently, and I have to manually apply the normalization to the numbers that I give to Vega-Lite. (I'm not using Vega-Lite's built-in aggregation because I want to plot pre-filled histograms. Thus, I don't get to use its built-in normalization.)

jpivarski commented 6 years ago

I put this into the "feature-booking-for-pyhf" branch because that's the one I'm working on. It will eventually make it into a main release, along with a lot of new Book features (version 1.2.x).

The numbers for plots come from Hist.table, which also feeds Hist.pandas. The natural place to put the normalization feature was in Hist.table, so that everybody gets it. Now you can

histogram.pandas(normalized=True)    # the counts() and err(counts()) are normalized

and

histogram.marker("x", normalized=True).vegascope()

to do a normalized plot. normalized is just an option on terminal plot-chains: bar, step, line, area, marker. I have not added normalization to heatmap, though I suppose I could because that comes from Hist.table as well. (You wouldn't see a difference, except in the range of the color legend.)

To define what exactly "normalized" means for non-uniform bins, I did this: total = sum(count[i] / binwidth[i]) and normalized[i] = count[i] / (total * binwidth[i]). The errors are scaled by the same factor as the counts because bin width and 1.0 are known exactly. Underflow and overflow bins are zero after normalization because of their infinite bin widths. This normalization method behaves the same as ROOT's TH1::DrawNormalized() for fixed-size bins (TH1::DrawNormalized() doesn't seem to do anything for variable-width bins).
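The formula above can be sketched in plain NumPy, independent of histbook. This is only an illustration of the arithmetic described in the comment, not the code in Hist.table: the function name normalize_counts, its arguments, and the choice to return zeros for an empty histogram are assumptions made for this example.

```python
import numpy as np

def normalize_counts(counts, edges, errors):
    """Hypothetical helper: normalize binned counts as described above.

    counts : per-bin counts, length n
    edges  : bin edges, length n + 1 (may start with -inf / end with +inf
             to represent underflow / overflow bins)
    errors : per-bin uncertainties on the counts, length n
    """
    counts = np.asarray(counts, dtype=float)
    errors = np.asarray(errors, dtype=float)
    widths = np.diff(np.asarray(edges, dtype=float))

    # total = sum(count[i] / binwidth[i]); infinite-width bins contribute 0
    total = np.sum(counts / widths)
    if total == 0:
        # Assumption for this sketch: an empty histogram stays all zeros
        return np.zeros_like(counts), np.zeros_like(errors)

    # normalized[i] = count[i] / (total * binwidth[i]);
    # the errors are scaled by the same per-bin factor
    scale = 1.0 / (total * widths)
    return counts * scale, errors * scale

# Variable-width bins: the normalized values sum to 1
values, errs = normalize_counts(
    counts=[5, 10, 20, 5],
    edges=[0, 1, 2, 4, 8],
    errors=np.sqrt([5, 10, 20, 5]),
)
```

With equal-width bins the widths cancel and each value reduces to count[i] / sum(count), which is the fixed-size case compared with TH1::DrawNormalized() above. Passing -inf and +inf as the outer edges gives the underflow and overflow bins infinite width, so they come out as zero without any special-casing.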

scikit-hep / histbook

Plotting normalised histograms #31