scikit-hep / mplhep

Extended histogram plotting on top of matplotlib and HEP collaboration compatible styling
https://mplhep.readthedocs.io
MIT License

Use unbinned values as input to `histplot` #511

Open JamesJieranShen opened 3 months ago

JamesJieranShen commented 3 months ago

Hello! Thanks for creating this great library (along with uproot, awkward, and many others!).

In my daily workflow, I have found many of the features provided by histplot extremely useful, and I often reach for it as a replacement for plt.hist when making simple histograms. However, this currently takes two separate steps: first binning the values with np.histogram, and then feeding the output to histplot. It would be great if I could pass an array of values directly and have histplot create a nice histogram with the proper mplhep styling and features.

In short, I am looking for something like

hep.histplot(values, bins=100, range=(0, 200), yerr=True, color='k')

Does something like this already exist? If not, I think this would be a great feature to add. As far as I can tell, implementing something like this would simply involve calling np.histogram and then histplot.
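For reference, the two-step workflow described above can be sketched roughly as follows. The sample data and the helper name `plot_counts` are made up for illustration; the only library calls assumed are `np.histogram` and `hep.histplot` with its `bins`, `yerr`, and `color` arguments.

```python
import numpy as np

# Illustrative unbinned data (an assumption, not from the issue).
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=100.0, scale=20.0, size=1000)

# Step 1: bin the raw values.
counts, edges = np.histogram(values, bins=100, range=(0, 200))


def plot_counts(counts, edges):
    """Step 2: hand the already-binned result to histplot.

    `plot_counts` is a hypothetical helper; mplhep is imported lazily so
    the binning step above can run without it.
    """
    import mplhep as hep

    return hep.histplot(counts, bins=edges, yerr=True, color="k")
```

Calling `plot_counts(counts, edges)` draws the histogram on the current matplotlib axes; the request in this issue is to collapse both steps into one call.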

andrzejnovak commented 3 months ago

Hi @JamesJieranShen, welcome! Can you explain why plt.hist doesn't suffice for your use case? This library, and histplot in particular, was created to plot already existing histograms, such as the ones created via hist.

JamesJieranShen commented 3 months ago

There are many features provided by histplot that are not available in plt.hist. Off the top of my head, the most useful ones include the automatic calculation and plotting of error bars, as well as bin-width normalization for histograms with arbitrary (non-uniform) bin widths. Plus, the default filled histtype ("bar") in plt.hist is not ideal for many HEP-style plots (although this could just be a personal preference). Being able to use all of the nice quality-of-life features provided by histplot as quickly as plt.hist would be very useful.
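To make the two features concrete, here is a small NumPy-only sketch of what error bars and bin-width normalization mean for a histogram with unequal bins. The data and edges are invented, and the sqrt(N) error is the simple Poisson approximation; it is an assumption for illustration and not necessarily the method histplot applies for `yerr=True`.

```python
import numpy as np

# Invented sample and deliberately non-uniform bin edges.
values = np.array([0.5, 1.5, 1.7, 3.0, 3.2, 3.9, 6.0, 9.5])
edges = np.array([0.0, 1.0, 2.0, 4.0, 10.0])

counts, _ = np.histogram(values, bins=edges)

# Simple Poisson approximation for the per-bin uncertainty.
yerr = np.sqrt(counts)

# Bin-width normalization: divide counts (and errors) by each bin's width
# so that wide bins are not visually over-weighted.
widths = np.diff(edges)
counts_per_width = counts / widths
yerr_per_width = yerr / widths
```

In histplot these correspond to the `yerr` and `binwnorm` arguments, which plt.hist does not offer in the same form.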

andrzejnovak commented 3 months ago

Thanks for the clarification. I am happy to hear you find histplot useful.

The main reason something like that is not currently supported is that histplot([1,2,4,5,2,6,6,7]) and histplot([[1,1,1], [2,2,3], [2,3,4]]) get interpreted as bin values and a set of bin values, respectively. Keeping the API sane while allowing a switch to treat such inputs as values to histogram seems a bit tricky.
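The ambiguity can be illustrated with NumPy alone: the same list gives two different histograms depending on whether it is read as bin contents or as raw samples. The unit-width edges in the first reading and the `bins=4, range=(0, 8)` choice in the second are assumptions made for this sketch.

```python
import numpy as np

data = [1, 2, 4, 5, 2, 6, 6, 7]

# Reading A: each entry is already a bin content, so there are 8 bins.
# Unit-width edges are assumed here purely for illustration.
heights_a = np.asarray(data)
edges_a = np.arange(len(heights_a) + 1)

# Reading B: each entry is an unbinned sample that still has to be binned.
counts_b, edges_b = np.histogram(data, bins=4, range=(0, 8))

print(heights_a)  # the list itself, used as bar heights
print(counts_b)   # how many samples fall into each of the 4 bins
```

Nothing in the input itself distinguishes the two readings, which is why a single function accepting both would need an explicit switch.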

However, one way I can see this working without confusing users would be a new mplhep.hist function that implements the functionality you want under the hood (via either np.histogram or hist.Hist) and passes the results to histplot. I'd be happy to merge such functionality if you want to take a stab at a PR.
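A minimal sketch of what such a wrapper might look like, using the np.histogram route. This is not an existing mplhep function: the names `bin_values` and `hist_from_values` and their signatures are hypothetical, and weighted input (which would also need sum-of-weights-squared handling for error bars) is left out.

```python
import numpy as np


def bin_values(values, bins=10, range=None):
    """Bin unbinned values; returns (counts, edges) as np.histogram does."""
    return np.histogram(np.asarray(values), bins=bins, range=range)


def hist_from_values(values, bins=10, range=None, **histplot_kwargs):
    """Hypothetical wrapper: bin raw values, then delegate to histplot.

    Any extra keyword arguments (e.g. yerr, color, ax) are forwarded
    unchanged to mplhep.histplot. mplhep is imported lazily so that
    `bin_values` stays usable without it.
    """
    import mplhep as hep

    counts, edges = bin_values(values, bins=bins, range=range)
    return hep.histplot(counts, bins=edges, **histplot_kwargs)
```

With this in place, the call requested at the top of the thread would read `hist_from_values(values, bins=100, range=(0, 200), yerr=True, color='k')`, while histplot itself keeps its existing meaning for list inputs.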