scikit-hep / hist

Histogramming for analysis powered by boost-histogram
https://hist.readthedocs.io
BSD 3-Clause "New" or "Revised" License

External package integration #4

Open LovelyBuggies opened 4 years ago

LovelyBuggies commented 4 years ago

@henryiii We are going to add some shortcuts for analysis. Could you please specify which kinds of analysis are needed, and which tools or packages you think are appropriate?

The fitting packages above have some problems: they are GPU-oriented, C++-based, and rely heavily on external dependencies. We expect a less dependent, more pythonic solution for common use. I recommend SciPy. SciPy's optimize module gives us the flexibility to solve fitting and other data-analysis problems (though it may not perform as well as more specialized solutions, e.g. for maximum-likelihood fits).
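
To make the SciPy suggestion concrete, here is a minimal sketch of fitting a Gaussian to histogram contents with `scipy.optimize.curve_fit`. The toy data, the `gaussian` helper, the binning, and the starting values are all assumptions for illustration; `np.histogram` stands in for whatever arrays a histogram object would expose, and nothing here is part of hist itself.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, sigma):
    """Unnormalized Gaussian evaluated at the bin centers."""
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Toy data (assumption): 10,000 samples from a Gaussian with mean 0.5, width 1.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.5, scale=1.0, size=10_000)

# Bin the data; counts and edges are the only inputs the fit needs.
counts, edges = np.histogram(samples, bins=50, range=(-4.5, 5.5))
centers = 0.5 * (edges[:-1] + edges[1:])

# Weighted least-squares fit with sqrt(N) errors, floored at 1 for empty bins.
errors = np.sqrt(np.maximum(counts, 1.0))
popt, pcov = curve_fit(
    gaussian,
    centers,
    counts,
    p0=[counts.max(), 0.0, 2.0],
    sigma=errors,
    absolute_sigma=True,
)
amplitude_fit, mean_fit, sigma_fit = popt
```

This is a least-squares fit to bin contents, not a maximum-likelihood fit, so it is the kind of simple shortcut meant above rather than a replacement for a dedicated fitting package.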

In addition to this, it is not clear whether our shortcuts should include classification, regression, clustering, etc. (I did not find any questions on the channel.) If yes, scikit-learn could be a wonderful solution.

LovelyBuggies commented 4 years ago

SciPy has few dependencies and provides analysis methods beyond fitting, such as integration ... (though I am not sure whether they are useful for HEP). The concerns are: 1) it might not be as specialized as GooFit; 2) using a Scikit-HEP package might fit the HEP ecosystem better.

lukasheinrich commented 4 years ago

Hi @LovelyBuggies, if you would like a histogram-based statistics model, https://github.com/scikit-hep/pyhf might be interesting; it only depends on scipy + numpy.

LovelyBuggies commented 4 years ago

@lukasheinrich Thanks for your suggestion. I will dive into pyhf and see whether it suits the functionality in hist.

henryiii commented 4 years ago

These are two separate issues: shortcuts for easy interaction, and adaptors/integration into other packages (which could also be called shortcuts). In general, we should be able to implement some or many of them without adding a dependency on the package, though we will have to be careful when we do.

henryiii commented 4 years ago

I think we should focus on how to "feed" our histograms to these other packages. Maybe come up with a standard histogram API? Then boost-histogram (and maybe others, like Physt) could also support it.
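
One way to picture a "standard histogram API" is a small structural protocol that consumer packages can rely on. The sketch below is hypothetical: `HistogramLike`, `ToyHistogram`, and `total_entries` are invented names, and choosing `to_numpy()` as the common method is an assumption, not an agreed design.

```python
from typing import Protocol, Tuple, runtime_checkable

import numpy as np


@runtime_checkable
class HistogramLike(Protocol):
    """Hypothetical minimal interface a consumer package could rely on."""

    def to_numpy(self) -> Tuple[np.ndarray, ...]:
        """Return (counts, edges_axis0, edges_axis1, ...)."""
        ...


class ToyHistogram:
    """Stand-in producer; any library's histogram could play this role."""

    def __init__(self, data, bins, value_range):
        self._counts, self._edges = np.histogram(data, bins=bins, range=value_range)

    def to_numpy(self):
        return self._counts, self._edges


def total_entries(h: HistogramLike) -> int:
    """Consumer code that depends only on the protocol, not on a package."""
    counts = h.to_numpy()[0]
    return int(counts.sum())


toy = ToyHistogram([0.1, 0.2, 0.7, 0.9], bins=2, value_range=(0.0, 1.0))
n_entries = total_entries(toy)
```

Because the protocol is structural, `ToyHistogram` never imports or subclasses `HistogramLike`, so producers and consumers would not need to depend on each other.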

lukasheinrich commented 4 years ago

One thing that might be important for all but the simplest clients is feeding a structured set of histograms. I started some work along those lines with @jpivarski in histbook, and the idea of a "book", i.e. a nestable structure of histograms, would be useful. cc @matthewfeickert @kratsg

LovelyBuggies commented 4 years ago

@lukasheinrich An initiative concerning 'nest' was put forward here.

HDembinski commented 4 years ago

What exactly is the problem with iminuit's interface? What is not pythonic enough about it? iminuit has little in common with the interface of C++ MINUIT; it is pretty pythonic already.

HDembinski commented 4 years ago

Besides, if you like scipy.optimize.minimize, you may also like https://iminuit.readthedocs.io/en/latest/reference.html#iminuit.minimize

HDembinski commented 4 years ago

@lukasheinrich boost-histogram supports integer and category axes, which can be used to bundle histograms together. I use these axes to have a common histogram with signal, background, different data subsets, etc. What can histbook do that boost-histogram with these axes cannot do?
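
The bundling idea can be illustrated in plain NumPy, used here only as a stand-in and not as boost-histogram's API: the category becomes an extra discrete axis in front of the value axis. The labels, values, and binning below are made-up toy inputs.

```python
import numpy as np

# Category labels act as an extra, discrete axis in front of the value axis.
categories = ["signal", "background"]
category_index = {name: i for i, name in enumerate(categories)}

# Toy events (assumption): each has a category label and a value in [0, 1).
labels = np.array(["signal", "background", "background", "signal", "background"])
values = np.array([0.15, 0.55, 0.95, 0.25, 0.05])

# Map labels to integer codes and fill one 2D histogram: (category, value).
codes = np.array([category_index[label] for label in labels])
counts, category_edges, value_edges = np.histogram2d(
    codes,
    values,
    bins=[np.arange(len(categories) + 1) - 0.5, np.linspace(0.0, 1.0, 6)],
)

# Slicing along the category axis recovers the individual histograms.
signal_counts = counts[category_index["signal"]]
background_counts = counts[category_index["background"]]
```

Every category shares the same value binning in this layout, which is what makes a single rectangular histogram sufficient.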

HDembinski commented 4 years ago

@LovelyBuggies I disagree with your initial list of "shortcomings". GPU support is not a problem, it is a feature. Any package that supports the GPU should also fall back to CPU computing when GPUs are not available, of course, like numba and jax.

I hope you got from my previous comment that we cannot replace iminuit with scipy.optimize.

"We expect a less dependent, more pythonic solution for common use." Having well-justified dependencies is OK if they can be loaded from PyPI and installed automatically. jax and jupyter are high-quality software, and they depend on a gazillion other packages.

LovelyBuggies commented 4 years ago

@HDembinski Thanks for the correction! It looks like I misunderstood: integrating iminuit into Hist is feasible and reasonable.

lukasheinrich commented 4 years ago

@HDembinski Yes, some of these axis types are perfectly suitable. Would 'jagged' data work as well? Consider this case: 2 phase-space regions, one with [data, bkg] histograms with 10 bins, the other with [data, signal, bkg] histograms with 5 bins.

2 event categories    / \
                     /   \ 
   2 samples      / |   / | \    3 samples
                 /  |  /  |  \
     10 bins     |  |  |  |   |    5 bins
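
The jagged case in the diagram can be written down as a nested mapping. This is only a sketch of the data layout: the `book` name and the region and sample keys are hypothetical, and the bin contents are zero placeholders.

```python
import numpy as np

# Hypothetical nested "book": region -> sample -> bin contents.
# The two regions differ in both sample list and bin count, so the
# structure is jagged and does not fit into one rectangular array.
book = {
    "region_A": {
        "data": np.zeros(10),
        "bkg": np.zeros(10),
    },
    "region_B": {
        "data": np.zeros(5),
        "signal": np.zeros(5),
        "bkg": np.zeros(5),
    },
}

# Shape summary: number of bins per sample in each region.
layout = {
    region: {sample: counts.size for sample, counts in samples.items()}
    for region, samples in book.items()
}

# Padding to a common rectangular shape would need 2 x 3 x 10 = 60 cells,
# while the jagged layout only holds 2*10 + 3*5 = 35 real bins.
n_real_bins = sum(
    counts.size for samples in book.values() for counts in samples.values()
)
```

A single histogram with a region axis and a sample axis would have to pad both the missing sample and the missing bins, which is the mismatch the question is about.
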

LovelyBuggies commented 4 years ago

@henryiii I made a few attempts and put together a new demo on this topic HERE :)

LovelyBuggies commented 4 years ago

We can encapsulate the work in methods modeled on h.to_numpy(), e.g., h.to_aghast(), h.to_mplhep(), h.to_root(), etc.
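
A rough sketch of how such `to_*()` methods could avoid hard dependencies, as discussed earlier in the thread, is to import the target package only when the method is called. Everything below is hypothetical: `HistSketch`, `optional_import`, and `to_backend()` are invented for illustration and are not hist's API, and the actual conversion step is left out.

```python
import importlib

import numpy as np


def optional_import(module_name, feature):
    """Import a module on demand, with a clear error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from err


class HistSketch:
    """Hypothetical histogram wrapper; not the real hist.Hist class."""

    def __init__(self, data, bins, value_range):
        self._counts, self._edges = np.histogram(data, bins=bins, range=value_range)

    def to_numpy(self):
        """Always available: NumPy is the only hard dependency here."""
        return self._counts, self._edges

    def to_backend(self, module_name):
        """Generic stand-in for methods like to_root() or to_aghast().

        The target package is imported only when the method is called, so
        it never becomes an install-time requirement.
        """
        backend = optional_import(module_name, "HistSketch.to_backend()")
        counts, edges = self.to_numpy()
        # A real adaptor would build the backend's histogram object here;
        # this sketch just returns the pieces it would be built from.
        return backend, counts, edges


h = HistSketch([0.1, 0.4, 0.6], bins=2, value_range=(0.0, 1.0))
counts, edges = h.to_numpy()
```

With this pattern, users who never call a given `to_*()` method never need the corresponding package installed.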