scikit-hep / probfit

Cost function builder. For fitting distributions.
http://probfit.readthedocs.io/
MIT License

Binned data #104

Closed mzks closed 3 years ago

mzks commented 5 years ago

Interface of BinnedLH and BinnedChi2 for binned data

I would like to handle already binned (histogrammed) data and fit it using BinnedLH and BinnedChi2. I think these cost functions do not use the raw event-by-event values. In my experiment, I prefer to handle a small dataset as a histogram, not as a raw array. Reading such array data is sometimes very slow. Thus, I would like an interface to pass binned data directly to these cost functions.

I guess there is already another way, using an array of bin values and an array of weights. However, that way is not simple. I think it would be better for these functions to have an interface that connects easily to NumPy, Matplotlib, and root-numpy (hist2array).

I extended the constructors of these functions to show what I want. If you have a better way or idea, please let me know.

mu22le commented 4 years ago

What is the current status of this merge request? I have a large quantity of binned data to process and unrolling it with np.repeat() is not really efficient.
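To illustrate the cost described in the comment above, here is a minimal sketch with made-up counts and edges. It assumes the unrolling is done by repeating bin centres with np.repeat; the variable names are illustrative and not taken from probfit.

```python
import numpy as np

# Hypothetical already-binned data: 5 bins on [0, 5] with large counts.
edges = np.linspace(0.0, 5.0, 6)
counts = np.array([120_000, 340_000, 510_000, 330_000, 110_000])

# Workaround: unroll into one pseudo-event per count, placed at the bin centre.
centers = 0.5 * (edges[:-1] + edges[1:])
unrolled = np.repeat(centers, counts)

# Re-binning the pseudo-events with the same edges recovers the counts,
# so the unrolled array carries no information beyond the histogram itself.
rebinned, _ = np.histogram(unrolled, bins=edges)

# One array element per event versus one number per bin.
print(unrolled.size, counts.size)
```

The unrolled array grows with the number of events, while the histogram stays at one number per bin, which is why passing the binned data directly is attractive.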

marinang commented 4 years ago

@mu22le I think probfit will be marked as deprecated soon. See https://github.com/scikit-hep/scikit-hep.github.io/pull/49. A future alternative is zfit; unfortunately this is not yet implemented there, but I invite you to follow its development.

mbaak commented 3 years ago

I've been looking around for a Python package that can do binned fits and also histogram fits. I see that zfit has been working on a binned fit interface for quite some time, but the development pace is slow, unfortunately. (There's an unreviewed branch, but it doesn't work for me.) probfit works well for this, though.

Perhaps it would be good to accept this branch and release a new version? The modifications are useful. I see that 27 other people forked this branch, so there seems to be a general need for this. It would be nice, imho, to keep the package alive for this.

Thanks!

mzks commented 3 years ago

Thank you very much for your feedback. In fact, I'm not on the team; I'm just a developer who wants a nice fitting package. We seem to agree on this pull request, but this tool no longer seems to be updated. I really, really miss probfit because this package is my favorite.

If you will excuse my impolite comments, there is no adequate package for fitting in Python right now. GooFit is too large for easily fitting small data. SciPy's curve_fit is a general tool, but it is hard to control how the fit is done (cost function, minimizer, etc.). zfit is better, but it has neither binned fitting nor a nice user interface (I recently made a small interface for it as a trial: https://github.com/mzks/zfitter/blob/main/notebook/zfitter1.ipynb). iminuit is a minimizer, not a fitting tool. I have found some issues myself, so there may be many more issues on the statistics side. I guess the issue comes from our good memories of ROOT.

Considering this situation, probfit was the best, but scikit-hep marked it as deprecated before a major release of the new affiliated tools, so probfit has not been updated; for example, it can't load the latest iminuit.

This is actually not something that should be discussed here; I think it should be discussed within the HEP community. My point is that the HEP community doesn't invest enough human effort in Python software development, as it does for ROOT. In other words, there is no green light for the Pythonic way right now. When I complain about that, developers tell me, "Hey, are you interested in the development? If you want something, you can join us!" It is ridiculous that students should have to work on such a basic tool, or that every Python-using physicist makes their own fitting tool.

I'm really sorry for my selfish comments. I respect every developer working on these packages. I understand these comments may make great developers angry, but they might be helpful for people seeking fitting tools in Python. So, as a conclusion, I think we cannot expect a great probfit package in the future, @mbaak.

mbaak commented 3 years ago

I asked the zfit author and he wrote that the binned fits "are coming". (But my impression is it's been like that for some time now.)

In the meantime, I don't see the harm in making a minor update of the probfit package, so the histogram fit functionality is available to others as well.

Let's see if the author can consider this. (I'd be happy to do the work for this release as well.)

mu22le commented 3 years ago

I ended up using Minuit directly for my project. Someone recently suggested I take a look at Sherpa (https://cxc.cfa.harvard.edu/contrib/sherpa/); you may also be interested.

marinang commented 3 years ago

You know, there is a binned likelihood object in iminuit that might help you.

mbaak commented 3 years ago

Right, I've seen some of the (helpful) code in iminuit. Indeed it's not hard to write a binned fit that works, but ideally I prefer a well-developed version. (It easily turns complex if you want to do more advanced things.)
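As a rough illustration of the "not hard to write" point, the following is a minimal sketch of a binned Poisson likelihood fit of a Gaussian to histogram counts. It is not the probfit or iminuit implementation: the function names are hypothetical, the data are simulated, and a crude grid scan stands in for a real minimizer such as Minuit.

```python
import math
import numpy as np

def gauss_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_nll(counts, edges, mu, sigma):
    """Poisson negative log-likelihood of the counts (constant terms dropped)."""
    total = counts.sum()
    nll = 0.0
    for n, lo, hi in zip(counts, edges[:-1], edges[1:]):
        # Expected count in the bin = total * probability mass inside the bin.
        expected = total * (gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma))
        expected = max(expected, 1e-300)  # guard against log(0) in empty tails
        nll += expected - n * math.log(expected)
    return nll

# Simulated, already-binned data (true mu = 1, sigma = 2).
rng = np.random.default_rng(1)
counts, edges = np.histogram(rng.normal(1.0, 2.0, 10_000), bins=40, range=(-9, 11))

# Crude grid scan instead of a proper minimizer, to keep the sketch dependency-free.
grid_mu = np.linspace(0.0, 2.0, 41)
grid_sigma = np.linspace(1.0, 3.0, 41)
best = min((binned_nll(counts, edges, m, s), m, s)
           for m in grid_mu for s in grid_sigma)
_, mu_hat, sigma_hat = best
print(mu_hat, sigma_hat)
```

Anything beyond this (parameter errors, extended fits, weighted histograms, multiple components) is where the complexity mentioned above starts to show, which is the argument for a maintained package.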

I hope the author is open to making a (small) release update.

marinang commented 3 years ago

I was contributing to probfit years ago and then moved to zfit. The problem with probfit is the Cython implementation, which is quite tedious to maintain. Also, probfit does not do a good job in terms of fitting speed; it is very slow compared to zfit. Those might be some of the reasons why Scikit-HEP marked the project as deprecated, so don't expect any release ... but you never know.

marinang commented 3 years ago

> This is actually not something that should be discussed here; I think it should be discussed within the HEP community. My point is that the HEP community doesn't invest enough human effort in Python software development, as it does for ROOT. In other words, there is no green light for the Pythonic way right now. When I complain about that, developers tell me, "Hey, are you interested in the development? If you want something, you can join us!" It is ridiculous that students should have to work on such a basic tool, or that every Python-using physicist makes their own fitting tool.

I don't think that's ridiculous at all. I joined the Scikit-HEP effort because I decided to do the data analysis for my thesis in Python. That was my choice, and I knew that not all the tools a particle physicist needs existed in Python back then. So there was a need to create those libraries, and I jumped in, and that was great. I learned a lot of useful skills in programming and in statistics when developing for zfit and hepstats. But OK, this is not the point of this thread.

mzks commented 3 years ago

Thank you very much for your comments. I agree with you, and I respect the great work of @marinang. Thanks also for your explanation. However, some users (in particular, those who don't want to implement things themselves but want to use a widely used, well-established package) may have been ignored by the community. By the way, I've also been waiting for zfit's binned fit. I will continue to wait...

henryiii commented 3 years ago

A few thoughts: support for PlottableHistogram would be nice. The interface is a bit clunky; how about accepting a NumPy style tuple for data when binned_data is true? This is basically how mplhep works (in fact, it can detect a bins, edges tuple and doesn't need an extra binned_data parameter). Then eventually "data" could also be a PlottableHistogram as defined in UHI, say an Uproot or boost-histogram/hist histogram.

There's still a TODO.

henryiii commented 3 years ago

Take a look at that, I think this is simpler, doesn't add lots of new keywords that all have to be right, and follows other packages in using the output tuple from np.histogram. Check the TODO, too, please; there are no tests for this so I'll merge if I get some positive feedback.
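The tuple-based interface suggested above could look roughly like the sketch below. The helper name as_counts_and_edges is hypothetical and this is not the code from the branch; it only illustrates detecting an np.histogram-style (counts, edges) tuple instead of adding a separate binned_data keyword.

```python
import numpy as np

def as_counts_and_edges(data, bins=10):
    """Hypothetical helper: accept raw samples or an np.histogram-style tuple."""
    if isinstance(data, tuple) and len(data) == 2:
        # Already binned: validate the (counts, edges) shapes and pass through.
        counts, edges = (np.asarray(part) for part in data)
        if counts.ndim != 1 or edges.shape != (counts.size + 1,):
            raise ValueError("expected (counts, edges) with len(edges) == len(counts) + 1")
        return counts, edges
    # Raw samples: histogram them here.
    return np.histogram(np.asarray(data), bins=bins)

raw = np.array([0.1, 0.4, 1.2, 1.3, 1.9, 2.5])
prebinned = np.histogram(raw, bins=3, range=(0.0, 3.0))

# Both entry points end up with the same binned representation.
counts_a, edges_a = as_counts_and_edges(prebinned)
counts_b, edges_b = as_counts_and_edges(raw, bins=edges_a)
```

Because the check keys on the tuple itself, callers pass exactly what np.histogram returns and there are no extra keywords to keep consistent.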

mbaak commented 3 years ago

I'll give it a go.

henryiii commented 3 years ago

Any news? Does it work as is?

mbaak commented 3 years ago

@henryiii It works once you replace h *= weights by h = h * weight in lines 470 and 801. Without this, NumPy complains that the data types are not the same in the multiplication. Will you patch it directly? Thanks!
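The data type complaint described above is typical of NumPy in-place arithmetic. The sketch below assumes the problem was an in-place multiplication of integer bin counts by floating-point weights; the arrays are made up and are not taken from the probfit source.

```python
import numpy as np

h = np.array([3, 5, 2])              # integer bin counts
weights = np.array([0.5, 1.0, 1.5])  # floating-point weights

try:
    # In-place: the float result would have to be cast back into the
    # integer array, which NumPy refuses under its default casting rule.
    h *= weights
    inplace_failed = False
except TypeError:
    inplace_failed = True

# Out-of-place: NumPy allocates a new float array for the result.
h = h * weights
print(inplace_failed, h.dtype)
```

Rebinding the name to a new array avoids the cast, at the cost of one extra allocation.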

henryiii commented 3 years ago

Ahh, missed this. Sorry! Will patch it. I saw the email, but thought it was for boost-histogram or hist and then couldn't find the issue or the email again.

mbaak commented 3 years ago

Thanks.