Open jonas-eschle opened 4 years ago
I'll extend here with an example:
```python
class UnbinnedPDF(...):
    ...


class BinnedPDF(BaseBinnedPDF):
    _UnbinnedPDFClass = UnbinnedPDF  # can be specified, otherwise auto conversion

    def _unnormalized_binned_prob(self, x):
        ...  # implementation

    _yield = False  # e.g. needs to be set explicitly, since Binned/Template PDFs are often extended

    def _convert_to_unbinned(self, *args, **kwargs):
        ...  # allows customizing the conversion


binned_pdf = BinnedPDF(...)
unbinned_pdf = binned_pdf.create_unbinned(...)  # plus customization args
```
This implies extending the PDFs with conversion functionality. But that is needed anyway, either in this way or implemented as internal logic (in the two-in-one case).
It may need slightly more code and bookkeeping, in exchange for stronger decoupling.
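To make the conversion idea a bit more concrete, here is a minimal pure-Python sketch. It is not zfit code: `HistogramPDF`, `PiecewiseConstantPDF` and their methods are hypothetical names, and the default conversion assumed here simply treats the density as constant inside each bin.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PiecewiseConstantPDF:
    """Hypothetical unbinned PDF: constant density inside each bin."""
    edges: list
    densities: list

    def pdf(self, x):
        # Zero outside the binned range, otherwise the density of the bin containing x.
        if not self.edges[0] <= x < self.edges[-1]:
            return 0.0
        return self.densities[bisect_right(self.edges, x) - 1]


@dataclass
class HistogramPDF:
    """Hypothetical binned (template) PDF defined by bin edges and counts."""
    edges: list
    counts: list

    def binned_prob(self):
        # Normalized probability per bin.
        total = sum(self.counts)
        return [c / total for c in self.counts]

    def create_unbinned(self):
        # Assumed default conversion: spread each bin's probability uniformly over its width.
        widths = [hi - lo for lo, hi in zip(self.edges, self.edges[1:])]
        densities = [p / w for p, w in zip(self.binned_prob(), widths)]
        return PiecewiseConstantPDF(self.edges, densities)


binned_pdf = HistogramPDF(edges=[0.0, 1.0, 3.0], counts=[30.0, 10.0])
unbinned_pdf = binned_pdf.create_unbinned()
```

A customizable conversion would replace the body of `create_unbinned`, e.g. with an interpolation between bin centers instead of a step function.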
We already had a discussion in #38; this is a somewhat fresh start with a different approach.
Brought to the point
I think it boils down to the following: Either we have a model offering binned and unbinned methods ("combined model") or we have two kinds of models, a binned and an unbinned ("two models").
I think we should aim for two goals in zfit regarding binned and unbinned fits:
Concerns combined model:
methods such as `binned_prob`, `binned_sampling`, `binned_integration` etc. are in fact decoupled from the continuous case -> increases complexity
Concerns two models:
I tend towards two models:
Extended thoughts and discussion (for documentation, no need to read)
What is clear
We need two kinds of data: `Data` (unbinned) and `BinnedData`, `Hist`, or something like that. Given a conversion function, they can be converted into each other.
What we need
The discussion is therefore about the pdf and methods mostly.
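For reference, the data conversion mentioned above is the easy part. A minimal sketch in plain Python, where `to_binned` and `to_unbinned` are hypothetical helper names and the binned-to-unbinned direction assumes a uniform spread inside each bin (so it is lossy):

```python
import random
from bisect import bisect_right


def to_binned(values, edges):
    """Unbinned -> binned: count the values falling into each bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def to_unbinned(counts, edges, rng):
    """Binned -> unbinned: draw points uniformly inside each bin (assumed choice)."""
    values = []
    for n, lo, hi in zip(counts, edges, edges[1:]):
        values.extend(rng.uniform(lo, hi) for _ in range(n))
    return values


edges = [0.0, 1.0, 2.0, 4.0]
data = [0.1, 0.5, 1.2, 3.9, 2.0]
hist = to_binned(data, edges)  # counts per bin
resampled = to_unbinned(hist, edges, random.Random(0))
```

The original positions inside a bin cannot be recovered, which is one reason why the pdf side needs more care than the data side.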
I think we should aim for two goals in zfit regarding binned and unbinned fits:
Mostly the latter, assuming we want to be able to do template pdfs, changes the current ideas from #38 quite a bit.
I can see three possibilities, starting with what I think is the least feasible:
Loss responsible
tl;dr: does not work reasonably with template pdfs (section can be skipped)
All the responsibility could be moved to the loss while we otherwise keep things as they are, which is basically what was discussed in #38. While this works for simple cases, it fails for others, namely:
`_unnormalized_pdf` method and then be binned again by the loss. We would stay far below the achievable efficiency.
One PDF, two methods
tl;dr: seems like having two mostly independent pdfs in one. Complexity too high?; undefined behavior (sampling binned/unbinned?); hard to extend specifically (e.g. integration for continuous, morphing for templates etc.).
We can extend the functionality of the current PDF with a method `binned_prob` or something similar. This takes only a binned Data.
While this works nicely for simpler cases, it has a few shortcomings, namely that it's either ambiguous what a method returns or we need two versions:
- What does `sample` return? A binned or an unbinned object? If the pdf was defined binned, we cannot just return unbinned data.
- What does `integrate` do? Integrate the binned or the unbinned pdf?

My suspicion is that we end up with a pdf that has `binned_prob`, `binned_integral`, `binned_partial_integral`, `binned_sampling` etc., basically "doubling" the methods (setting flags may be infeasible since binned methods may take other arguments such as the bin size).
PDF and BinnedPDF
Having two distinct classes and a converter (`create_binned`, `create_pdf` or similar) solves the above: each class has distinct methods, e.g. sampling from a `BinnedPDF` returns a `BinnedData`, sampling from a `PDF` returns a `Data`, and the same holds for integration etc.
In short, it's the maximum decoupling. If our goals are as stated in the beginning and we want to be maximally efficient with unbinnedPDF + unbinnedData and binnedPDF + binnedData, this architecture seems the most appropriate.
Comparison and conclusion
The second and the third approach are actually similar: in the second, "two pdfs" are contained in one (the logic of two), while in the third they are two distinct pdfs that depend on each other (e.g. if a BinnedPDF is created from a PDF, it keeps a dependence).
Advantage of the two-in-one:
Disadvantage of the two-in-one:
Advantage of the two-pdfs:
Disadvantage:
Conclusion: I am more afraid of an overly complex two-in-one pdf and an inefficient implementation of e.g. template pdfs than of code duplication or the dependencies on other pdfs.
(N.B.: this does not exclude having a BinnedLoss that uses continuous pdfs and Data; it could e.g. convert them implicitly to a BinnedPDF.)
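Such an implicitly converting loss could look roughly like the sketch below. The function `binned_nll`, its signature, and the choice of a multinomial likelihood are assumptions made for illustration; the continuous model is represented only by its CDF.

```python
import math
from bisect import bisect_right


def binned_nll(cdf, values, edges):
    """Hypothetical binned loss taking a continuous model (via its CDF) and unbinned values."""
    # Implicit conversion of the data: histogram the values.
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    # Implicit conversion of the pdf: integrate it over each bin.
    cum = [cdf(e) for e in edges]
    total = cum[-1] - cum[0]
    probs = [(hi - lo) / total for lo, hi in zip(cum, cum[1:])]
    # Multinomial negative log-likelihood, constant terms dropped.
    return -sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def uniform_cdf(x):
    """CDF of a uniform distribution on [0, 4]."""
    return min(max(x / 4.0, 0.0), 1.0)


nll = binned_nll(uniform_cdf, [0.5, 1.5, 2.5, 3.5], edges=[0.0, 2.0, 4.0])
```

The binning happens inside the loss on every call, so this is a convenience for simple cases and not a replacement for a dedicated binned pdf.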
Your thoughts @apuignav, @rsilvaco?