Open ikrommyd opened 2 months ago
That's an interesting issue, because hepstats currently creates a binned Asimov dataset even for unbinned data, which is not optimal and should not happen. But it could be a very high-stats dataset; that could be a possibility.
The question is a bit conceptual: what is needed?
Creating a binned dataset is probably already possible with to_binned and to_binneddata (I think, right?)
What are the weighted unbinned events, and why weighted? I'm not sure where the weights are coming from.
And it reminds me of another discussion about the "best binning", as we're doing a lot of unbinned fits in LHCb that could, in principle, be binned. So implementing something like this https://arxiv.org/abs/2210.02848 could be useful.
I guess these things are already possible today, although hepstats, or zfit itself, should have an automatic binning. Beyond that, it isn't as easily accessible to the user as it maybe should be. It is accessible, but the question is more how to communicate this to the user.
I was just looking at the zfit code (not hepstats). Cool, so maybe a shortcut for to_binned -> to_binneddata would be nice and easy to have for unbinned models. For the weighted unbinned events, look at the "Pseudo-Asimov" dataset in the combine docs I linked above.
When it comes to visibility: if I search the code for the word "Asimov", I find to_binneddata. However, if I didn't search like that, I would expect something like model.create_asimov. Or even something as part of the sampler: since we do sampler = model.create_sampler() to generate toys, we could have a sampler method other than resample, called make_asimov or something like that.
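To make the sampler idea a bit more concrete, here is a hypothetical sketch in plain Python. It is not zfit code: ToySampler, its constructor arguments, and make_asimov are invented for illustration, and only the method name resample comes from the discussion. The Asimov-like dataset here is built from evenly spaced quantiles with equal weights, which is just one possible construction.

```python
import random
from statistics import NormalDist


class ToySampler:
    """Hypothetical stand-in for a sampler over a 1D Gaussian model (not the zfit class)."""

    def __init__(self, mu, sigma, n_events, seed=None):
        self.dist = NormalDist(mu, sigma)
        self.n_events = n_events
        self._rng = random.Random(seed)
        self.values = []
        self.weights = []

    def resample(self):
        """Draw a fresh random toy: n_events points with unit weight."""
        self.values = [
            self._rng.gauss(self.dist.mean, self.dist.stdev)
            for _ in range(self.n_events)
        ]
        self.weights = [1.0] * self.n_events

    def make_asimov(self, n_points=1000):
        """Hypothetical method: fill the sampler with a fluctuation-free dataset.

        Points sit at evenly spaced quantiles of the model, so they follow its
        shape deterministically; the equal weights sum to n_events.
        """
        self.values = [
            self.dist.inv_cdf((i + 0.5) / n_points) for i in range(n_points)
        ]
        self.weights = [self.n_events / n_points] * n_points


sampler = ToySampler(mu=0.0, sigma=1.0, n_events=500, seed=1)
sampler.resample()     # a fluctuating toy dataset
toy_size = len(sampler.values)
sampler.make_asimov()  # a deterministic, Asimov-like dataset
```

The point of the sketch is only the API shape: the same object that produces toys via resample could also produce the Asimov dataset, so a user looking at the sampler would find both.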
I would expect something like model.create_asimov
I think this is a crucial difference between having a nicely named API and having good enough docs: the problem with adding this is that expectations may differ. Should it be binned or unbinned? But what is more crucial, I think, is to have a place where this is explained.
How did you come across "Asimov"? I'm asking just to collect a bit of data.
And agreed, to_binneddata as a shortcut wouldn't harm!
There is also a way to create Asimov datasets from unbinned models, either by: 1) making a binning on the spot, or 2) generating it from many weighted unbinned events. Check the combine docs for more info: https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part3/runningthetool/#toy-data-generation
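As a rough illustration of the two options, here is a minimal sketch in plain Python for a one-dimensional Gaussian model truncated to a fit range. It does not use the zfit, hepstats, or combine APIs; the function names, the Gaussian model, and the midpoint grid for the weighted points are all assumptions made for the example.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian density (unnormalised to the fit range)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mu, sigma):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binned_asimov(n_expected, mu, sigma, lower, upper, n_bins):
    """Option 1: bin on the spot; each bin holds its expected (non-integer) count."""
    width = (upper - lower) / n_bins
    edges = [lower + i * width for i in range(n_bins + 1)]
    # Normalise to the fit range so the counts sum to n_expected.
    norm = gauss_cdf(upper, mu, sigma) - gauss_cdf(lower, mu, sigma)
    counts = [
        n_expected * (gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma)) / norm
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    return edges, counts


def weighted_unbinned_asimov(n_expected, mu, sigma, lower, upper, n_points):
    """Option 2: many unbinned points, each weighted by the model density."""
    step = (upper - lower) / n_points
    xs = [lower + (i + 0.5) * step for i in range(n_points)]  # midpoint grid
    raw = [gauss_pdf(x, mu, sigma) for x in xs]
    total = sum(raw)
    weights = [n_expected * w / total for w in raw]  # weights sum to n_expected
    return xs, weights


edges, counts = binned_asimov(1000.0, 0.0, 1.0, -5.0, 5.0, 50)
xs, weights = weighted_unbinned_asimov(1000.0, 0.0, 1.0, -5.0, 5.0, 10_000)
```

In both cases the dataset carries no statistical fluctuations: the bin contents, or the sum of weights, reproduce the expected yield exactly, and the only approximation is the granularity of the binning or of the point grid.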