scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Summing bins per channel #1198

Open alexander-held opened 3 years ago

alexander-held commented 3 years ago

Question

It is quite common to publish tables with yields per channel, instead of yields per bin in the analysis. One way to calculate the uncertainty per sample (and for the sum of all samples) for a channel is to sum bins in that channel, and then to do the same calculation that is done to get the per-bin uncertainty for the resulting one-bin channel. I am not sure whether there is another way that correctly takes bin-by-bin correlations into account.

Is this something that can be done in pyhf, or something that is in scope?

This could also be used more generally for dynamic re-binning, e.g. starting from histograms with 100 bins and then optimizing the binning without having to rebuild the histograms every time.
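To make the re-binning idea concrete, here is a minimal sketch in plain Python. The helper name `merge_bins` and its `groups` argument are hypothetical and not part of pyhf; they only illustrate summing adjacent fine bins into coarser ones.

```python
def merge_bins(yields, groups):
    """Sum adjacent bins of a histogram into coarser bins.

    yields: per-bin yields of the fine histogram.
    groups: number of fine bins going into each merged bin,
            e.g. [2, 3] merges the first two bins and the next three.
    (Hypothetical helper for illustration, not a pyhf function.)
    """
    if sum(groups) != len(yields):
        raise ValueError("groups must cover every fine bin exactly once")
    merged = []
    start = 0
    for n in groups:
        merged.append(sum(yields[start:start + n]))
        start += n
    return merged


fine = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
coarse = merge_bins(fine, [2, 2, 2])   # three bins from six
single = merge_bins(fine, [len(fine)])  # one-bin channel
```

The same merge would have to be applied to the nominal template and to every systematic variation template, so that the uncertainties are derived from the merged histograms rather than combined bin by bin afterwards.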

Relevant Issues and Pull Requests

https://github.com/scikit-hep/pyhf/discussions/1187

alexander-held commented 3 years ago

The first part of this question is no longer particularly relevant. Uncertainties for yields per channel can be calculated by using model.expected_data to get the model predictions for all variations, and then summing those predictions per channel.
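A rough sketch of that approach in plain Python, under stated assumptions: the flat yield vectors stand in for what model.expected_data(pars, include_auxdata=False) would return with one parameter shifted up or down by its uncertainty, and the channel bin counts stand in for model.config.channel_nbins. The helper names are hypothetical, and the parameters are assumed to be uncorrelated, so any fit-derived parameter correlations are ignored here.

```python
import math


def channel_yields(expected, channel_nbins):
    """Sum a flat vector of expected bin yields per channel.

    channel_nbins maps channel name -> number of bins, in the same
    order in which the channels appear in the flat vector.
    """
    totals = {}
    start = 0
    for name, nbins in channel_nbins.items():
        totals[name] = sum(expected[start:start + nbins])
        start += nbins
    return totals


def channel_uncertainties(up, down, channel_nbins):
    """Per-channel yield uncertainty from up/down variations.

    up, down: one flat yield vector per parameter, evaluated with that
    parameter shifted by +1 / -1 sigma. Bins are summed *before* the
    variation is symmetrized, which keeps bin-by-bin correlations.
    Assumption: parameters are uncorrelated, so variances simply add.
    """
    variances = {name: 0.0 for name in channel_nbins}
    for up_vec, down_vec in zip(up, down):
        up_tot = channel_yields(up_vec, channel_nbins)
        down_tot = channel_yields(down_vec, channel_nbins)
        for name in channel_nbins:
            variances[name] += ((up_tot[name] - down_tot[name]) / 2.0) ** 2
    return {name: math.sqrt(v) for name, v in variances.items()}


# Toy numbers: a two-bin channel "SR" followed by a one-bin channel "CR"
channel_nbins = {"SR": 2, "CR": 1}
nominal = [10.0, 20.0, 50.0]
up = [[11.0, 22.0, 50.0], [10.0, 24.0, 53.0]]    # two parameters
down = [[9.0, 18.0, 50.0], [10.0, 16.0, 47.0]]

yields = channel_yields(nominal, channel_nbins)
uncertainties = channel_uncertainties(up, down, channel_nbins)
```

With correlated parameters, the per-parameter shifts would instead need to be combined using the parameter correlation matrix rather than added in quadrature.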