scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Documentation for staterror modifier #1018

Open alexander-held opened 4 years ago

alexander-held commented 4 years ago

Question

The staterror documentation https://scikit-hep.org/pyhf/likelihood.html#mc-statistical-uncertainty-staterror shows a modifier example:

{ "name": "mod_name", "type": "staterror", "data": [0.1] }

but does not clarify what exactly should be specified under data. In particular, a user may wonder whether to give the relative or the absolute statistical uncertainty (the absolute uncertainty is needed).
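To make the convention concrete, here is a minimal sketch of a sample definition written as a Python dictionary. The sample name, yields, and uncertainties are hypothetical values chosen for illustration; only the modifier layout follows the example above. The entries under `data` in the `staterror` modifier are absolute per-bin uncertainties, in the same units as the yields. The relative values are computed only to show the contrast.

```python
import json

# Hypothetical two-bin background sample (all numbers are illustrative).
nominal_yields = [50.0, 60.0]
absolute_stat_unc = [5.0, 12.0]  # absolute MC statistical uncertainty per bin

sample = {
    "name": "background",  # hypothetical sample name
    "data": nominal_yields,
    "modifiers": [
        {
            "name": "mod_name",
            "type": "staterror",
            # One absolute uncertainty per bin, not a fraction of the yield.
            "data": absolute_stat_unc,
        }
    ],
}

# Relative uncertainties, shown only to contrast with what goes in "data".
relative_stat_unc = [u / n for u, n in zip(absolute_stat_unc, nominal_yields)]

print(json.dumps(sample, indent=2))
print(relative_stat_unc)
```

Writing the relative values into `data` by mistake would understate the uncertainty by a factor of the bin yield, which is why the distinction is worth stating in the documentation.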

Users may also wonder why a Gaussian constraint term is used, and/or how to switch to a Poisson term, which is available in ROOT HistFactory (see https://github.com/scikit-hep/pyhf/issues/760).

In the formula provided in the text, is the summation not in quadrature?

small typo in the text: "constrained term" -> "constraint term"

came up in conversation with @kratsg

Relevant Issues and Pull Requests

https://github.com/scikit-hep/pyhf/issues/760

matthewfeickert commented 4 years ago

@paulgessinger, given your comments on the staterror Stack Overflow question, any thoughts or feedback on making things clearer would be helpful here as well.

paulgessinger commented 4 years ago

I guess mentioning that it's the absolute error would be a good idea (although that's what I personally assumed to be the case).

Apart from that, if my understanding that the alternative to staterror is a shapesys is correct, maybe it would be helpful to make the general reference "adding uncertainties for each sample would yield a very large number of nuisance parameters" in the doc right now more concrete.

kratsg commented 4 years ago

> Apart from that, if my understanding that the alternative to staterror is a shapesys is correct, maybe it would be helpful to make the general reference "adding uncertainties for each sample would yield a very large number of nuisance parameters" in the doc right now more concrete.

It's a matter of how you model the uncertainty. For example, a shapesys is useful to apply an uncertainty on a particular sample in a channel... whereas, if all your MC samples are generated with some correlated (generator-based, statistical) uncertainty -- then a staterror is more appropriate.
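As a rough illustration of the parameter-count argument, the sketch below compares the two modelling choices for a single channel. It assumes that a per-sample modifier contributes one nuisance parameter per bin for each sample it is attached to, and that a staterror shared by all samples contributes one per bin for the whole channel. The function name and the numbers are hypothetical.

```python
def count_nuisance_parameters(n_samples: int, n_bins: int) -> dict:
    """Count per-bin nuisance parameters in one channel under two assumptions."""
    return {
        # One parameter per bin for every sample (per-sample modelling).
        "per_sample": n_samples * n_bins,
        # One parameter per bin, shared by all samples (shared staterror).
        "shared": n_bins,
    }


# Hypothetical channel with 8 MC samples and 20 bins.
counts = count_nuisance_parameters(n_samples=8, n_bins=20)
print(counts)
```

Under these assumptions the per-sample choice grows with the number of samples, while the shared choice depends only on the binning.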

sambklein commented 5 months ago

Related to this, there is also an issue with the Modifiers and Constraints table: given the form of the Gaussian and the input listed there, the data in the modifier definition would have to be the relative variance rather than the absolute standard deviation.

I think the constraint term should be $\prod_b \mathrm{Gaus}\left(a_{\gamma_b} = 1 \,\middle|\, \gamma_b, \delta_b / \nu_b\right)$ and the input should be $\delta_b = \sqrt{\sum_s \delta^2_{sb}}$.
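A small worked example of the proposed input, as a sketch in plain Python rather than pyhf code. It assumes that $\nu_b$ denotes the total nominal yield in bin $b$ summed over the samples carrying the modifier (the comment does not define it), and the sample yields and uncertainties are invented for illustration. The absolute per-sample uncertainties $\delta_{sb}$ are summed in quadrature to give $\delta_b$, and the width of the Gaussian is then $\delta_b / \nu_b$.

```python
import math


def combined_staterror(nominal, absolute_unc):
    """Combine per-sample absolute uncertainties bin by bin.

    nominal[s][b] and absolute_unc[s][b] are indexed by sample s and bin b.
    Returns (delta, nu, relative) as per-bin lists.
    """
    n_bins = len(nominal[0])
    # delta_b = sqrt(sum_s delta_sb^2): quadrature sum over samples.
    delta = [
        math.sqrt(sum(unc[b] ** 2 for unc in absolute_unc)) for b in range(n_bins)
    ]
    # nu_b: assumed here to be the total nominal yield in bin b.
    nu = [sum(sample[b] for sample in nominal) for b in range(n_bins)]
    # Width of the Gaussian constraint in the proposed form: delta_b / nu_b.
    relative = [d / n for d, n in zip(delta, nu)]
    return delta, nu, relative


# Two hypothetical samples with two bins each.
nominal = [[30.0, 40.0], [20.0, 60.0]]
absolute_unc = [[3.0, 6.0], [4.0, 8.0]]

delta, nu, relative = combined_staterror(nominal, absolute_unc)
print(delta)     # per-bin quadrature sums
print(nu)        # per-bin total nominal yields
print(relative)  # per-bin relative widths
```

With these numbers the quadrature sums are 5 and 10 on total yields of 50 and 100, so both bins end up with a relative width of 0.1.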