scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
283 stars 83 forks source link

`AsymptoticCalculator.teststatistic` issues warning and returns `nan` when `qmuA == 0` #1992

Closed masonproffitt closed 2 years ago

masonproffitt commented 2 years ago

Summary

When poi_test == 0, pyhf.infer.calculators.AsymptoticCalculator.teststatistic often issues a warning and returns nan because of a division by zero on this line:

https://github.com/scikit-hep/pyhf/blob/cbd68c829748fff11bf44adac2ad9e9b15068af2/src/pyhf/infer/calculators.py#L418

This would disappear by changing < to <= on this line:

https://github.com/scikit-hep/pyhf/blob/cbd68c829748fff11bf44adac2ad9e9b15068af2/src/pyhf/infer/calculators.py#L422

As far as I can tell, there's no reason not to make this change. These lines only get run for test_stat == 'qtilde', and the only time $\tilde{q}_{\mu, \textrm{A}} = 0$ should be when $\mu = 0$. By definition, $\tilde{q}_\mu$ is always 0 for this case, and _true_case would cover this:

https://github.com/scikit-hep/pyhf/blob/cbd68c829748fff11bf44adac2ad9e9b15068af2/src/pyhf/infer/calculators.py#L411-L413

It's actually interesting that this issue doesn't always occur for poi_test == 0. I think that's only because of small but non-zero errors in fitting to the generated Asimov data before calculating qmuA. This issue has been present for years--I think since version 0.1.1 or so.

OS / Environment

$ cat /etc/os-release
NAME=Gentoo
ID=gentoo
PRETTY_NAME="Gentoo Linux"
ANSI_COLOR="1;32"
HOME_URL="https://www.gentoo.org/"
SUPPORT_URL="https://www.gentoo.org/support/"
BUG_REPORT_URL="https://bugs.gentoo.org/"

Steps to Reproduce

import pyhf
model = pyhf.simplemodels.uncorrelated_background(signal=[1.0], bkg=[1.0], bkg_uncertainty=[1.0])
observations = [2]
mu_test = 0.0
data = observations + model.config.auxdata
init_pars = model.config.suggested_init()
par_bounds = model.config.suggested_bounds()
fixed_params = model.config.suggested_fixed()

print('test_statistics.qmu_tilde result:')
print(pyhf.infer.test_statistics.qmu_tilde(mu_test, data, model, init_pars, par_bounds, fixed_params))

print()

print('AsymptoticCalculator.teststatistic result:')
asymptotic_calculator = pyhf.infer.calculators.AsymptoticCalculator(data, model, test_stat="qtilde")
print(asymptotic_calculator.teststatistic(mu_test))

File Upload (optional)

No response

Expected Results

I would expect pyhf.infer.test_statistics.qmu_tilde and pyhf.infer.calculators.AsymptoticCalculator.teststatistic to both return 0, the correct value of $\tilde{q}_\mu$ (regardless of normalization factors).

Actual Results

test_statistics.qmu_tilde result:
0.0

AsymptoticCalculator.teststatistic result:
/home/user/iris-hep/src/pyhf/src/pyhf/infer/calculators.py:418: RuntimeWarning: invalid value encountered in double_scalars
  teststat = (qmu - qmu_A) / (2 * self.sqrtqmuA_v)
nan

pyhf Version

$ pyhf --version
pyhf, version 0.7.0rc4.dev6

Code of Conduct

masonproffitt commented 2 years ago

I guess this is actually probably an updated and more specific version of #529. So it looks like it can also occur for non-zero POI values if the raw signal yield is small enough. The < to <= switch should fix that case as well.

kratsg commented 2 years ago

Looking at the docs, it seems $\hat{\mu} > \mu$ is the only time we can get $\tilde{q}_{\mu} = 0$ (whether asimov or not).

Since this is $\tilde{q}_{\mu}$, we're typically dealing with $\hat{\mu} \geq 0$ but it's not clear to me what to do in the case that $\mu \equiv \hat{\mu}$ which I suspect is possibly why you run into this scenario?

masonproffitt commented 2 years ago

Looking at the docs, it seems μ^>μ is the only time we can get q~μ=0 (whether asimov or not).

The docs don't actually match equation (16) in the paper, where the first line is $\hat{\mu} \leq \mu$.

kratsg commented 2 years ago

Actually, the code matches the paper, but the docs don't match the code

https://github.com/scikit-hep/pyhf/blob/cbd68c829748fff11bf44adac2ad9e9b15068af2/src/pyhf/infer/test_statistics.py#L30-L32