Hi @tseiss, can you try adding a small background uncertainty? It could be that the profiling gets confused if there are really no NPs (nuisance parameters).
Hi Lukas,
Thank you for the prompt feedback!
I tried to add a shapesys modifier to the background, but had issues initializing the model. This could be my unfamiliarity with pyhf.
I tried changing the background to
    'name': 'background',
    'data': bkgrd,
    'modifiers': [
        {'name': 'uncorr_bkguncrt', 'type': 'shapesys', 'data': bkgrd_uncertainty}
    ],

where

    bkgrd_uncertainty = [1e-7 for _ in bkgrd]
When I tried initializing the model, I got
Traceback (most recent call last):
File "LimitTest.py", line 59, in <module>
mod = pyhf.Model(spec)
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/pdf.py", line 425, in __init__
batch_size=self.batch_size,
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/pdf.py", line 292, in __init__
for k, c in modifiers.combined.items()
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/pdf.py", line 292, in <dictcomp>
for k, c in modifiers.combined.items()
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/modifiers/shapesys.py", line 58, in __init__
self.finalize(pdfconfig)
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/modifiers/shapesys.py", line 114, in finalize
assert len(factors) == pdfconfig.param_set(pname).n_parameters
AssertionError
Since that didn't work, I tried using a shapefactor modifier so I didn't have to pass any extra data.
{ "name": "mod_name", "type": "shapefactor", "data": None }
but running with this, I got the same convergence issue reported above.
Thanks, Todd
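(For anyone hitting the same AssertionError: it appears to be triggered by bins whose background content is exactly zero, which the working example below sidesteps by offsetting every bin. A minimal sketch of that workaround, under that assumption:)

```python
import numpy as np

# Assumption: shapesys trips over bins with exactly zero content, so offset
# every bin by a tiny amount and give it a nonzero per-bin uncertainty.
bkgrd_nonzero = (np.asarray(bkgrd) + 1e-7).tolist()
bkgrd_uncertainty = (0.01 * np.asarray(bkgrd_nonzero)).tolist()
```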
Hi Todd,
three things
import pyhf
from pyhf import Model, infer
import numpy as np
signal=[0.00000000e+00,2.16147594e-04,4.26391320e-04,8.53157029e-04,
7.95947245e-04,1.85458682e-03,3.15515589e-03,4.22895664e-03,
4.65887617e-03,7.35380863e-03,8.71947686e-03,7.94697901e-03,
1.02721341e-02,9.24346489e-03,9.38926633e-03,9.68742497e-03,
8.11072856e-03,7.71003446e-03,6.80873211e-03,5.43234586e-03,
4.98376829e-03,4.72218222e-03,3.40645378e-03,3.44950579e-03,
2.61473009e-03,2.18345641e-03,2.00960464e-03,1.33786215e-03,
1.18440675e-03,8.36366201e-04,5.99855228e-04,4.27406780e-04,
2.71607026e-04,1.81370902e-04,1.03710513e-04,4.42737056e-05,
2.25835175e-05,1.04470885e-05,4.08162922e-06,3.20004812e-06,
3.37990384e-07,6.72843977e-07,0.00000000e+00,9.08675772e-08,
0.00000000e+00]
bkgrd=[1.47142981e+03,9.07095061e+02,9.11188195e+02,7.06123452e+02,
6.08054685e+02,5.23577562e+02,4.41672633e+02,4.00423307e+02,
3.59576067e+02,3.26368076e+02,2.88077216e+02,2.48887339e+02,
2.20355981e+02,1.91623853e+02,1.57733823e+02,1.32733279e+02,
1.12789438e+02,9.53141118e+01,8.15735557e+01,6.89604141e+01,
5.64245978e+01,4.49094779e+01,3.95547919e+01,3.13005748e+01,
2.55212288e+01,1.93057913e+01,1.48268648e+01,1.13639821e+01,
8.64408136e+00,5.81608649e+00,3.98839138e+00,2.61636610e+00,
1.55906281e+00,1.08550560e+00,5.57450828e-01,2.25258250e-01,
2.05230728e-01,1.28735312e-01,6.13798028e-02,2.00805073e-02,
5.91436617e-02,0.00000000e+00,0.00000000e+00,0.00000000e+00,
0.00000000e+00]
spec = {
    'channels': [
        {
            'name': 'singlechannel',
            'samples': [
                {
                    'name': 'signal',
                    'data': (np.asarray(signal) * 500).tolist(),
                    'modifiers': [
                        {'name': 'mu', 'type': 'normfactor', 'data': None}
                    ],
                },
                {
                    'name': 'background',
                    'data': (np.asarray(bkgrd) + 1e-7).tolist(),
                    'modifiers': [
                        {
                            'name': 'uncrt',
                            'type': 'shapesys',
                            'data': (0.01 * (np.asarray(bkgrd) + 1e-7)).tolist(),
                        }
                    ],
                },
            ],
        }
    ]
}
mod = pyhf.Model(spec)
hypo_tests = pyhf.infer.hypotest(
    1.0,
    mod.expected_data(mod.config.suggested_init()),
    mod,
    return_expected_set=True,
    return_test_statistics=True,
    qtilde=True,
)
print(hypo_tests)
Hi Lukas,
Great! That works! Thank you!
However, I do need to be able to set limits with an expected limit on signal strength much greater than 10. There seems to be a hard cutoff at mu=10 in the hypotest.
pyhf.infer.hypotest(11,...) fails no matter how I scale the signal.
ERROR:pyhf.optimize.opt_scipy: fun: 505.70924080624684
jac: array([ 2.04551697e-01, 0.00000000e+00, 2.13623047e-04, -2.36511230e-04,
-7.62939453e-06, -7.62939453e-06, 1.29699707e-04, 7.62939453e-06,
-1.52587891e-05, -6.10351562e-05, 2.28881836e-05, 6.10351562e-05,
-3.05175781e-05, -1.52587891e-05, 0.00000000e+00, 1.52587891e-05,
3.05175781e-05, -7.62939453e-06, 3.05175781e-05, 3.81469727e-05,
-7.62939453e-06, 6.10351562e-05, -3.81469727e-05, -1.52587891e-05,
-3.81469727e-05, 1.14440918e-05, -2.28881836e-05, -3.05175781e-05,
2.67028809e-05, -7.62939453e-06, 0.00000000e+00, -3.81469727e-06,
-3.05175781e-05, 0.00000000e+00, -3.43322754e-05, 2.28881836e-05,
3.81469727e-05, 9.15527344e-05, -7.62939453e-06, -3.05175781e-05,
-2.28881836e-05, 4.57763672e-05, 4.57763672e-05, -7.62939453e-06,
-1.52587891e-05, -1.52587891e-05])
message: 'More than 3*n iterations in LSQ subproblem'
nfev: 2351
nit: 46
njev: 46
status: 3
success: False
x: array([10. , 1. , 0.99990347, 0.99981023, 0.9995238 ,
0.99949421, 0.99866214, 0.99738054, 0.99620117, 0.99544463,
0.99225924, 0.98992898, 0.98979699, 0.9856766 , 0.98585522,
0.98379665, 0.98156352, 0.98313751, 0.98260288, 0.98351114,
0.98586593, 0.98606199, 0.98585622, 0.98934788, 0.98865894,
0.99099685, 0.99213976, 0.99256636, 0.99486679, 0.9953864 ,
0.99667016, 0.99757958, 0.99826403, 0.99889171, 0.99925408,
0.99957531, 0.99981904, 0.99990426, 0.99995477, 0.9999821 ,
0.99998647, 0.99999869, 1.00000026, 0.99999993, 0.99999993,
0.99999994])
Traceback (most recent call last):
File "LimitTest2.py", line 82, in <module>
return_expected_set=True)
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/infer/__init__.py", line 84, in hypotest
qmu_v = qmu(poi_test, data, pdf, init_pars, par_bounds)
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/infer/test_statistics.py", line 34, in qmu
mu, data, pdf, init_pars, par_bounds, return_fitted_val=True
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/infer/mle.py", line 62, in fixed_poi_fit
**kwargs,
File "/home/tseiss/.local/lib/python3.6/site-packages/pyhf/optimize/opt_scipy.py", line 47, in minimize
assert result.success
AssertionError
How can I get around this?
Thank you! Todd
Hi @tseiss
What's the use case? You should always be able to rescale the signal to appropriate units, no?
I need to do a mu scan to find the expected 95% confidence limit on the signal strength. You're saying that I could, for example, just scan the normalization of the signal in the model and keep the POI in the hypotest fixed to 1.0? I think it is equivalent.
It's just that in the original script I was scanning the POI in the hypotest up to 80 (I do have a normalization problem upstream -- I expect the limit to be around mu=20, but that doesn't matter for this). From the examples I've seen, it seems that scanning the POI is how this is done. Is there a reason this was changed?
@tseiss my point is that with an appropriate choice of units it should always be possible to make sure that the reasonable interval for µ is [0, 10]. If you expect to need to scan up to µ=100, you could consider just scaling up your signal by, say, 20x; then you only need to scan up to µ=5. To convert the upper limit back to your original units, just multiply by 20 again.
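(A rough sketch of this rescaling approach, for anyone following along. It assumes `spec` is built exactly as in the working example earlier, except with the signal template multiplied by `scale`; the factor of 20 and the scan grid are illustrative choices, not values from this thread.)

```python
import numpy as np
import pyhf

# Illustrative scale factor: scaling the signal template up by 20x means the
# POI only needs to be scanned up to ~5 instead of ~100, which keeps it inside
# the default normfactor bounds of [0, 10].
scale = 20.0
model = pyhf.Model(spec)  # spec as in the example above, with signal * scale
data = model.expected_data(model.config.suggested_init())

# Scan the POI in the scaled units and report each point in the original units
# (multiply by `scale` to convert back).
for mu_scaled in np.linspace(0.1, 5.0, 25):
    cls_obs = pyhf.infer.hypotest(mu_scaled, data, model, qtilde=True)
    print(f"mu = {mu_scaled * scale:6.1f} (original units)  CLs_obs = {cls_obs}")

# The 95% CL upper limit on mu (in the original units) is where the observed
# CLs first drops below 0.05.
```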
The default bounds for normfactor (mu) are [0, 10]. They can be changed to widen the bounds; however, the math starts getting wonky when you consider incredibly large mu values.
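(Alternatively, a hedged sketch of widening the bounds rather than rescaling the signal. `suggested_bounds()`, `poi_index`, and the `par_bounds` argument are standard pyhf API; the upper bound of 100 and the test value of 20 are only illustrations, and as noted above the minimization can become unstable for very large mu.)

```python
# Widen the POI bounds so values above 10 can be tested directly.
bounds = model.config.suggested_bounds()
bounds[model.config.poi_index] = (0, 100)  # illustrative upper bound

result = pyhf.infer.hypotest(
    20.0,  # a POI value above the default upper bound of 10
    model.expected_data(model.config.suggested_init()),
    model,
    par_bounds=bounds,
    qtilde=True,
)
print(result)
```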
Ok, thanks Lukas and Giordon. This makes sense. I fixed my normalization issues and just confirmed that scaling the signal in the model works (no errors thrown) and gets me in the ballpark of the correct answer.
One last question: if I want a stat-only limit, do I need to manually set the shape error in each bin to sqrt(N)? Or do I just set a small systematic error in each bin?
Thanks guys!
Depends on what you mean by that. A statistical variance is already implied in the model, which assumes a Poisson distribution in each bin. If you would further like to include the statistical uncertainty in the background sample due to MC statistics, you would need to add "staterror" modifiers.
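(A minimal sketch of what that could look like, reusing the `bkgrd` list from above. The `staterror` modifier's `data` field holds the absolute per-bin uncertainties; using sqrt(N) here is an assumption that the MC is effectively unweighted, and `mc_stat` is an arbitrary name.)

```python
import numpy as np

bkgrd_arr = np.asarray(bkgrd) + 1e-7  # keep bins nonzero, as in the example above
background_sample = {
    'name': 'background',
    'data': bkgrd_arr.tolist(),
    'modifiers': [
        # 'data' are absolute per-bin statistical uncertainties on the MC prediction
        {'name': 'mc_stat', 'type': 'staterror', 'data': np.sqrt(bkgrd_arr).tolist()}
    ],
}
```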
Ok, that makes sense. Thank you!
Seems like we've solved most of the issues here. Closing; feel free to reopen or open new issues later.
For posterity, this question is now Fit convergence failure in pyhf for small signal model on Stack Overflow (which helps us with PR #759 — thanks @tseiss!)
Description
I am trying to do a relatively simple hypotest with pyhf 0.4.0, but I am consistently getting a convergence issue that I did not have before I updated pyhf (I had been using 0.1.2). Either there is a change in the syntax between versions that I don't know about, or there is a newly introduced bug.
Expected Behavior
It should converge.
Actual Behavior
I get the divide by zero and invalid value in multiply errors immediately. Then it processes for a while, then exits with the below message.
Steps to Reproduce
I set up LCG_96bpython3, then run the following script:
(Note that when I updated to 0.4.0 I changed pyhf.utils.hypotest to pyhf.infer.hypotest)
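(For reference, the rename mentioned in the note is just the module move; a minimal before/after sketch, with `data` and `model` as placeholders:)

```python
import pyhf

# pyhf 0.1.x style (no longer available in 0.4.0):
# CLs_obs = pyhf.utils.hypotest(1.0, data, model)

# pyhf 0.4.0 style:
CLs_obs = pyhf.infer.hypotest(1.0, data, model)
```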
Checklist

- Run `git fetch` to get the most up to date version of `master`