Closed ekourlit closed 2 years ago
Hi @ekourlit, thanks for reporting this! My guess is that some of the bins are empty after changing the binning, and they will have zero staterror
. This causes the crash during model building that you observe because zero-rate Poissons were not supported by pyhf
in v0.6.3
. https://github.com/scikit-hep/pyhf/pull/1657 changed this, which would explain why you do not see the crash anymore with master
.
Here is an example workspace for this:
{
"channels": [
{
"name": "SR",
"samples": [
{
"data": [10.0, 0.0],
"modifiers": [
{
"data": [1.0, 0.0],
"name": "staterror_SR",
"type": "staterror"
},
{"data": null, "name": "mu", "type": "normfactor"}
],
"name": "Signal"
}
]
}
],
"measurements": [
{
"config": {"parameters": [], "poi": "mu"},
"name": "minimal_example"
}
],
"observations": [{"data": [15, 0], "name": "SR"}],
"version": "1.0.0"
}
which will fail at the assertion you point out:
assert len(sigmas[sigmas > 0]) == pdfconfig.param_set(mod).n_parameters
If this is indeed the explanation for what you observe, then the next release of pyhf
should fix the behavior. Note that mixing the master
of pyhf
with cabinetry
version 0.4.1
can cause problems due to the paramset.suggested_fixed_as_bool
API change, so in that case it may be best to use the fix/pyhf-070
branch of cabinetry
with the master
of pyhf
.
Two other things that can help:
staterror
and the nominal sample yield in the empty bins to some small number (e.g. 10e-6
), which could be done in cabinetry
(or manually as a post-processing step after building a workspace),staterror
terms, which would need to be supported by pyhf
(https://github.com/scikit-hep/pyhf/issues/662).A post-processing step may be a decent stop-gap solution until the next pyhf
release. The pruning approach will only work if there is at least one other sample with non-zero nominal yield, since otherwise the total model yield per bin will still be zero and minimization will fail (in pyhf
version 0.6.3
, also fixed in master).
Thanks @alexander-held for the quick reply! That was exactly the problem! I previously checked for zero yields in each sample/bin but the problem was actually the staterror
of the sum of the samples/bin.
As a temporary workaround I modified the workspace with a post-processing step as you mentioned. I looped over the bins (nominal yields and staterror
modifier) and substituted zeros with 1e-5
. Below is the utility function for reference. Although I would probably need to add another loop if I include systematic variations.
def zeroless_ws(workspace):
'''
Utility function to look all the bins of the workspace and substitute zeros (nominal and statterror) with 1e-5
this is a temporary fix unlti the next release of cabinetry+pyhf that could handle zero statterror
'''
# loop over regions
for region_idx in range( len(workspace['channels']) ):
region = workspace['channels'][region_idx] #dict
# print("region name %s" % region['name'])
# loop over samples
for sample_idx in range( len(region['samples']) ):
sample = region['samples'][sample_idx] #dict
# print(sample) #dict
# loop over sample data bins and replace zeros
for bin_idx in range( len(sample['data']) ):
counts = sample['data'][bin_idx]
# print(counts)
if counts == 0.0:
print("found an empty bin! \t substituting with 1e-5...")
sample['data'][bin_idx] = 1e-5
staterror_modifier = sample['modifiers'][0] #dict
# double check this is a statterror type
if not staterror_modifier['type'] == 'staterror':
print("another type of modifier found! \t breaking!")
break
# loop over sample modifiers staterror bins and replace zeros
for bin_idx in range( len(staterror_modifier['data']) ):
counts = staterror_modifier['data'][bin_idx]
# print(counts)
if counts == 0.0:
print("found an empty staterror bin! \t substituting with 1e-5...")
staterror_modifier['data'][bin_idx] = 1e-5
# fix workspace, i.e. replace zero bin yields and staterrors with 1e-5
zeroless_ws(ws)
Thank you for confirming that this was the issue in your setup. Given that the next release of pyhf
will resolve this, I'll link this issue to be resolved by https://github.com/scikit-hep/cabinetry/pull/301 (no changes needed in that PR to address this specifically, but it introduces compatibility with the next pyhf
release and thereby will solve this issue too).
We could also consider adding this workaround to cabinetry
directly. I am personally in favor of avoiding these kind of "magic" fixes where possible, since they may be unintuitive to users. I am happy to do so though if there is interest.
I'm opening this issue for an error I face when I change the bins in one of my regions.
I have a multiple region fit setup, where all the regions are single binned, except one which is multi binned. When I change the binning of the multi bin region, I get an
AssertionError
from thispyhf
line. This is not happening with every binning, I have other bin configurations that work fine (even with empty bins): example1, example2.I am using
cabinetry
version 0.4.1 andpyhf
version 0.6.3. I note that the exact above erroneous line do not exist in currentmaster
branch ofpyhf
.The fit configuration can be found here. The steering notebook here. And the error text here.
Any insight will be helpful! Thanks in advance!