alexander-held commented 4 years ago

Description

I observed a larger than expected difference in observed CLs values between pyhf and ROOT for a simple workspace. The workspaces in .root format (two versions, built with ROOT 6.20 and 6.18) and .json are here: /afs/cern.ch/user/a/alheld/public/pyhf_limit_workspaces

pyhf CLs

Using this script, I obtain the pyhf result:

import sys
import json
import pyhf

with open(sys.argv[-1]) as f:
    spec = json.load(f)

ws = pyhf.Workspace(spec)
model = ws.model()
data = ws.data(model)

# pyhf.set_backend("numpy", pyhf.optimize.minuit_optimizer(verbose=False))
poi = 3.58
results = pyhf.infer.hypotest(poi, data, model, qtilde=True, return_expected_set=True)
print(f"observed CLs at POI = {poi} is {results[0]:.6f}")

-> observed CLs at POI = 3.58 is 0.083035 Uncomment the backend change to get an equivalent result with MINUIT.

ROOT CLs

The corresponding ROOT result is obtained with validation/run_single.py. The only modification is to use

calc.RunFixedScan(1, 3.58, 3.58)

to evaluate the CLs at the same POI setting. The result is -> "CLs_obs": 0.05083447835045844

Further comments

The difference between 5.1% with ROOT and 8.3% with pyhf seems significant. The expected CLs values also differ. The scan over POI values looks smooth, and does not suggest that this is a convergence issue for that particular POI value.

I have not tested sufficiently many setups yet to conclude anything with certainty, but the difference between pyhf and ROOT seemed a lot smaller when not including additional normalization factors beyond the POI. When removing the extra normalization factor, the ROOT observed limit is 3.15, the observed CLs at that point is "CLs_obs": 0.05045563620180348. With pyhf, the result matches: observed CLs at POI = 3.15 is 0.050751. The corresponding .root / .json workspaces are also in the same folder.

Additional interesting piece of information: in the workspace including the additional normfactor, a fit with the POI fixed to 3.58 results in a set of best-fit parameters that predicts a negative background contribution in the last bin.

Expected Behavior

The observed CLs value should agree between pyhf and ROOT.

Actual Behavior

The values disagree.

Steps to Reproduce

See above for more, pyhf version is 0.5.3.

Checklist

[X] Run git fetch to get the most up to date version of master
[X] Searched through existing Issues to confirm this is not a duplicate issue
[X] Filled out the Description, Expected Behavior, Actual Behavior, and Steps to Reproduce sections above or have edited/removed them in a way that fully describes the issue

lukasheinrich commented 4 years ago

the following seems also has a similar problem (CLs matches around µ=3 and diverges by µ=4). I trued to cut on the number of bins/modiifiers

{
    "channels": [
        {
            "name": "Signal_region",
            "samples": [
                {
                    "data": [50],
                    "modifiers": [
                        {"data": null,"name": "Bg_norm","type": "normfactor"},
                        {"data": {"hi_data": [60],"lo_data": [40]},"name": "Modeling","type": "histosys"}
                    ],
                    "name": "Background"
                },
                {
                    "data": [25.0],
                    "modifiers": [
                        {"data": null,"name": "Signal_norm","type": "normfactor"}
                    ],
                    "name": "Signal"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {"auxdata": [1.0], "bounds": [[0.5,1.5]], "fixed": true, "inits": [1.0], "name": "lumi", "sigmas": [0.1]}
                ],
                "poi": "Signal_norm"
            },
            "name": "minimal_example"
        }
    ],
    "observations": [
        {
            "data": [60.0],
            "name": "Signal_region"
        }
    ],
    "version": "1.0.0"
}

lukasheinrich commented 4 years ago

interestingly changing

                        {"data": {"hi_data": [100],"lo_data": [12]},"name": "Modeling","type": "histosys"}

too

                        {"data": {"hi_data": [100],"lo_data": [40]},"name": "Modeling","type": "histosys"}

seems to fix it.. solildifying the hunch that maybe negative yields are the issue

scikit-hep / pyhf

Difference in observed CLs values between pyhf and ROOT #1140