Ga0l opened this issue 3 years ago
@Ga0l thanks for the Issue and for reporting this. :+1: What version of `torch` do you have installed? Can you give us a `requirements.txt`?
Here it is: requirements.txt (also editing the main description), so the `torch` version is 1.6.0.
I've updated `torch` to 1.7.1 and it's even worse: the observed CLs is NaN at least up to `poi_test = 1.2`.
This looks like it could be a convergence issue similar to #1278. In that case a temporary solution may be lowering the lower POI bound from 0 to something like -5 or so, and using `test_stat="q"` in `infer.hypotest`.
Using `test_stat="q"` indeed solved the NaN problem; however, changing the lower bound to a negative number doesn't seem to be necessary, though it slightly changes the returned CLs. Thanks for the tip!
https://pyhf.readthedocs.io/en/v0.6.0/_generated/pyhf.infer.test_statistics.qmu.html vs https://pyhf.readthedocs.io/en/v0.6.0/_generated/pyhf.infer.test_statistics.qmu_tilde.html.
Essentially, if $\hat{\mu} < 0$, then you would want to allow $\mu$ (your POI) to float below zero in order to capture that behavior. In this case, $q_\mu$ is the more appropriate test statistic. It is possible that the particular signal you picked corresponds to a slightly negative $\hat{\mu}$.
However, the two should be similar to each other (negligible differences in most cases), so I'm not fully sure that switching the test statistic truly explains what is going on.
For what it's worth, I took the existing code and ran it through `minuit` by changing the top line of the code to set the minuit optimizer in the backend: `pyhf.set_backend("pytorch", "minuit")`
┌──────────────────────────────────┬──────────────────────────────────────┐
│ FCN = 125.7 │ Nfcn = 1355 │
│ EDM = 0.0773 (Goal: 0.0002) │ Ngrad = 15 │
├───────────────┬──────────────────┼──────────────────────────────────────┤
│ Valid Minimum │ Valid Parameters │ No Parameters at limit │
├───────────────┴──────────────────┼──────────────────────────────────────┤
│ Below EDM threshold (goal x 10) │ Below call limit │
├───────────────┬──────────────────┼───────────┬─────────────┬────────────┤
│ Covariance │ Hesse ok │APPROXIMATE│NOT pos. def.│ FORCED │
└───────────────┴──────────────────┴───────────┴─────────────┴────────────┘
and this returns a different result:
(tensor(0.8436), tensor(0.8448))
which is certainly different from the scipy variation. I also repeated this check with `jax` instead, and I see the same behavior as with `torch`, so this might be the scipy optimizer having problems.
Note: this is a `pyhf.infer.mle.fixed_poi_fit` rather than the hypotest, but...
For some more progress: it seems like some of the normfactors do hit the parameter limits during the non-gradient version of the optimization (`do_grad=False`). This corresponds to one of the normfactors, and it seems like minuit notices it hitting a boundary at some point...
bounds = model.config.suggested_bounds()
bounds[40] = (-10, 10)
might need to do something similar for the others.
With `minuit`, I get the following error:
TypeError: __init__() got multiple values for keyword argument 'grad'
Is that related to the `do_grad=False` you're talking about?
This sounds like you may have `iminuit<2` installed? `pyhf` 0.6.0 raised the requirement: https://github.com/scikit-hep/pyhf/blob/6c28066e15758f8a641c7b233f90185aae6b8baa/setup.py#L15
Indeed, it works now with the latest `iminuit` version. However, the (`pytorch`, `minuit`) duo is very slow (~20 s per hypotest call) and I also have to use `test_stat="q"`. In the end, `pytorch` with the default optimizer seems to be the fastest (with `test_stat="q"`).
Thanks to that, I've been able to run a scan using the SModelS/pyhf interface, and I am under-excluding a lot, as you can see in the following plot:
Actually, looking at a few points in detail, observed and expected CLs differ by less than 1%, so I would get the same grey contour for the expected exclusion.
As it was the first analysis to have several fixed parameters, and all previous analyses were working great, maybe it could be related to that? Maybe I should add more information to the patch or use `hypotest` differently? (The patch shown in Steps to Reproduce was generated by SModelS, and `hypotest` is used exactly the same way as in SModelS.)
> As it was the first analysis to have several fixed parameters, and all previous analyses were working great, maybe it could be related to that?
Likely not; fixed parameters shouldn't impact anything, since it's just keeping that particular variable fixed in a fit. I will note that you bumped to minuit v2, which is noticeably different from minuit v1, whereas scipy hasn't seen such a large change at the same time.
I don't think the test statistic should make a speed difference here (no idea why it would, since they're similar calculations across the board). It does sound like we might need to see if we're using minuit wrong as of v2.0, but it has passed all of our tests (1000+), so we're unsure right now.
It might actually be good to ask the authors of ATLAS-SUSY-2018-06 for a closure test, to make sure the problem is indeed technical and there is no issue with the json file itself.
@sabinekraml @Ga0l -- I've checked this against our latest release (0.6.0) and I get the following
$ python test.py # mu = 0.9999
(tensor(0.9869), tensor(0.9868))
$ python test.py # mu = 1.0
(tensor(0.9891), tensor(0.9892))
$ python test.py # mu = 2.0
(tensor(0.0070), tensor(0.0069))
and I suspect you just hit the scipy bug that's now fixed in scipy 1.6.0+. Please confirm if things look ok now. Note: likelihoods published with ATLAS go through a rigorous validation procedure that I can describe in more detail offline, but the serialization of the likelihood should be quite robust. However, our `pyhf` implementation will always have numerical issues as we depend on other packages that may or may not run into weird issues.
scipy in this case had problems respecting the bounds of specific variables during its optimization routine, which was completely rewritten a few months ago and will generally be a lot better now.
If everything looks ok, please feel free to close the issue. Specifically, scipy/scipy#13009 is one of the big improvements in scipy 1.6.0.
Hi all, I would like to iterate on the same topic from the MadAnalysis 5 side. I experienced similar NaN observed-CLs results as @Ga0l did, and following your advice I managed to fix that problem. However, for some reason this particular analysis gives us quite off-looking results when pyhf is used (the exclusion is about 100 GeV less than both the observed values and Ma5's native prediction), whereas with MadAnalysis 5's internal CLs computation (which does not use profile likelihoods, simply uncorrelated SRs) we manage to match the observed results with good accuracy. To exemplify the problem, I chose the mass point cha/n2 = 250 GeV, n1 = 25 GeV. With Ma5's calculator, I find that this point is excluded with 1-CLs = 99%. The jsonpatch for this benchmark is as follows:
# Generated by Ma5
sig = \
[{'op': 'add',
'path': '/channels/2/samples/4',
'value': {'name': 'MA5_signal_2',
'data': [43.05427939752551],
'modifiers': [{'data': None, 'name': 'lumi', 'type': 'lumi'},
{'data': None, 'name': 'mu_SIG', 'type': 'normfactor'}]}},
{'op': 'add',
'path': '/channels/3/samples/4',
'value': {'name': 'MA5_signal_3',
'data': [4.595283390608944],
'modifiers': [{'data': None, 'name': 'lumi', 'type': 'lumi'},
{'data': None, 'name': 'mu_SIG', 'type': 'normfactor'}]}},
{'op': 'remove', 'path': '/channels/1'},
{'op': 'remove', 'path': '/channels/0'}]
Using the same JSON file for the background as @Ga0l mentioned earlier, I get 1-CLs (observed) = 0.0244 and expected = 0.0244. I used the following piece of code to get this result:
import pyhf

def get_cls(bkg, sig, bounds=None, mu=1.0):
    workspace = pyhf.Workspace(bkg)
    model = workspace.model(
        patches=[sig],
        modifier_settings={
            "normsys": {"interpcode": "code4"},
            "histosys": {"interpcode": "code4p"},
        },
    )
    CLs_obs, CLs_exp = pyhf.infer.hypotest(
        mu,
        workspace.data(model),
        model,
        test_stat="qtilde",
        par_bounds=bounds if bounds is not None else model.config.suggested_bounds(),
        return_expected=True,
    )
    return 1 - CLs_obs, 1 - CLs_exp
Here `bkg` refers to the JSON file that @Ga0l refers to. So I'm not sure if there is something wrong with my approach (because with ATLAS-SUSY-2018-031 and ATLAS-SUSY-2019-08 I managed to get pretty close results with pyhf), or whether I should approach the ATLAS conveners as @sabinekraml suggested. It's not just this particular benchmark, by the way; that's why I'm quite concerned. I have attached two plots to exemplify my meaning.
Thanks
Specifications:
- scipy v1.7.1 (@kratsg I, unfortunately, do not see any difference in the results with v1.6+)
- NumPy v1.21.2
- json v2.0.9
- jsonpatch v1.32
- pyhf v0.5.4 (v0.6.4.dev7 also gives the same result)
Hi all, would it be possible to get a confirmation on my previous message that there is nothing wrong code-wise in pyhf or in my usage of pyhf? I would like to ask the ATLAS conveners, as @sabinekraml suggested, whether there is a bug in the implementation. Thanks
Hi @jackaraz, can you point me to the MA5 computation? What statistical model is used there?
Hi @lukasheinrich, you can find the JSON file on HEPData, and an example signal point that we generate in Ma5 is given in the previous message. You can find the wrapper that we use in Ma5 at this link, but the simplified version is in my previous message above. The input of the function is simply a dictionary (the given JSON file for bkg and an Ma5-generated dict for the signal). I believe the code should be correct unless there are things I'm missing for pyhf. The results for other analyses match the ATLAS results perfectly (ATLAS-SUSY-2018-031 and ATLAS-SUSY-2019-08 are validated); it is only this analysis giving quite strange results, and I don't know why.
Thanks
Hi, Is anybody looking into this from the pyhf side, or should we contact the ATLAS conveners about that JSON file giving strange results? Thanks, Sabine
I can try to get to this over the weekend or on Monday — I'm sorry that we've been slow in responding to all the Issues that people have opened up over the last 2 weeks but the dev team is just a bit backlogged across other work. :/
Thank you @jackaraz and @sabinekraml for following up on this though. Having people ping on Issues like this helps keep them visible so please don't stop!
Hi @sabinekraml,
Just want to say that we have looked into this within ATLAS, where these serialized likelihoods were thoroughly validated using `minuit` as the optimizer. If your near-term goal is to reproduce the published contours, then I suggest using `--optimizer minuit` while the `pyhf` developers continue to investigate the technical differences observed between `minuit` and `scipy`.
Take, for example, the (300, 100) point, which is excluded by the published ATLAS search. Using the default `scipy` optimizer:
> pyhf patchset extract patchset.json --name "ERJR_300p0_100p0" > ERJR_300p0_100p0.json
> jsonpatch BkgOnly.json ERJR_300p0_100p0.json | pyhf cls
{
"CLs_exp": [
0.9987474975959262,
0.9991948729820783,
0.9995786812818501,
0.9998480942549817,
0.9999708112483539
],
"CLs_obs": 0.999576563194346
}
Compared to `minuit`:
> jsonpatch BkgOnly.json ERJR_300p0_100p0.json | pyhf cls --optimizer minuit
{
"CLs_exp": [
1.1482873747075978e-05,
0.00018507568004640695,
0.0025455767115708685,
0.025913724305779498,
0.1579634438837901
],
"CLs_obs": 0.018130848669640688
}
We plan to update the README on HEPData for this result to reflect the use of `minuit`, given the current `pyhf` status.
Best, Jeff
Dear Jeff,
Thanks for this feedback. Our goal is to use serialized likelihoods in SModelS. Reproducing the published contours is just a validation step, not a goal in itself. And switching optimisers depending on the analysis/likelihood is not a workable solution for us -- it would be technically feasible, but it's contrary to the approach taken in SModelS, so it's not an option. By the way, if the evaluation of the likelihood depends on the optimiser used, that should be somewhat worrisome, shouldn't it? (We did try minuit at some point early on, but it did not resolve the issues with the ATLAS-SUSY-2018-06 likelihoods. @jackaraz might confirm.)
Best, Sabine
Dear @jshahinian
Thanks for the update. I attached the comparison between the two optimizers, where I don't see any difference. The exclusion plots show the Ma5 results (in red), where no correlation is assumed between SRs and the most sensitive SR has been chosen per grid point. The green curve is the one calculated by pyhf (for similar analyses, please see this link). But let me reiterate @sabinekraml's remarks.
In MadAnalysis (probably similarly to SModelS) we do not expect users to know any third-party program, i.e. pyhf, NumPy, scipy, etc. Since the bare bones of calculating any confidence limit are NumPy and scipy, we only require those modules, which can be downloaded through MadAnalysis. Please note that the user is not even required to know Python! All exclusion limits are calculated as a black box in the background, where the user provides the samples created from the Lagrangian of whatever theory they wish to explore, and we calculate the exclusion limit for that particular theory.
In terms of validation (this part is slightly different from SModelS, since we actually recast the analysis), we do not use the signal likelihoods provided by the experiment. Following the footsteps laid down in the analysis, we generate our own samples and create patches from our yields. As expected, these final event counts do not carry any uncertainty, just the cross-section times efficiency and luminosity. An example can be found in this post, where the patch is generated by MadAnalysis.
So for this pipeline to be successful, we need a universal way of releasing these likelihoods. Since users are allowed to implement their own recasts, if one analysis releases a HistFactory JSON file in a particularly different way, we have no way of knowing unless we face a problem like in this analysis.
So to sum up: unfortunately, our issue is not with the optimizer, as far as I can see from the plots below. For some reason, the combined SRs are way under both the exclusion limits and the uncorrelated Ma5 results, and we cannot understand why.
Best regards, Jack
Hi all, I ran some additional tests comparing the released ATLAS patches with MadAnalysis; I attached the JSON files that I used in case you want to duplicate the process. I ran each patch file with and without the minuit optimizer, just like @jshahinian showed here. I used the `ERJR_350p0_100p0` patch sample and generated exactly the same thing with MadAnalysis; additionally, I removed all the modifiers from this file and called it `ERJR_350p0_100p0_simplified`, simply because MadAnalysis cannot reproduce these modifiers. I simply run
pyhf cls BkgOnly.json -p <XXX>.json --optimizer <YYY>
Hi Jack,
When I wrote in the Skype chat that

> I took the patchset that's shipped with the analysis likelihood and removed the control regions (i.e. all the `"op": "add", "path": "/channels/0/samples/5"` and `"op": "add", "path": "/channels/1/samples/5"` entries). Then, in the signal regions, I removed all the modifiers apart from mu_SIG.

this means just not patching anything in the control regions, not removing them altogether. Please try your ERJR_350p0_150p0_simplified.json without the part
{
"op": "remove",
"path": "/channels/1"
},
{
"op": "remove",
"path": "/channels/0"
}
For me, this works fine (with Python 3.9.1, pyhf 0.6.3 and iminuit 2.2.0). Another remark: in ma5_350_150.json you are patching onto samples 4 instead of samples 5, i.e. you have e.g. `"path": "/channels/3/samples/4"` instead of `"path": "/channels/3/samples/5"`. Why is that?
Cheers, Sabine
Hi @sabinekraml
> For me, this works fine (with Python 3.9.1, pyhf 0.6.3 and iminuit 2.2.0). Another remark: in ma5_350_150.json you are patching onto samples 4 instead of samples 5, that is you have e.g. "path": "/channels/3/samples/4", instead of "path": "/channels/3/samples/5", Why is that?

There are 5 samples; if I'm not mistaken, the last value indicates where to insert this sample in the sample list. I don't see any difference when I move it to 0, 4 or 5.
> This means just not patching anything in the control regions, not removing them altogether. Please try your ERJR_350p0_150p0_simplified.json without the part

We are not recasting control or validation regions; as far as I know, those need to be removed. I did exactly the same thing in the example I sent here; SModelS also does exactly the same thing when the efficiency maps are not available for a given region, as far as I can see from the code. Please correct me if I'm wrong.
Additionally, even without removing anything, my minuit optimizer crashes. If it works with an earlier version of iminuit, that means it's highly version-dependent as well as optimizer-dependent.
Cheers, Jack
Hi,
Could somebody from the pyhf team explain to us precisely what

"measurements": [
    {
      "config": {
        "parameters": [
          [...]
          [...]
          { "fixed": true, "name": "mu_BG" }

does?
Best, Sabine
It sets the parameter `mu_BG` to constant by default, so it is no longer a free parameter unless explicitly requested (`model.config.suggested_fixed()` will have it as fixed by default).
OK, a priori this is clear. However, mu_BG does not appear anywhere else. So how does setting it to constant have any effect? How and where is it defined?
Maybe someone from the analysis team knows why this parameter shows up there. As long as the parameter is not used in the model (and it appears indeed that it is not defined anywhere), this setting will have no effect. Perhaps it is a leftover from some tests.
Hi all, I ran some additional tests comparing the released ATLAS patches with MadAnalysis; I attached the JSON files that I used in case you want to duplicate the process. I ran each patch file with and without the minuit optimizer, just like @jshahinian showed here. I used the `ERJR_350p0_100p0` patch sample and generated exactly the same thing with MadAnalysis; additionally, I removed all the modifiers from this file and called it `ERJR_350p0_100p0_simplified`, simply because MadAnalysis cannot reproduce these modifiers. I simply run

pyhf cls BkgOnly.json -p <XXX>.json --optimizer <YYY>

where `<YYY>` is either `scipy` or `minuit` and `<XXX>.json` is one of the attached samples. The `BkgOnly.json` file is a freshly downloaded background file from HEPData. With the MadAnalysis sample (`ma5_350_100`) I got

scipy = { "CLs_exp": [ 0.9994098231510591, 0.9996206785829924, 0.9998015308213578, 0.9999284522692509, 0.9999862539941256 ], "CLs_obs": 0.9998015322407497 }
minuit = { "CLs_exp": [ 0.9837765328285882, 0.9895381866557293, 0.9945079752491377, 0.9980135538320977, 0.9996170864834312 ], "CLs_obs": 0.9945075915879901 }

with the `ERJR_350p0_100p0.json` sample

scipy = { "CLs_exp": [ 0.9991152951646117, 0.9994313427800472, 0.9997024477558429, 0.9998927263670201, 0.9999793889261156 ], "CLs_obs": 0.9997024365487247 }
minuit = pyhf.exceptions.FailedMinimization: Optimization terminated successfully.

This error has been included in the attached file below, named `pyhf.log`. With `ERJR_350p0_100p0_simplified.json`

scipy, minuit = { "CLs_exp": [ 1.0, 1.0, 1.0, 1.0, 1.0 ], "CLs_obs": NaN }

So as you can see, except for the failed runs, the results seem to be quite close to each other.

PS: The patch set has been created by `pyhf patchset extract patchset.json --name "ERJR_350p0_100p0" > ERJR_350p0_100p0.json`.

System Settings
- iminuit v2.8.4
- pyhf v0.6.3
- scipy v1.7.1
- Python 3.8.9

Correction: I realised that I had provided the wrong files, so I corrected the results above with the correct files. The previous results (which you might have received in the email) were for the JSON patch `ERJR_350p0_150p0`, so I corrected them to match the MadAnalysis sample. With `ERJR_350p0_150p0_simplified.json`, `pyhf` was able to calculate the limits with `scipy` but gives the above results for `minuit`.
Hi @matthewfeickert and @lukasheinrich, did you have a chance to check why I'm getting the attached errors from pyhf? Am I doing something wrong, or should I be using a specific version of NumPy + iminuit?
Thanks
Description

Using `pyhf.infer.hypotest` with `poi_test < 1.` for ATLAS-SUSY-2018-06 returns a NaN as the observed CLs. Moreover, for `poi_test > 1.`, observed and expected CLs seem to be very close.

Expected Behavior

Observed CLs shouldn't be a NaN for `poi_test = 0.999`.

Actual Behavior

Observed CLs is a NaN for `poi_test = 0.999`.

Steps to Reproduce

Running the following code with pyhf 0.6.0 (with the background-only JSON file from ATLAS-SUSY-2018-06):

returns

(tensor(nan), tensor(1.))

Increasing `poi_test` above 1.0 seems to give very close observed and expected CLs.

P.S.: with numpy as the backend, problems seem to appear near `poi_test = 0.6`.
P.P.S.: similar issues seem to appear with ATLAS-SUSY-2018-22.

Checklist

- Run `git fetch` to get the most up-to-date version of `master` (updated to 0.6.0)