scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Start discussion with SModelS developers about pyhf integration #620

Closed: matthewfeickert closed this issue 4 years ago

matthewfeickert commented 5 years ago

Description

@lukasheinrich has already had some discussions with the SModelS developers (I think mainly @WolfgangWaltenberger) about SModelS using pyhf. We should further pursue these discussions.

WolfgangWaltenberger commented 4 years ago

JFYI, we in SModelS have started our effort on interfacing pyhf with our code (so essentially SModelS and also MA5). Sabine's student @Ga0l and I will be working on the implementation. We already have a first slew of questions, and we would like to enter a phase of intensive exchange with you, if that's OK for you guys? If yes, what's your preferred means of communication? Emails, Skype, etc.?

Cheers, Wolfgang

lukasheinrich commented 4 years ago

@WolfgangWaltenberger probably GitHub issues are best. Of course email works as well, but for API questions, where the community at large might be interested more generally, it would be useful to have the discussion on GitHub.

Ga0l commented 4 years ago

First, a technical problem: we are trying to reproduce the CLs calculation that is performed when running the command line "pyhf cls" with an additional patch replacing the data yields (patch.bsm.json in attachment.zip). For now I just wrote a small script (cls_test.py, also in attachment.zip), but we would like to use it like that in SModelS. The problem is that it doesn't give the same CLs result as the command line:

pyhf cls RegionA/BkgOnly.json -p RegionA/patch.sbottom_900_250_60.json -p patch.bsm.json

Why is it different? Do you have any idea? I also tried dropping the BSM patch and keeping only the first one, but the result is still a bit different (see the CLs results in diff_cls.out in the attached zip).

For the BSM patch, I made a simple JSON patch as described in ATL-PHYS-PUB-2019-029 (Listing 16) to replace the data yields for reinterpretation with other BSM models. What we are wondering is: is it sufficient to replace the "/channels/0/samples/0/data" value, or do we need to change the modifiers as well?

kratsg commented 4 years ago

Hi @Ga0l: the keyword argument is modifier_settings, not modifiers_settings.
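
For reference, a minimal sketch of the corrected call, following the usage that appears later in this thread (the msettings values match the ones used by the pyhf cls command line):

import json
import pyhf

with open("RegionA/BkgOnly.json") as f:
    workspace = pyhf.Workspace(json.load(f))

# note the singular "modifier" in the keyword name
msettings = {'normsys': {'interpcode': 'code4'}, 'histosys': {'interpcode': 'code4p'}}
model = workspace.model(modifier_settings=msettings)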

Ga0l commented 4 years ago

Thanks a lot @kratsg for your answer, it solved the problem. About the second question:

For the BSM patch, I made a simple JSON patch (patch.bsm.zip) as described in ATL-PHYS-PUB-2019-029 (Listing 16) to replace the data yields for reinterpretation with other BSM models. What we are wondering is: is it sufficient to replace the "/channels/0/samples/0/data" value, or do we need to change the modifiers as well?

By comparing two patches (say "RegionA/patch.sbottom_1300_530_400.json" and "RegionA/patch.sbottom_1600_1595_60.json") I've noticed that the modifier data vary with the masses. We'd like to be able to produce this kind of patch for BSM models at any mass point. In SModelS, to do so, the efficiency is interpolated from the efficiencies the analyses provide at given masses.

I think we will first do the same and replace the signal yields with the BSM predictions (given the BSM cross section, the interpolated efficiency, and the luminosity) without touching the associated modifiers.

However, the question still remains: how can we perform a mass interpolation and handle the modifiers correctly?

Thanks, Gaël

kratsg commented 4 years ago

Hi @Ga0l, to a first approximation you can drop all modifiers except for the POI (and maybe the lumi/cross-section ones) on the signal you provide. The sbottom multi-b paper has more information about the uncertainties.

But yes, it would be excellent to determine the modifiers correctly. This is where the RECAST project comes in. You can provide the signal you want, and it will be processed through the same code that the sbottom team used, calculating all the systematics you would need to include.

./cc @lukasheinrich @cranmer

Ga0l commented 4 years ago

OK, another thing that's bothering me: with the patches provided on HEPData for ATL-PHYS-PUB-2019-029, the VR channels are removed but the CR channels are kept. Do we really need the information in the CR channels for our computation, or are the relevant SRs (denoted SR_***) sufficient? Are these CRs actually used for the CLs calculation?

lukasheinrich commented 4 years ago

Yes, in principle the signal can leak into the CRs. If you do not have the capability to compute yields in the CRs, you could set the yields to zero or remove the CRs via the patch, like the VRs, but whether this is valid is of course model dependent.

In the published analysis the final fit was done on CR + SR.
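
A rough sketch of the "remove them via the patch" option in Python, using the jsonpatch package that the pyhf command line relies on (the channel index 2 here is purely illustrative, and whether dropping a CR is valid remains model dependent, as noted above):

import json

import jsonpatch
import pyhf

with open("RegionA/BkgOnly.json") as f:
    spec = json.load(f)

# drop one control-region channel, analogous to how the VRs are dropped
patch = jsonpatch.JsonPatch([{"op": "remove", "path": "/channels/2"}])
workspace = pyhf.Workspace(patch.apply(spec))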

Ga0l commented 4 years ago

Thanks @lukasheinrich! We encountered another problem: we need to compute the upper limit on the signal strength modifier mu_SIG. For that we wrote code that varies mu until CLs reaches 0.05. However, it seems that as soon as mu is greater than 10.0, pyhf raises an error.

I then tried the following command line: pyhf cls --testpoi 11.0 RegionA/BkgOnly.json -p RegionA/patch.sbottom_900_250_60.json and I get the following error. Would you know what the issue is?

Thanks, Gaël

lukasheinrich commented 4 years ago

Hi @Ga0l, yes, the bounds on the signal strength are [0, 10] by default.

you can change them via:

bounds = m.config.suggested_bounds()
bounds[m.config.poi_index] = [0,20]

and then pass bounds to hypotest via the par_bounds argument.
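
Putting that together, a minimal sketch with the keyword names used elsewhere in this thread (spec stands for the patched workspace JSON you already built):

import pyhf

workspace = pyhf.Workspace(spec)
model = workspace.model()

# widen the default POI bounds of [0, 10]
bounds = model.config.suggested_bounds()
bounds[model.config.poi_index] = [0, 20]

cls_obs = pyhf.infer.hypotest(
    11.0, workspace.data(model), model, par_bounds=bounds, qtilde=True
)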

WolfgangWaltenberger commented 4 years ago

By the way, a quick question: are you guys serious about the differentiability thing? E.g. can we use it with pytorch autograd? We would not be using this in the near future, but in the long run this would be a killer feature!

lukasheinrich commented 4 years ago

Yes, of course we're serious :) This is already possible and is used in the autodiff optimizer: https://github.com/scikit-hep/pyhf/blob/master/src/pyhf/optimize/autodiff.py https://github.com/scikit-hep/pyhf/blob/master/src/pyhf/optimize/opt_pytorch.py
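
For illustration, a rough sketch of differentiating a pyhf log-likelihood with the pytorch backend via autograd (the simplemodels helper is called hepdata_like in the releases discussed in this thread and uncorrelated_background in later ones; the yields are made up):

import pyhf
import torch

pyhf.set_backend("pytorch")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = [55.0] + model.config.auxdata

# parameters as a torch tensor so autograd can track them
pars = torch.tensor(model.config.suggested_init(), requires_grad=True)
nll = -model.logpdf(pars, data)[0]
nll.backward()
print(pars.grad)  # gradient of -log L with respect to the model parameters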

WolfgangWaltenberger commented 4 years ago

God I love the 21st century :))

Wolfgang

kratsg commented 4 years ago

@WolfgangWaltenberger additionally, if you have suggestions for other backends (or optimizers) you would like to see incorporated, feel free to let us know. We're hopefully going to incorporate jax soon (#377).

matthewfeickert commented 4 years ago

Hi @WolfgangWaltenberger, just wanted to let you know that we now have a pretty snappy JAX backend available in releases v0.3.4 onwards, now that PR #377 has been merged. We're liking JAX, so expect to see some further improvements as well (@lukasheinrich already has some PRs open on that subject).

Ga0l commented 4 years ago

Hi,

I'm having a problem that seems close to this one. Again, by varying the POI to find an upper limit, I got an assertion error as if the bounds were crossed. However, I am dynamically changing the bounds precisely to avoid that, and it seems to happen only for some values (in my case poi_test = 730.5244681411203). Here's the error:

New call of root_func() with mu =  10.0
mu bound : 10.0
/home/alguero/.local/lib/python3.7/site-packages/pyhf/tensor/numpy_backend.py:253: RuntimeWarning: invalid value encountered in log
  return n * np.log(lam) - lam - gammaln(n + 1.0)
1 - CLs :  0.00012986812926452096
New call of root_func() with mu =  100.0
mu bound : 110
1 - CLs :  0.0006741038592970039
New call of root_func() with mu =  1000.0
mu bound : 1010
/home/alguero/.local/lib/python3.7/site-packages/pyhf/infer/utils.py:62: RuntimeWarning: invalid value encountered in double_scalars
  CLs = CLsb / CLb
1 - CLs :  1.0
New call of root_func() with mu =  100.0
mu bound : 1010
1 - CLs :  0.0006741038592970039
New call of root_func() with mu =  1000.0
mu bound : 1010
1 - CLs :  1.0
New call of root_func() with mu =  954.9696448638172
mu bound : 1010
1 - CLs :  1.0
New call of root_func() with mu =  527.4848224319086
mu bound : 1010
1 - CLs :  0.0014617666746911118
New call of root_func() with mu =  933.5641138503321
mu bound : 1010
1 - CLs :  1.0
New call of root_func() with mu =  730.5244681411203
mu bound : 1010
ERROR:pyhf.optimize.opt_scipy:     fun: 2986.526676809377
     jac: array([-7.16552734e-02,  5.91941284e+02,  8.48580734e+03,  2.30274030e+06,
        2.19260705e+06, -2.93857990e+03,  0.00000000e+00, -2.40975037e+01,
       -1.72453613e+01, -1.14001465e+01,  0.00000000e+00, -1.04228210e+01,
        6.62438965e+00,  3.21028137e+01, -4.53353271e+01,  1.60440979e+01,
        2.33179321e+01,  3.30068970e+01,  0.00000000e+00,  0.00000000e+00,
       -6.26562500e+00, -1.19011230e+01, -1.05806885e+01, -1.32884521e+01,
       -4.59234619e+00, -1.98895264e+00, -7.32672119e+00, -9.78619385e+00,
       -2.97113037e+00,  1.85223389e+01, -3.97662354e+00, -1.02069702e+01,
        4.51583862e+00,  6.55146484e+01, -1.39204498e+02, -1.03129944e+02,
       -3.29564209e+01, -9.12549133e+01, -1.04159912e+02, -9.63010254e+01,
       -6.70816040e+00, -6.37317505e+01, -1.82036133e+01, -5.55774536e+01,
       -7.46965027e+01,  1.07088623e+01, -9.97900391e+00,  3.25897217e+00,
        1.35561523e+01,  1.20184326e+01,  2.34369507e+01,  0.00000000e+00,
       -7.70129395e+00,  2.62848511e+01,  1.57736206e+01,  1.01738892e+01,
        1.16876282e+02,  1.44717712e+01, -1.00000000e+01,  2.86949158e+01,
        2.27081299e+01,  2.40907593e+01,  9.81768799e+00])
 message: 'Inequality constraints incompatible'
    nfev: 402
     nit: 6
    njev: 6
  status: 4
 success: False
       x: array([ 7.30524468e+02,  1.08499895e+00,  1.60346257e-01,  5.10802609e-06,
        3.97761941e-06,  1.67331703e-03,  0.00000000e+00, -5.00000000e+00,
       -5.00000000e+00, -5.00000000e+00,  0.00000000e+00, -5.00000000e+00,
        5.00000000e+00,  5.00000000e+00, -5.00000000e+00,  5.00000000e+00,
        5.00000000e+00, -5.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -2.97551620e+00, -5.00000000e+00, -5.00000000e+00, -5.00000000e+00,
       -2.16508958e+00, -9.35125626e-01, -3.54906218e+00, -4.74335636e+00,
       -5.00000000e+00,  5.00000000e+00, -1.92641886e+00, -5.00000000e+00,
        2.11530614e+00,  5.00000000e+00, -5.00000000e+00, -5.00000000e+00,
       -5.00000000e+00, -5.00000000e+00, -5.00000000e+00, -5.00000000e+00,
        5.00000000e+00, -5.00000000e+00,  5.00000000e+00, -5.00000000e+00,
       -5.00000000e+00,  5.00000000e+00, -5.00000000e+00,  5.00000000e+00,
        5.00000000e+00,  5.00000000e+00,  5.00000000e+00,  0.00000000e+00,
       -3.80049578e+00,  1.00000000e+01,  5.00000000e+00,  5.00000000e+00,
        1.00000000e+01,  5.00000000e+00, -5.00000000e+00,  5.00000000e+00,
        5.00000000e+00,  5.00000000e+00,  5.00000000e+00])
Traceback (most recent call last):
  File "pyhfTest.py", line 26, in <module>
    result = ulcomputer.ulSigma(ulcomputer.workspaces[2], expected=True)
  File "/home/alguero/Work/smodels-utils/pyhfintegration/HEPData_workspaces/pyhfInterface.py", line 142, in ulSigma
    ul = optimize.brentq(root_func, lo_mu, hi_mu, rtol=1e-3, xtol=1e-3)
  File "/home/alguero/.local/lib/python3.7/site-packages/scipy/optimize/zeros.py", line 780, in brentq
    r = _zeros._brentq(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
  File "/home/alguero/Work/smodels-utils/pyhfintegration/HEPData_workspaces/pyhfInterface.py", line 128, in root_func
    result = pyhf.infer.hypotest(test_poi, workspace.data(model), model, par_bounds=bounds, qtilde=True, return_expected = expected)
  File "/home/alguero/.local/lib/python3.7/site-packages/pyhf/infer/__init__.py", line 84, in hypotest
    qmu_v = qmu(poi_test, data, pdf, init_pars, par_bounds)
  File "/home/alguero/.local/lib/python3.7/site-packages/pyhf/infer/test_statistics.py", line 34, in qmu
    mu, data, pdf, init_pars, par_bounds, return_fitted_val=True
  File "/home/alguero/.local/lib/python3.7/site-packages/pyhf/infer/mle.py", line 62, in fixed_poi_fit
    **kwargs,
  File "/home/alguero/.local/lib/python3.7/site-packages/pyhf/optimize/opt_scipy.py", line 47, in minimize
    assert result.success
AssertionError

I tried to just run pyhf.infer.hypotest with poi_test = 730.5244681411203 and it crashes in the same way, but it doesn't crash if, e.g., I change the last digit (like 730.5244681411202).

Here's my code if needed:

# Method of the SModelS pyhf interface class; assumes `import pyhf` and
# `from scipy import optimize` at module level, and that self.cl is the
# confidence level (0.95 here).
def ulSigma(self, workspace, expected=False, mu_bound=10.0):
    self.mu_bound = mu_bound

    def root_func(mu):
        print("New call of root_func() with mu = ", mu)
        # Same modifier_settings as those used when running the 'pyhf cls' command line
        msettings = {'normsys': {'interpcode': 'code4'}, 'histosys': {'interpcode': 'code4p'}}
        model = workspace.model(modifier_settings=msettings)
        bounds = model.config.suggested_bounds()
        # widen the POI bounds if the tested mu exceeds them
        if mu > self.mu_bound:
            self.mu_bound = int(mu / 10) * 10 + 10
        print('mu bound :', self.mu_bound)
        bounds[model.config.poi_index] = [0, self.mu_bound]
        test_poi = mu
        result = pyhf.infer.hypotest(
            test_poi, workspace.data(model), model,
            par_bounds=bounds, qtilde=True, return_expected=expected
        )
        if expected:
            CLs = result[1].tolist()[0]
        else:
            CLs = result[0]
        print("1 - CLs : ", 1.0 - CLs)
        return 1.0 - self.cl - CLs

    hi_mu = 10.0
    lo_mu = 1.0
    # bracket the root (CLs = 1 - self.cl) before running Brent's method
    while root_func(hi_mu) < 0.0:
        hi_mu *= 10
        lo_mu *= 10
    ul = optimize.brentq(root_func, lo_mu, hi_mu, rtol=1e-3, xtol=1e-3)
    return ul

The workspace for which this error appears is built from RegionC/BkgOnly.json of the HEPData sbottom search and the following patch:

[
  {
    "op": "add",
    "path": "/channels/1/samples/0",
    "value": {
      "data": [
        5.2466267379e-05,
        3.3087303267000003e-05,
        7.391427164999999e-06,
        0.0
      ],
      "modifiers": [
        {
          "data": null,
          "type": "normfactor",
          "name": "mu_SIG"
        }
      ],
      "name": "bsm"
    }
  },
  {
    "op": "remove",
    "path": "/channels/5"
  },
  {
    "op": "remove",
    "path": "/channels/4"
  },
  {
    "op": "remove",
    "path": "/channels/3"
  },
  {
    "op": "remove",
    "path": "/channels/2"
  },
  {
    "op": "remove",
    "path": "/channels/0"
  }
]

Thanks in advance for your help,

Gaël

kratsg commented 4 years ago

Hi @Ga0l, can you try with other backends (e.g. pytorch)? Similarly, you can try with minuit instead of scipy. I wonder if you're just hitting some fundamental limitation due to float precision... (that's my initial gut feeling).
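
A quick sketch of how the switch could look (the string shortcuts work in recent pyhf releases; older releases take backend and optimizer objects instead):

import pyhf

# try the pytorch backend, and/or minuit instead of the default scipy optimizer
pyhf.set_backend("pytorch")
# pyhf.set_backend("numpy", "minuit")  # requires iminuit to be installed

# ... then rebuild the model and re-run pyhf.infer.hypotest(...) as before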

lukasheinrich commented 4 years ago

poi_test = 730.5244681411203

that seems a bit excessive. Why is this so large? The default POI bounds are [0, 10], so if you could rescale your nominal signal by a factor of 1000 that would be useful; then poi_test is 0.7.

i.e. change your inputs to be x1000, like this, and run with poi_test=0.7:

[
        5.2466267379e-02,
        3.3087303267000003e-02,
        7.391427164999999e-03,
        0.0
      ]

You might need to do such a rescaling dynamically, given the BSM point you are looking at, which might be a bit annoying but will probably improve fit results. A common way to do this is to scale the histogram such that the nominal cross section sits at µ=1.
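
A sketch of that dynamic rescaling, in plain Python with the yields from the patch above and the x1000 factor suggested here:

# scale the patched signal yields up so that the interesting mu values sit near 1
scale = 1000.0
nominal_yields = [5.2466267379e-05, 3.3087303267000003e-05, 7.391427164999999e-06, 0.0]
patched_yields = [y * scale for y in nominal_yields]

# after the fit, undo the scaling on the upper limit on mu:
# mu_ul_nominal = scale * mu_ul_from_scaled_patch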

Ga0l commented 4 years ago

Hi all, thanks for your advice. I've implemented a dynamic rescaling of the BSM signal I feed through the patch, so now the POI always stays in [0, 10]. However, I was still getting the same weird error in some random cases. I then switched to the pytorch backend and it works fine and even seems faster!

Thanks a lot!

WolfgangWaltenberger commented 4 years ago

To conduct a SModelS-internal discussion here: @Ga0l, for SModelS I want us to stick with pytorch (because we already use pytorch in a different context in SModelS), so for us I would say the issue is resolved, as long as it works with pytorch.

lukasheinrich commented 4 years ago

great! closing for now and feel free to reopen.

Ga0l commented 4 years ago

Hi all, now everything seems to work technically. However, I tried to compare the cross-section upper limits I get with the official results for a few mass points of the staus search. I seem to get results that are ~10-20% above the official ones for most of the points I tested.

For example, the upper limit for the mass point (280, 200) is 0.076981 pb (from the first plot on the HEPData page, which gives the upper limit for combined left- and right-handed stau production), while I get 0.09852.

Here's what I'm doing. We take the efficiencies and acceptances from the same link, which in this case gives an efficiency*acceptance of (0.00078238, 0.00077732) for (lowMass, highMass).

  1. I compute lumi*efficiency*acceptance*1000 to get my signal prediction (see the sketch after this list). I take the cross section to be 1 pb so that the upper limit on mu is directly a cross-section upper limit in pb. That gives (108.75081999999999, 108.04748) as signals for this mass point.
  2. The signals are rescaled by a factor 0.0625 so that the upper limit is in the range [0, 10], which gives (6.796926249999999, 6.7529675) as signals.
  3. I build a patch with these signals, with only POI and lumi modifiers.
  4. The patch is applied to SR-combined/BkgOnly.json, which gives the following workspace.
  5. I compute the 95% upper limit on mu, which is 1.576471, and that gives 0.09852944952137435 once rescaled.
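
A sketch of the arithmetic in steps 1, 2 and 5, using only the numbers quoted in this comment (the 139/fb luminosity is inferred from the yields in step 1):

lumi = 139.0                      # /fb, inferred from the yields quoted in step 1
xsec = 1.0                        # pb, so the UL on mu is directly a cross-section UL in pb
eff_times_acc = {"lowMass": 0.00078238, "highMass": 0.00077732}

# step 1: signal yields (the factor 1000 converts pb * /fb into events)
signal = {sr: lumi * ea * xsec * 1000 for sr, ea in eff_times_acc.items()}
# -> {'lowMass': 108.75..., 'highMass': 108.04...}

# step 2: rescale so that the UL on mu stays inside the default [0, 10] bounds
scale = 0.0625
patched = {sr: s * scale for sr, s in signal.items()}

# step 5: undo the rescaling on the fitted upper limit on mu
mu_ul = 1.576471
xsec_ul_pb = mu_ul * scale        # ~0.0985 pb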

All of that is done within our code, but I don't think it's relevant to look at the whole code, so I tried to select the useful information. But anyway, I attach it here, just in case. Why am I getting such different results? Is it because I dropped all the modifiers, or because I removed the CRs? Do you notice any mistakes in my procedure?

Many thanks,

Gaël

lukasheinrich commented 4 years ago

Hi @Ga0l, the efficiencies and acceptances are only approximations derived at truth level using an internal Rivet-like implementation, while the quoted upper limits come from the reco-level analysis.

I'm not terribly surprised that those numbers don't match exactly. In fact, I think we'd be quite interested to see how they relate, so if you could make a comparison that would be super useful.

WolfgangWaltenberger commented 4 years ago

Wait ... you are talking about event-level signal efficiencies/acceptances, right? Because @Ga0l took your efficiencies, but in another run he retrieved these numbers from the SModelS database, so essentially from HEPData. Are you saying that the HEPData SMS signal efficiencies are at truth level and thus only approximations to what ATLAS is using for real?

kratsg commented 4 years ago

Wait ... you are talking about event-level signal efficiencies/acceptances, right? Because @Ga0l took your efficiencies, but in another run he retrieved these numbers from the SModelS database, so essentially from HEPData. Are you saying that the HEPData SMS signal efficiencies are at truth level and thus only approximations to what ATLAS is using for real?

So reco-level efficiencies are divided by the truth-level acceptance to get a reco-only efficiency. So if I look at the aux material:

[Acceptance, Efficiency, and Acceptance * Efficiency tables from the auxiliary material]

Can you at least confirm you're matching the right numbers? Then, for the upper-limit calculation, that's a model-independent fit [although it's not clear to me that one can directly compare an UL from the JSON workspaces to an UL from a model-independent fit].

I think I generally follow the procedure, although I need to think about how you correctly apply the 0.0625 scale factor...

lukasheinrich commented 4 years ago

Anyway, these efficiencies and acceptances are only approximations of the real analysis; the real analysis that produced the published contour is not necessarily capturable with these few numbers.

Ga0l commented 4 years ago

Wait ... you are talking about event-level signal efficiencies/acceptances, right? Because @Ga0l took your efficiencies, but in another run he retrieved these numbers from the SModelS database, so essentially from HEPData. Are you saying that the HEPData SMS signal efficiencies are at truth level and thus only approximations to what ATLAS is using for real?

@WolfgangWaltenberger I was actually using the SModelS database with the masses (280, 200) and I printed the efficiency*acceptance to my screen in order to extract the relevant information for a given example.

[quoted Acceptance, Efficiency, and Acceptance * Efficiency tables]

Can you at least confirm you're matching the right numbers?

If I look at the values on HEPData (these are the files used in the SModelS database), they seem to match yours; the HEPData numbers are just quoted to more digits. The acceptance*efficiency is taken directly from Eff_Acc_table_SRhigh and Eff_Acc_table_SRlow, and those tables are used, in digitized form, to create the SModelS database entry for this analysis, which allows us to retrieve the efficiency*acceptance for a given mass point (right @WolfgangWaltenberger?). Using the SModelS database, I retrieve exactly these numbers: (0.00078238, 0.00077732).

I have another question about the first two figures on HEPData, X-section U.L 1 and X-section U.L 2, which give the upper limit on the cross section for combined left- and right-handed stau production and for left-handed production only, respectively. The upper limits are quite different between the two figures; what causes this?

Ga0l commented 4 years ago

Dear pyhf developers,

we made some good progress in the past weeks. We should send you a plot made with our SModelS/pyhf interface very soon.

  1. However, we are performing a root-finding procedure and we noticed that pyhf takes quite a while to run. Is there a parameter in pyhf that lets us control the number of toy experiments? That way we might be able to reduce the running time.

  2. We also have a question about GPUs. Do you have an easy mechanism to put everything on the GPU in pyhf? (I'm not comfortable with that topic, so I don't know if the question makes sense; feel free to reformulate, @WolfgangWaltenberger.)

Many thanks,

Gaël

lukasheinrich commented 4 years ago

hi @Ga0l,

great to hear! We would be very happy to see a plot!

Re 1: what is the time scale we're talking about here for a single pyhf.infer.hypotest call (if that is what you are running)?

Re 2: yes, we can easily put things on GPUs, but so far it looks like the models must be quite complex in order to see a benefit. We will have students look into exactly what the scaling properties are.

WolfgangWaltenberger commented 4 years ago

Hey, let me chime in a bit. I have a few issues to discuss, so let me split things up into a few comments.

Re 1: maybe an additional question is whether we have reimplemented functionality that already exists in pyhf. What we have been doing is finding the root of "CLs - 0.95" via Brent bracketing. @Ga0l coded this himself, but using (I think) a scipy implementation of Brent's method (right @Ga0l?). Is this how you envisage us computing the 95% limits, or is there something in pyhf that we should use?

lukasheinrich commented 4 years ago

Often in the SUSY world (at least for ATLAS), results are only reported at µ=1. If you want an upper limit, yes, you'd need to either do a grid scan, or Brent root finding, or something like https://github.com/diana-hep/excursion

pyhf itself doesn't have utilities for that.
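
For illustration, a rough sketch of the grid-scan option under the asymptotic CLs calculation used in this thread (later pyhf releases ship a scan-based upper-limit utility in pyhf.infer.intervals, but at the time of this discussion a few lines of user code were needed):

import numpy as np
import pyhf

def upper_limit_scan(model, data, mu_grid):
    # CLs falls as mu grows, so interpolate mu at the CLs = 0.05 crossing
    cls_values = [
        float(pyhf.infer.hypotest(mu, data, model, qtilde=True)) for mu in mu_grid
    ]
    return float(np.interp(0.05, cls_values[::-1], list(mu_grid)[::-1]))

# e.g. ul = upper_limit_scan(model, workspace.data(model), np.linspace(0.1, 5.0, 21))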

I'm still curious about the timing: how does the time it takes to find the UL compare to the upstream input computation?

WolfgangWaltenberger commented 4 years ago

Plots. So here is one of our first reasonable validation plots. It's preliminary; the black line is the official ATLAS-SUSY-2018-04 exclusion curve. The grey line is SModelS using pyhf, running over the published data. We are using the official reference cross sections for stau-stau production. You see, we over-exclude a bit, but not crazily much.

[validation plot: TStauStau_2EqMassAx_EqMassBy_pretty]

WolfgangWaltenberger commented 4 years ago

Speed. The plot above took 720 CPU-core minutes to produce; it ran over 350 theory (simplified-model) points, so one point took about 2 minutes on a single core. Basically all of the time (> 90%) was spent in pyhf. I currently don't have the time spent per pyhf.infer.hypotest call, but we can find that out. The asset of SModelS is speed, so we wish to get the number down to O(seconds) per theory point, and we would be OK with sacrificing accuracy in favor of speed. Since we "fix" the simplified models, we can always learn the results with simple multi-layer perceptrons. Given that pyhf likelihoods are differentiable, this might actually be super easy and efficient. If you are interested, we could make this a joint effort between you and us. That said, we wish to learn the rest of our non-pyhf-based results in SModelS anyhow, so we would want to do something similar anyway. And apart from this, we may still wish to figure out how to make our current pyhf-based approach faster, to make the training of the MLP fast, if for nothing else.

WolfgangWaltenberger commented 4 years ago

Ah maybe one more thing. This plot,

https://smodels.github.io/validation/123/13TeV/ATLAS/ATLAS-SUSY-2018-04/validation/TStauStau_2EqMassAx_EqMassBy_pretty.png

is our validation plot with ATLAS' official upper limits, as we just published it within our SModelS v1.2.3 database update. For comparison.

WolfgangWaltenberger commented 4 years ago

Also we should of course mention pyhf in the plot.

[validation plot: TStauStau_2EqMassAx_EqMassBy_pretty]

Ga0l commented 4 years ago

I confirm that I used a scipy implementation of Brent bracketing.
Regarding speed, I measured the time spent by hypotest using Python's "time" library on my local computer. Each call to hypotest takes between 600 and 800 ms, and it takes about 10 seconds to perform the Brent bracketing.

lukasheinrich commented 4 years ago

Hi @Ga0l thanks a lot for the timing! Can you put this into context w/ other computations required to get the result? I'm curious how much this is the bottleneck.

lukasheinrich commented 4 years ago

i.e. if I understand @WolfgangWaltenberger's comment correctly, the total timing is ~2 min per point. Would you consider the 10 s a bottleneck?

Ga0l commented 4 years ago

Hi Lukas,

With only one process, each call to hypotest takes 600 ms, but the more I increase the number of processes, the longer hypotest takes to run, eventually reaching 15 seconds with 8 processes. Then the whole Brent bracketing can easily reach 2 minutes.

So the problem comes from the parallelisation. We will figure out why.

kratsg commented 4 years ago

So the problem comes from the parallelisation. We will figure out why.

sounds like an issue with python's GIL. Are you sharing any information amongst the processes?

lukasheinrich commented 4 years ago

Maybe this is also relevant for our toy calc stuff. @Ga0l have you tried joblib for parallelisation?

WolfgangWaltenberger commented 4 years ago

That's on me to answer. Currently we parallelize via multiprocessing.apply_async -- i.e. processes, not threads, evading the GIL -- with zero interprocess communication, so I wouldn't have expected any problems. I will play around with different ways of parallelizing (e.g. joblib) and let you guys know.
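
A sketch of what the joblib variant could look like; compute_upper_limit here is a stand-in for the SModelS-side function that wraps the pyhf root finding, not an existing API:

from joblib import Parallel, delayed

def compute_upper_limit(theory_point):
    # placeholder for the existing per-point pyhf-based calculation
    ...

theory_points = [...]  # the simplified-model points to scan

# one worker process per task, no shared state between them
results = Parallel(n_jobs=8)(
    delayed(compute_upper_limit)(point) for point in theory_points
)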

Ga0l commented 4 years ago

Hi all,

In addition to the upper limit on the cross section, we need the value of the likelihood for the signal hypothesis and the chi-squared.

For the likelihood, I used pyhf.infer.mle.fixed_poi_fit, which I understand gives twice the negative log-likelihood. So I called this method with mu=1.0 and converted the result into a likelihood (exp(-x/2)). My first question is: did I understand the use of pyhf.infer.mle.fixed_poi_fit correctly, and is this the right way to proceed?

And my second question is: is there a way to get the chi-squared, or should I compute it from the negative log-likelihood as well?

Many thanks,

Gaël

matthewfeickert commented 4 years ago

For the likelihood, I used pyhf.infer.mle.fixed_poi_fit which I understand gives twice the negative log likelihood. ... My first question is : did I understand well the use of pyhf.infer.mle.fixed_poi_fit and is it the good way to proceed?

As can be (kinda) seen from the pyhf.infer.mle.fixed_poi_fit docs (I should improve this), the return value is governed by the optimizer, but the default return is the best-fit model parameters. Following the example in the docs, though, if you just want -2*NLL at the best-fit parameters you can get it from

pyhf.infer.mle.fixed_poi_fit(test_poi, data, model, return_fitted_val=True)[-1]

And my second question is : is there a way to get the chi squared

We don't provide additional statistical tooling in pyhf as this is already covered by the backend libraries that we build on. So for now you'll need to compute it yourself. Let us know if you have questions though!
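
Putting the two pieces together, a small sketch (workspace and model as in the earlier snippets; the exp(-x/2) conversion is the one Gaël describes above):

import numpy as np
import pyhf

test_poi = 1.0
data = workspace.data(model)

bestfit_pars, twice_nll = pyhf.infer.mle.fixed_poi_fit(
    test_poi, data, model, return_fitted_val=True
)
likelihood = np.exp(-float(twice_nll) / 2.0)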

Ga0l commented 4 years ago

Hi all,

We are facing what seems to be a memory-leak problem. While trying to narrow it down, we figured it might come from pyhf itself.

  1. I made a simple script (memPyhf.py in the attachment) that repeatedly executes the hypotest function, and I observed that memory consumption grows constantly. You can find the memory profile in mprofile_hypotest.dat.

  2. We also observed that switching backends multiple times consumes a lot of memory. switchBackends.py is a simple example of that, with its memory profile mprofile_switchBackends.dat. All of that can be found in the same attachment.

Would you have any idea what's causing such memory consumption?

Thanks for your help,

Gaël

lukasheinrich commented 4 years ago

Hi @Ga0l

can you open up a separate issue? This certainly seems surprising. We'll make it a priority.

kratsg commented 4 years ago

Hi @Ga0l, for the 2nd part: I added a time.sleep(1.0) at the beginning of the for-loop to try and determine whether switching backends is causing a memory increase -- it's not.

[screenshot of the memory profile]

This is just because tensorflow/jax are large backends and take up a bit of memory upon import. Not sure we can do anything about that.

Ga0l commented 4 years ago

I've opened up a separate issue. I hadn't seen the last comment, so the backend part is also in there.

Ga0l commented 4 years ago

I guess it is better to come back here, since the actual memory-leak issue has been solved. However, as I said, we still have a memory leak in our SModelS/pyhf interface. It seems to be linked to the parallelisation issue we're also having:

edit: actually, I'm not sure the memory profiler works correctly when parallelizing. I get a constant memory consumption from the memory profiler, but if I look at the global memory consumption of my laptop, it is increasing very fast.

kratsg commented 4 years ago

Can you provide the code you're using for the multiprocessing?