Closed darousso closed 9 months ago
@darousso Thanks for your question and for using pyhf. As you're asking about a feature that hasn't been released yet with official support and isn't in the public API, it would be helpful to get as much information as possible from you. Can you please give us the following information?
- the version of pyhf used (here we need to know the branch and commit number that you're on, as you're not using an official release)

As we haven't made a release with pseudoexperiment generation support yet, it would be good to understand this now so that if there are issues we can resolve them before we publicly release it. So your question is very important and we appreciate you bringing it up (also for being a bold beta tester :rocket:).
@matthewfeickert
Thanks for your super quick response!
I downloaded it via pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc
If I do pip3 show pyhf
I get:
Name: pyhf
Version: 0.4.2.dev55
Summary: (partial) pure python histfactory implementation
Home-page: https://github.com/scikit-hep/pyhf
Author: Lukas Heinrich, Matthew Feickert, Giordon Stark
Author-email: lukas.heinrich@cern.ch, matthew.feickert@cern.ch, gstark@cern.ch
License: Apache
Location: /var/clus/usera/drousso/.local/lib/python3.6/site-packages
Requires: scipy, click, tqdm, jsonschema, jsonpatch, pyyaml
Required-by:
Python Notebook has been attached as a txt file since github is not letting me upload it (sorry). Toys Debugging.txt
Thank you so much for helping look into this! And also thank you for developing pyhf (it is so much easier to use than HistFitter...)!
I downloaded it via
pip3 install --user git+https://github.com/scikit-hep/pyhf.git@toycalc
Ah okay, great. This actually has all the information we need, along with the notebook you gave us. We can start debugging from here — thanks!
This might take some time to better understand the actual behavior, but once we get the pseudoexperiment generation support into master we can point you to a new dev release that you can install from TestPyPI. :+1:
Python Notebook has been attached as a txt file since github is not letting me upload it
@darousso sorry, can you actually send us the notebook another way, or put it somewhere that we can access it through CERN? Unfortunately some of the cell formatting got strangely messed up when I renamed the .txt to .ipynb and opened it.
@matthewfeickert Sorry, I am not exactly sure how best to attach it, so I have just directly sent it to your CERN email. Let me know if that's alright!
I have just directly sent it to your CERN email.
Perfect. Thanks!
Okay, I've just ripped the spec out and made everything a runnable Python script that I put in this Gist. This can serve as a debugging source, but I am able to replicate your results with the spec and running with JAX as you were:
# Asymptotics
cls_obs: [0.15568652]
cls_exp: [[0.02969 ]
[0.08653433]
[0.22822617]
[0.49778549]
[0.8050249 ]]
# Toys
cls_obs: [0.24197901]
cls_exp: [[0.25022422]
[0.26773903]
[1. ]
[1. ]
[1. ]]
Our analysis is also looking at using toys with pyhf, so I'm interested in this answer as well!
This will definitely be addressed along the way to the release of v0.5.0. We can't promise full support for anything though until it is in an official release, but v0.5.0 should be the next release that comes out, and the aim of the dev team is to have this done sooner rather than later.
Out of curiosity, in what timeframe is v0.5.0 planned to be released?
Out of curiosity, in what timeframe is v0.5.0 planned to be released?
We don't have set timeframes, but it is a priority for us to get out. The three of us have been a bit swamped at the moment with preparing various software tutorials and conferences, and there are some additional PRs that will need to go in before we can release v0.5.0, for technical reasons. This hasn't been abandoned though; it is just one of the last pieces in the chain.
Note: the event counts here are very small, so the asymptotic approximation may not have set in yet.
The systematic variations of ucs_StopRHadron_1100_100000 are both less than the nominal, which may cause some strange behavior.
Is there a nice way to plot a histogram of the test statistic from the toys?
Is there a nice way to plot the values of the parameters being used to generate the toys (vs. mu)?
Does the aux data vary in the Toy generation?
Hello,
Sorry, I am a bit confused by the roadmaps; has the fix been implemented in the current release?
-David
has the fix been implemented in the current release?
No, the related PR hasn't been merged.
am a bit confused with the roadmaps
This is scheduled for minor release v0.6.0.
Ah perfect, thanks!
(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)
(Sorry to bother again, I have just learned that our analysis intends on requesting EB in December, would the minor release be out by then?)
hi @darousso, thanks for checking in. Yes, that is very much our plan.
We also now have an announcement mailing list that you can join to be updated on upcoming releases and general announcements: https://groups.google.com/group/pyhf-announcements/subscribe. If you join you'll be able to know a few days in advance when you can expect v0.6.0 on PyPI.
hi @darousso, I think this is actually expected/understood. Can you try to plot a Brazil band for your setup? Specifically with sufficiently high mu (say up to mu=5); at some point the upper sigmas should come down.
@darousso following up on what @lukasheinrich is pointing to, this is probably related to the discussion in PR #966.
Below I show Brazil bands for scans of µ ∈ [0.1, 5, 50]:
asymptotics
toys (1k toys per µ)
thanks @kratsg, these look reasonable. I should create a "learn notebook" to explain what the issue is. In short, since half of the test statistic distribution under the signal hypothesis is a delta function, CL_s+b can be either 1.0 or in the interval [0.0, 0.5], so CLs values can exhibit this type of discontinuity.
In any case, for an upper limit you're interested in the region where CLs ~ 0.05, so µ between 2 and 4 in the plot above.
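To make that discontinuity concrete, here is a toy numerical illustration in plain NumPy (the distributions are stand-ins, not pyhf output): when half the signal-hypothesis test-statistic distribution is a delta function at zero, the CL_s+b p-value is exactly 1.0 at q_obs = 0 and drops to at most roughly 0.5 for any q_obs > 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n_toys = 100_000

# Stand-in s+b distribution of qtilde: for roughly half the toys the
# best-fit signal strength saturates and the test statistic is exactly 0
# (the delta function); the rest follow a continuous chi2-like tail.
q_sb = np.where(rng.random(n_toys) < 0.5, 0.0, rng.chisquare(1, n_toys))

def p_sb(q_obs):
    """p-value of the s+b hypothesis: fraction of toys with q >= q_obs."""
    return float(np.mean(q_sb >= q_obs))

# q_obs = 0 counts the whole delta function, so the p-value is exactly 1.0;
# any q_obs > 0 drops it immediately to roughly 0.5 or below.
print(p_sb(0.0))   # 1.0
print(p_sb(1e-9))  # ~0.5
print(p_sb(2.0))   # well below 0.5
```

Since CLs divides this p-value by the background-only one, the jump between 1.0 and the [0.0, 0.5] region propagates straight into the CLs curve.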
hi @darousso,
there'll be more explanation, but generally your setup seems OK. Using the additions in #1160, this is what your example looks like:
existing comparison between toys and yours (slightly apples to oranges) ( clip = False)
adjusting asymptotics to make apples to apples comparison
VERSION 1 clip = True, inclusive_pvalue = False
VERSION 2 clip = True, inclusive_pvalue = True
the fixes in #1126 should also make these plots look nicer
Hi everyone, thank you so much for doing this! Really sorry for the late reply; I have been trying my best to play around with the new additions on my side.
Sorry, I am quite new to limit setting, so I am a bit confused by the above discussion. Am I right that for the hypotest the issue lies with the fact that the curves are not constrained to be 1 at mu=0, which creates this saturation, and that this is fixed with the inclusive_pvalue=False parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the clip parameter for toys; I am assuming that is just for asymptotics?) Is inclusive_pvalue the only thing I need to change in the new setup?
As #1162 is merged, all that's left is #1160 / #993 (for the clipping).
Sorry, I am quite new to limit setting, so I am a bit confused by the above discussion. Am I right that for the hypotest the issue lies with the fact that the curves are not constrained to be 1 at mu=0, which creates this saturation, and that this is fixed with the inclusive_pvalue=False parameter in the learn_toys_funky branch? What does this parameter do? (The hypotest function won't accept the clip parameter for toys; I am assuming that is just for asymptotics?) Is inclusive_pvalue the only thing I need to change in the new setup?
The fundamental issue is that for the test statistics here, the distribution of the test statistic under the signal+background hypothesis is partly a delta function, so the corresponding p-value is either 1.0 or in the interval [0.0, 0.5]; this is discontinuous, which is what you're seeing. One annoying issue at the same time is that the p-values may not be ordered in the same way as the test statistics (when computing intervals: #1162), so this needed to get fixed at the same time.
We'll be trying to flesh out/clarify the API when 0.6.0 is released and we finish up all the remaining PRs.
I'm moving this to v0.6.1, even though PR #993 mostly resolves it, as PR #1160 should be discussed more on what parts are needed; for the time being that has been moved to v0.6.1 as well. PR #1160 also has a stub of a learn notebook on clipping.
Hi @kratsg and @matthewfeickert , thank you so much for all your help on this front, and my sincerest apologies for the late reply!
I have tried running things with the latest pyhf version (as well as with the learn_toys_funky branch back in November, when there was the inclusive_pvalue option). I definitely see an improvement over last time (back then I couldn't even get an exclusion curve), but I am still noticing some strange behaviour with the toys calculator: it looks like the upper sigma bands have collapsed and the observed curve is hugging the expected. Would you happen to know what I may be doing incorrectly? (I notice the clipping option is only available for asymptotics?)
output = pyhf.infer.hypotest(
    mupoi,
    data,
    model,
    return_expected_set=return_expected_set,
    calctype=calctype,
    ntoys=ntoys,
)
Once again, my sincerest apologies if I am screwing up something really simple...
(I realize I shouldn't be putting ATLAS-specific plots here even if the results are already public. It is here: https://gitlab.cern.ch/drousso/dvjetsanalysis/-/blob/e9a82569b44a28b32d1f9668f3afc4f541a89641/DVMuonLimits/TestCondorMET_100000/Exclusion_Plot.png )
Dear @matthewfeickert et al, just a friendly question if you had time to look into this issue already? Many thanks, Oleg
Hello,
Sorry for bugging again; I just thought it may be more useful if I actually provided some example workspaces and a description of the behaviour when comparing pyhf and HistFitter in a couple of slides.
Are the asymmetric bands expected behaviour, with the discrepancy instead being a bug in HistFitter?
-David
Note to devs: this Issue needs to be the focus of v0.6.4. I think as soon as this is resolved we should then move forward with releasing v0.6.4 and then move everything else left on the board to v0.7.0 and v0.7.1.
@darousso With PR #1610 in, I can try to revisit this in the back half of the week once I'm out of workshops. As was mentioned, the asymptotic approximations may not be valid here, so it probably makes sense to do a comparison between HistFitter and pyhf pseudoexperiments and to reproduce @kratsg's plots from earlier (https://github.com/scikit-hep/pyhf/issues/892#issuecomment-718340602) with the updated code.
Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it? We have an analysis at an advanced stage that is impacted by it.
Dear authors, this problem seems to persist in pyhf 0.6.3. Has there been any progress on it? We have an analysis at an advanced stage that is impacted by it.
This needs to get revisited for the v0.7.0 release. Can you either share a minimal reproducible example of how your analysis is being affected, and/or see if your problem still exists when you use the dev release on TestPyPI?
Minimal reproducible example:
import json
import matplotlib.pyplot as plt
import pyhf
from pyhf.contrib.viz import brazil
pyhf.set_backend('jax')
j = json.loads('{"channels": [{"name": "channel1", "samples": [{"data": [10.0, 0.2], "modifiers": [{"data": null, "name": "lumi", "type": "lumi"}, {"data": null, "name": "mu_sig", "type": "normfactor"}, {"data": {"hi": 1.1, "lo": 0.9}, "name": "bsm_uncerts", "type": "normsys"}], "name": "bsm_signal"}, {"data": [0.62, 1.94], "modifiers": [{"data": [0.31, 0.97], "name": "staterror_channel1", "type": "staterror"}, {"data": {"hi_data": [0.64, 2.033], "lo_data": [0.6, 1.847]}, "name": "non_linearity", "type": "histosys"}], "name": "bkg"}]}], "measurements": [{"config": {"parameters": [{"auxdata": [1.0], "bounds": [[0.5, 1.5]], "inits": [1.0], "name": "lumi", "sigmas": [0.017]}], "poi": "mu_sig"}, "name": "meas"}], "observations": [{"data": [0, 0], "name": "channel1"}], "version": "1.0.0"}')
ws = pyhf.Workspace(j)
model = ws.model()
data = ws.data(model)
poi_vals = [0.0, 0.1, 0.2, 0.3, 0.5, 1.0]
results = [
    pyhf.infer.hypotest(
        test_poi, data, model, test_stat="qtilde",
        return_expected_set=True, calctype='toybased', ntoys=10000
    )
    for test_poi in poi_vals
]
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
brazil.plot_results(poi_vals, results, ax=ax)
fig.savefig("analysis_example.pdf")
plt.close(fig)
The resulting plot looks like the attached one.
…and running the exact same example as above in the current dev version of pyhf, 0.6.4.dev77, yields the following, very different result:
Dear @matthewfeickert, all, is there any insight into this? We may need to find an alternative to pyhf if we cannot resolve this, but that would of course be a pain.
Thank you, Stefan
The difference pointed out above bisects to 9fbbbf9740175fcdfe73fedc4351953be18d2664 (#1639) and affects the asymptotic calculation, see the examples below done with asymptotics:
5ea4e0a7:
9fbbbf97:
c.f. Issue #1720 also. We have regression fixing to do.
As an intermediate solution, you can switch your staterror to shapesys in the workspace. The shapesys modifier does not seem to be affected. That does, however, change your constraint term from Gaussian to Poisson (#760).
This should now be fixed in master. Can we get confirmation?
We've gotten no confirmation. Given that nobody has run into issues with the fixes provided by @lukasheinrich, this will get closed.
Question
I have noticed that the toys hypotest seems to give some strange values even if the asymptotics gives regular looking values.
In general the cls_exp upper sigma values seem to be all saturated at 1 for all signal points, and this does not seem to improve when increasing the number of toys.
Relevant Issues and Pull Requests
An example signal point model is the following:
The following results are obtained:
Do let me know if you need more information about my environment or the actual files I am running (the code is in an ipynb).