pyhf / pyhf-tutorial

Updating tutorial on pyhf
https://pyhf.github.io/pyhf-tutorial/

doc: Add page describing how to build Upper Limits table #57

Closed · kratsg closed this 1 year ago

kratsg commented 1 year ago

This came out of scikit-hep/pyhf#2263.

This describes how to build an upper limit table exactly matching what ATLAS publishes in various papers, particularly the standard SUSY tables. It builds on an existing published analysis, EWK 2L2J, and its corresponding HEPData entry.

The current rendering can be seen here.

* Add rich to requirements.txt and rebuild lock file.
* Add notebook on procedure for building upper limits table common to
  search analyses.
* Add .DS_Store to .gitignore.
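
For a sense of the table the notebook builds, here is a minimal rich sketch (a sketch only: the column set follows the standard ATLAS SUSY tables and all numbers here are made up):

from rich import box
from rich.console import Console
from rich.table import Table

# hypothetical values, just to illustrate the layout
table = Table(title="Model-independent upper limits", box=box.HORIZONTALS)
table.add_column("Signal Region")
table.add_column("S95_obs", justify="right")
table.add_column("S95_exp", justify="right")
table.add_column("CLb", justify="right")
table.add_column("p(s=0) (Z)", justify="right")
table.add_row("DR-Int-EWK", "10.1", "9.5 +3.8 -2.7", "0.64", "0.30 (0.52)")

Console().print(table)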
kratsg commented 1 year ago

@matthewfeickert - it seems black-jupyter is modifying the file in a way that gets undone by the nbqa-isort hook. Essentially, black-jupyter adds a blank line after an import that nbqa-isort then cleans up.

$ pre-commit run black-jupyter --all-files
black-jupyter............................................................Failed
- hook id: black-jupyter
- files were modified by this hook

reformatted book/UpperLimitsTable.ipynb

All done! ✨ 🍰 ✨
1 file reformatted, 14 files left unchanged.

$ git diff
diff --git a/book/UpperLimitsTable.ipynb b/book/UpperLimitsTable.ipynb
index bf9b40d..03e6ae6 100644
--- a/book/UpperLimitsTable.ipynb
+++ b/book/UpperLimitsTable.ipynb
@@ -20,6 +20,7 @@
     "pyhf.set_backend(\"numpy\", \"minuit\")\n",
     "\n",
     "from rich import box\n",
+    "\n",
     "# for making tables\n",
     "from rich.console import Console\n",
     "from rich.table import Table"

$ pre-commit run nbqa-isort --all-files
nbqa-isort...............................................................Failed
- hook id: nbqa-isort
- files were modified by this hook

Fixing /Users/kratsg/pyhf-tutorial/book/UpperLimitsTable.ipynb

$ git diff
$
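
One common way to stop this kind of tug-of-war (an assumption on my part; I don't know how the hooks are configured here) is to run isort with its black profile and keep the comment above the import group, so neither tool wants to move the blank line. The cell would then look something like:

import pyhf

pyhf.set_backend("numpy", "minuit")

# for making tables
from rich import box
from rich.console import Console
from rich.table import Table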
kratsg commented 1 year ago

I won't be able to give this a full review until later in the week, but I don't think we have rich in book/requirements.txt, so we need to add at least that there and then relock book/requirements.lock.

Why here and not binder/requirements.txt? I tried to run this on my Mac and got a segfault so I'm not sure how this lockfile resolution is working.

matthewfeickert commented 1 year ago

Why here and not binder/requirements.txt?

Sorry, you're right. I was going from memory, but you do want that to be in binder/requirements.txt.

I tried to run this on my Mac and got a segfault so I'm not sure how this lockfile resolution is working.

Oh bummer. If you just run nox it segfaults on you? If so, can you just update binder/requirements.txt and I'll fix the lock file and then we can switch over to a Docker based solution for the lock file so that this doesn't happen?

kratsg commented 1 year ago

Oh bummer. If you just run nox it segfaults on you? If so, can you just update binder/requirements.txt and I'll fix the lock file and then we can switch over to a Docker based solution for the lock file so that this doesn't happen?

nox seems to work; I was running it manually, so I'm not sure why there's a difference. I do get errors, however, but they're unrelated to my changes:

$ nox
nox > Running session lock
nox > Creating virtual environment (virtualenv) using python in .nox/lock
nox > python -m pip install --upgrade pip setuptools wheel
nox > python -m pip install --upgrade pip-tools
nox > cat /Users/kratsg/pyhf-tutorial/binder/requirements.txt /Users/kratsg/pyhf-tutorial/book/requirements.txt
nox > pip-compile --resolver=backtracking --generate-hashes --output-file /Users/kratsg/pyhf-tutorial/book/requirements.lock /Users/kratsg/pyhf-tutorial/requirements.txt
  ERROR: Cannot install pyhf[contrib,minuit,xmlio]==0.7.2 because these package versions have conflicting dependencies.
Discarding numpy==1.25.1 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 915)) to proceed the resolution
  ERROR: Cannot install pyhf[contrib,minuit,xmlio]==0.7.2 because these package versions have conflicting dependencies.
Discarding scipy==1.11.1 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 1423)) to proceed the resolution
  ERROR: Cannot install -r /Users/kratsg/pyhf-tutorial/requirements.txt (line 3) because these package versions have conflicting dependencies.
Discarding ipython==8.14.0 (from -r /Users/kratsg/pyhf-tutorial/book/requirements.lock (line 539)) to proceed the resolution
matthewfeickert commented 1 year ago

I do get errors, however, but they're unrelated to my changes

That's because you're apparently using Python 3.8:

This file is autogenerated by pip-compile with Python 3.8

and the lock file was built assuming Python 3.10+ (again, I should Dockerize this, sorry). I'll fix this now and review hopefully tonight.

matthewfeickert commented 1 year ago

Rebased to pick up PR #58 to get pyhf v0.7.3. Reviewing after I make dinner. :+1:

kratsg commented 1 year ago

Is there a reason that these references are chosen? Is there some HEP reference paper on stats that cites the J. Phys. Chem. A 2001 paper? If not, let's check the PDG to see if this is included there, unless there was something particularly striking about this reference.

39.2.5 has a really small blurb and isn't useful for anyone, imo. The 2001 paper is not only the top result when googling for the method, but it has also been linked by the stats forum in emails and by @alexander-held as well. I'm not sure there's anything better.

Maybe I need to read this section more carefully, but how does Equation 52 show the following statements? Regardless of how it does, can we make this more explicit here?


You'd have to read through Equation 52 to see how it sets the value of $\mu$ here in order to determine the hypotheses. The notebook makes this more explicit by writing out what the hypotheses are.

This is mixing multiple ideas: an upper limit on the signal strength and the definition of CLb. CLb is just a statement of the p-value of the background-only hypothesis given the observed test statistic, and then flipped with the $1 - p_b$ part for geometric reasons for use in the ratio that gives CLs; it is a construction that exists only by necessity for the CLs method. CLb makes no statement about which observed test statistic you use. Here we are choosing the test statistic for the 95% CL upper limit, but that is just a choice because we wanted a test size of $\alpha = 0.05$; it could have been anything. So we should not include the test size in the definition (the three-line equality) of CLb.

This is the official ATLAS/CMS statement. I didn't change anything because this is agreed upon; take it up with those collaborations and their decision from around 2012. This is the definition that has been used to make the tables in nearly every single SUSY paper published by both collaborations, which is why it has to be written out explicitly here, or nobody will figure out how to get those numbers.
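
(For reference, the construction under discussion, as given in e.g. the pyhf documentation, is

$$\mathrm{CL}_s(\mu) = \frac{\mathrm{CL}_{s+b}}{\mathrm{CL}_b} = \frac{P(q_\mu \geq q_\mu^{\mathrm{obs}} \mid \mu\,s + b)}{P(q_\mu \geq q_\mu^{\mathrm{obs}} \mid b)} = \frac{p_{\mu}}{1 - p_b},$$

with the hypothesized signal strength $\mu$ excluded at 95% CL when $\mathrm{CL}_s < 0.05$.)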

Is there another reference that we can use? I know that this one comes up in search results because it is the only bloody thing that isn't firewalled by ATLAS, but I've never been a huge fan of this note and it kinda bugs me that it doesn't have any stable reference. It isn't even on CDS (private to ATLAS) as far as I know, but I would love to be wrong on this!

Nope.

p-values do not measure compatibility of a model and data. I know that the caption for Table 20 literally says this, but it is wrong. p-values are the probability, given a model hypothesis, of having observed a test statistic as extreme or more so (extreme in terms of disagreement with the model hypothesis) than the observed test statistic. They cannot make statements on model compatibility with the data. So I would suggest we rephrase this section.

The caption is the official agreed upon text from the ATLAS Stats Committee + ATLAS SUSY group. That will have to be taken up with them. I can even link you the corresponding twikis that have this phrasing: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StandardSUSYRefs#Suggested_text_agreed_with_StatF
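
(In symbols, the definition given in the review comment above is the standard one,

$$p = P(t \geq t_{\mathrm{obs}} \mid H),$$

for a test statistic $t$ where larger values indicate greater disagreement with the hypothesis $H$.)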

I'm not really crazy about these slides in general and I feel like while they do provide context in the lecture he gave that they don't actually help here. So I would personally suggest removing this section and that we write up something better later.

These were added until something better comes along; that material is not going to be covered in this notebook, which is already complicated enough.

  • It would probably be useful to mention that DRInt_cuts should correspond to Signal Region "DR-Int-EWK" and to comment on why the results we get here are slightly different from the published table. This is maybe beating the point over the head, but I think it would help (would help 02:00 Matthew!).

I don't know for sure why they're different. Part of the problem is that the table up there has some additional rounding that comes into play in certain pieces. I already confirmed that the calculations I show match up with the calculations from regular RooFit/RooStats -- so there's other stuff going on. The differences are close enough given the larger uncertainties/variations that I'm not concerned anyone will be confused that it's not exactly the same.
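
For concreteness, the kind of pyhf calculation being cross-checked here looks roughly like this (a minimal sketch using a toy one-bin model standing in for the actual EWK 2L2J workspace; the yields and scan range are made up):

import numpy as np
import pyhf

pyhf.set_backend("numpy", "minuit")

# toy stand-in for the published workspace
model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
data = [52.0] + model.config.auxdata

# tail probabilities for one tested signal strength
CLs_obs, (CLsb, CLb) = pyhf.infer.hypotest(
    1.0, data, model, test_stat="qtilde", return_tail_probs=True
)

# observed and expected 95% CL upper limits on mu
scan = np.linspace(0, 5, 51)
obs_limit, exp_limits = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan, level=0.05
)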


Just remember that this notebook isn't really anything in my own words; it just combines all the various bits and pieces scattered across various documents into one place, to make sure everything that goes into making that table is actually described. Whether or not the descriptions are correct/factual is up to those documents, unfortunately.

matthewfeickert commented 1 year ago

Just remember that this notebook isn't really anything in my own words. ... Whether or not the descriptions are correct/factual is up to those documents, unfortunately.

I don't understand. This is not an ATLAS document and it is not a summary of ATLAS documents, it is a summary of a procedure in our own words. We're under no constraints to write things that aren't correct because they appear somewhere else or to have to use language that was written once a decade ago by someone in ATLAS and then never updated.

alexander-held commented 1 year ago

A few comments on discussion items: I think I got the reference for error propagation from either Google or some Stackoverflow thread. A better reference if we were to write a paper is probably some textbook, but I think this one does its job well.
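
(For reference, the standard first-order propagation formula such references cover, for a function $f(x_1, \ldots, x_n)$ of uncorrelated inputs, is

$$\sigma_f^2 \approx \sum_i \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2 .)$$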

I don't think the figures from Alex fit in here really well. They would at least need a detailed explanation to connect them to the text. Right now they feel disconnected.

For the CLs reference: alternatives might be the CLs paper (probably harder to digest). The statistics info note for the Higgs discovery https://cds.cern.ch/record/1379837 also has something on this I believe.

kratsg commented 1 year ago

I don't understand. This is not an ATLAS document and it is not a summary of ATLAS documents, it is a summary of a procedure in our own words. We're under no constraints to write things that aren't correct because they appear somewhere else or to have to use language that was written once a decade ago by someone in ATLAS and then never updated.

We can drop the images for sure. The problem is more that people in ATLAS and CMS (and outside) will use pyhf to make these tables, but also to reproduce what is in the papers. Right now, that procedure as described in the notebook is exactly what is being done. We could make this an ATLAS-internal only thing instead (but we don't have an ATLAS-internal pyhf documentation -- and this also impacts CMS as they are using the same definitions). Also this language has gotten updated somewhat frequently.

matthewfeickert commented 1 year ago

@kratsg I think we're going to have a lot of discussion on these points, but you've also done a lot of work here that would be better served getting merged in and then having us revise it as needed. I'd propose that you apply @alexander-held's comments, basically just

I don't think the figures from Alex fit in here really well. They would at least need a detailed explanation to connect them to the text. Right now they feel disconnected.

We can drop the images for sure.

and then we merge this and I open up a GitHub Issue from my original comments where we can discuss this and then make any necessary revisions in a follow up PR.

I also think it is fine to drop the images even though you mentioned

These were added until something better comes along; that material is not going to be covered in this notebook, which is already complicated enough.

as I plan to do exactly this in https://github.com/scikit-hep/pyhf/issues/2274.


I want to "get things right" (to my mind) usually in the first pass, but I also realize that I slow down our code velocity quite a bit with this, and so I'm trying (bear with me while I slowly work on growing here) to take a few more pages out of Matt Rocklin's views on the purpose and roles of review: "Small Scope and Fast Review: The secret to happy devs".

The problem is more that people in ATLAS and CMS (and outside) will use pyhf to make these tables, but also to reproduce what is in the papers. Right now, that procedure as described in the notebook is exactly what is being done.

I agree. The procedure is fine! Things that I want to change are not the procedure, but the description of what is happening in the procedure. Some of the statements (made by others that you are just reproducing, so not an attack on you) are just not true at all. We don't have any obligation to repeat such things, and I would argue we should actively avoid doing so!

We could make this an ATLAS-internal only thing instead (but we don't have an ATLAS-internal pyhf documentation -- and this also impacts CMS as they are using the same definitions).

No, I would say that we should actively avoid making anything pyhf related ATLAS-internal. There are obvious things that can only be done within ATLAS at the moment (i.e. full probability models), and if we inside of ATLAS want to produce internal resources for the collaboration that use pyhf, that's fine (but why make it internal?), but as a pyhf team product I think it is important that everything is always public.

So we're in agreement here about wanting to avoid this. :+1:

Also this language has gotten updated somewhat frequently.

Then it should hopefully be easier to correct internally at some point in the future.

kratsg commented 1 year ago

Done, images dropped. I added a quick blurb for the link to scikit-hep/pyhf#2274 and that should cover that portion of the section there.

Thanks for the thorough review (admittedly, I still haven't gotten any feedback from the Stats Forum on this notebook).