scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
281 stars 83 forks source link

Construction of qmutilde for muhat < 0 #2352

Open alexander-held opened 11 months ago

alexander-held commented 11 months ago

For models with some physically motivated bounds (like $\mu\geq0$) the $\tilde{q}_\mu$ test statistic is defined in https://arxiv.org/abs/1007.1727 at equation 16. The use of this test statistics in pyhf seems to rely on setting a $\mu\geq0$ parameter bound: see code.

This is generally going to give a result consistent with the prescription given in the formula, as for any (at least approximately) parabolic likelihoods where $\hat{\mu} < 0$, the result when constraining $\mu\geq0$ is going to be $\hat{\mu} = 0$.

There is a notable exception that arises for quadratic POI dependence, where a second local minimum can appear (e.g. common for EFT measurements). For $\hat{\mu} < 0$ and a second local minimum at $\mu>0$, the $\tilde{q}_\mu$ prescription would require performing an additional fit with $\mu=0$ to evaluate the correct denominator. See the following figure for such an example:

example case

As far as I am aware, the current pyhf implementation would incorrectly evaluate the test statistic by using the local minimum in the denominator.

matthewfeickert commented 11 months ago

So following up on some discussion points raised by @cranmer from our NYU/UW-Madison group meeting discussion:

A question that I have is if you believe that you would be in a situation where (edit) $\mu < 0$, like an EFT analysis, then why would you be selecting $\tilde{q}_{\mu}$ — which is "For the case where one considers models for which $\mu \geq 0$" — and not $q\mu$?

alexander-held commented 11 months ago

I agree that "incorrect" is not a good description, maybe "inconsistent with the definition" is better. The question about the usefulness of that approach in such a scenario stands of course.

Regarding your question: $\hat{\mu}<0$ could always happen due to "unlucky" observed data even when you expect $\mu\geq0$ for physics reasons. There is no way to protect against this beyond designing your model with a bound at 0 and you would not know whether or not this will happen prior to unblinding. The method and test statistic is already fixed at that point.

cranmer commented 11 months ago

Quick comment, the situation $\hat{\mu} < 0$ is common in vanilla signal+background signal strength situations like $Pois(n | \mu s + b)$, e.g. when $n < b$. The presence of a local minima / non-convex log-likelihood curve however is uncommon there (unless something really weird with signal systematics). The EFT situations are ones where you might expect some interference or something that could lead to local minima, but in those situations $\mu$ is some parameter where it's not clear to me that you would want to do an upper-limit. In that case it's not so much about $\tilde{q}\mu$ vs. ${q}\mu$ as it is about $q$ vs. $t$ (e.g. single-sided vs. double sided).

matthewfeickert commented 11 months ago

Sorry, you're both obviously correct, but I was typing https://github.com/scikit-hep/pyhf/issues/2352#issuecomment-1777571555 while trying to listen to another meeting and I typoed. I meant to write

A question that I have is if you believe that you would be in a situation where $\mu < 0$

(so not $\hat{\mu} < 0$)

but if I had read better I would have seen that @alexander-held addressed this in the first sentence:

For models with some physically motivated bounds (like $\mu\geq0$)

so we're already starting there with the answer to my question.

matthewfeickert commented 11 months ago

@alexander-held For reference, as you mention in your 2023-10-16 ATLAS Statistics Committee Meeting slides this was addressed in

xRooFit: fixed recently and deployed in StatAnalysis

I'm assuming this is https://gitlab.cern.ch/will/xroofit/-/commit/ebf6a49194c3c08dd2e12d8933fe16067b6243e8 (xRooFit doesn't use MR based workflow so going by commit message)?

alexander-held commented 10 months ago

I believe so, @will-cern can presumably confirm.