alexander-held opened 11 months ago
Following up on some discussion points raised by @cranmer at our NYU/UW-Madison group meeting:
A question that I have is: if you believe that you would be in a situation where (edit) $\mu < 0$, like an EFT analysis, then why would you select $\tilde{q}_{\mu}$ (which is defined "for the case where one considers models for which $\mu \geq 0$") rather than $q_\mu$?
I agree that "incorrect" is not a good description, maybe "inconsistent with the definition" is better. The question about the usefulness of that approach in such a scenario stands of course.
Regarding your question: $\hat{\mu}<0$ could always happen due to "unlucky" observed data even when you expect $\mu\geq0$ for physics reasons. There is no way to protect against this beyond designing your model with a bound at 0, and you would not know whether or not this will happen prior to unblinding. The method and test statistic are already fixed at that point.
Quick comment: the situation $\hat{\mu} < 0$ is common in vanilla signal+background signal-strength setups like $\mathrm{Pois}(n \mid \mu s + b)$, e.g. when $n < b$. The presence of a local minimum / non-convex log-likelihood curve, however, is uncommon there (unless something really weird is going on with the signal systematics). The EFT situations are ones where you might expect interference or something else that could lead to local minima, but in those situations $\mu$ is some parameter where it's not clear to me that you would want to set an upper limit. In that case it's not so much about $\tilde{q}_\mu$ vs. $q_\mu$ as it is about $q$ vs. $t$ (i.e. one-sided vs. two-sided).
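As a quick numerical sanity check (a toy of my own, with assumed yields $s = 10$ and $b = 50$, not numbers from any real analysis): for a single counting experiment the unconstrained MLE is $\hat{\mu} = (n - b)/s$, which goes negative as soon as $n < b$:

```python
# Toy illustration (assumed yields, not from any real analysis): for a single
# counting experiment Pois(n | mu*s + b), the unconstrained MLE is
# mu_hat = (n - b) / s, which is negative whenever the observed count n
# fluctuates below the background expectation b.
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

s, b = 10.0, 50.0  # assumed signal and background yields
n = 45             # observed count below b, i.e. a downward fluctuation

# negative log-likelihood as a function of the signal strength mu
nll = lambda mu: -poisson.logpmf(n, mu * s + b)

fit = minimize_scalar(nll, bounds=(-3.0, 3.0), method="bounded")
print(fit.x)        # numerical MLE, approximately -0.5
print((n - b) / s)  # analytic MLE: -0.5
```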
Sorry, you're both obviously correct, but I was typing https://github.com/scikit-hep/pyhf/issues/2352#issuecomment-1777571555 while trying to listen to another meeting and I typoed. I meant to write
A question that I have is if you believe that you would be in a situation where $\mu < 0$
(so not $\hat{\mu} < 0$)
but if I had read more carefully I would have seen that @alexander-held addressed this in the first sentence:
For models with some physically motivated bounds (like $\mu\geq0$)
so we're already starting there with the answer to my question.
@alexander-held For reference, as you mention in your 2023-10-16 ATLAS Statistics Committee Meeting slides, this was addressed in
xRooFit: fixed recently and deployed in StatAnalysis
I'm assuming this is https://gitlab.cern.ch/will/xroofit/-/commit/ebf6a49194c3c08dd2e12d8933fe16067b6243e8 (xRooFit doesn't use an MR-based workflow, so I'm going by the commit message)?
I believe so, @will-cern can presumably confirm.
For models with some physically motivated bounds (like $\mu\geq0$), the $\tilde{q}_\mu$ test statistic is defined in https://arxiv.org/abs/1007.1727 at equation 16. The use of this test statistic in pyhf seems to rely on setting a $\mu\geq0$ parameter bound: see code. This is generally going to give a result consistent with the prescription given in the formula, since for any (at least approximately) parabolic likelihood where $\hat{\mu} < 0$, the result when constraining $\mu\geq0$ is going to be $\hat{\mu} = 0$.
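A minimal sketch of that clamping behavior (a hand-written parabolic NLL of my own, not pyhf's actual model or API): when the unconstrained minimum sits at $\hat{\mu} < 0$, refitting with a $\mu \geq 0$ bound lands exactly on the boundary, so the bounded fit reproduces the conditional fit at $\mu = 0$ that the $\tilde{q}_\mu$ denominator calls for:

```python
# Minimal sketch (assumed quadratic NLL, not pyhf's actual model): for a
# parabolic likelihood with unconstrained minimum at mu_hat < 0, refitting
# with the bound mu >= 0 lands on the boundary at mu_hat = 0, matching the
# conditional fit at mu = 0 used in the q_mu-tilde denominator.
from scipy.optimize import minimize_scalar

mu_hat_free = -0.4                        # assumed unconstrained best fit
nll = lambda mu: (mu - mu_hat_free) ** 2  # parabolic -2 log L (up to offset)

free = minimize_scalar(nll)                                    # no bound
bounded = minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded")

print(free.x)     # approximately -0.4
print(bounded.x)  # approximately 0.0: clamped to the physical boundary
```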
There is a notable exception that arises for quadratic POI dependence, where a second local minimum can appear (common for EFT measurements, for example). For $\hat{\mu} < 0$ with a second local minimum at $\mu>0$, the $\tilde{q}_\mu$ prescription requires an additional fit with $\mu=0$ fixed to evaluate the correct denominator. See the following figure for such an example:
As far as I am aware, the current pyhf implementation would incorrectly evaluate the test statistic by using the local minimum in the denominator.
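To put a number on the failure mode (a toy double-well NLL of my own construction, not a real pyhf workspace): for $\hat{\mu} < 0$ the $\tilde{q}_\mu$ denominator should come from the conditional fit at $\mu = 0$, whereas a minimizer that settles into the positive local minimum returns a strictly smaller NLL, and hence a different test statistic:

```python
# Toy illustration (double-well -2 log L of my own construction, not a real
# pyhf model): quadratic POI dependence can produce a global minimum at
# mu_hat < 0 plus a second local minimum at mu > 0. For mu_hat < 0 the
# q_mu-tilde denominator must be evaluated at mu = 0; a minimizer that lands
# in the positive local minimum returns a smaller NLL value instead.
from scipy.optimize import minimize_scalar

# global minimum at mu = -1.0, local minimum near mu = +0.74
nll = lambda mu: (mu + 1.0) ** 2 * ((mu - 0.8) ** 2 + 0.1)

mu_hat = minimize_scalar(nll, bounds=(-2.0, -0.5), method="bounded").x
correct_denom = nll(0.0)  # conditional fit at mu = 0, approx 0.74
local_min = minimize_scalar(nll, bounds=(0.0, 2.0), method="bounded").x

print(mu_hat)              # approximately -1.0 (negative best fit)
print(correct_denom)       # what the q_mu-tilde prescription calls for
print(nll(local_min))      # smaller value: the incorrect denominator
```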