scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
27 stars 21 forks source link

Wrong `n_dof` for fixed unconstrained parameters in `_goodness_of_fit` #478

Open lorenzennio opened 4 months ago

lorenzennio commented 4 months ago

Hey,

I think I found a small bug in the _goodness_of_fit function, when working with fixed unconstrained parameters. When you fix an unconstrained parameter in the fit using the fix_pars argument, the n_dof is still calculated with all (fixed or free) unconstrained parameters of the model.

Code to reproduce:

import cabinetry
import pyhf

import logging
logging.basicConfig(level=logging.DEBUG)

spec = {
    "channels": [
        {
        "name": "singlechannel",
        "samples": [
            {
            "name": "signal",
            "data": [5.0, 10.0, 15.0],
            "modifiers": [
                {
                    "name": "mu",
                    "type": "normfactor",
                    "data": None
                }
            ]
            },
            {
            "name": "background",
            "data": [50.0, 60.0, 70.0],
            "modifiers": [
                {
                    "name": "mu_bkg",
                    "type": "normfactor",
                    "data": None
                }
            ]
            }
        ]
        }
    ]
    }

model = pyhf.Model(spec)

data = [55.0, 70.0, 70.0]

fixed = model.config.suggested_fixed()
fixed[model.config.par_map['mu_bkg']['slice']] = [True]
fit = cabinetry.fit.fit(model, data, fix_pars=fixed, goodness_of_fit=True)
print(fit.goodness_of_fit)

model.config.par_map['mu_bkg']['paramset'].suggested_fixed = [True]
fit = cabinetry.fit.fit(model, data, fix_pars=fixed, goodness_of_fit=True)
print(fit.goodness_of_fit)
alexander-held commented 4 months ago

Thanks a lot for raising this! I see that model_utils.unconstrained_parameter_count already has a check for parameters set to be fixed in the model itself but that cannot catch these cases of parameters that are dynamically set to be fixed. I need to think a bit about how to best handle this. Perhaps we can calculate the difference of the sum of True values in model.config.suggested_fixed() and the fix_pars within fit.fit(), which should come out to be this difference that needs to be subtracted from n_dof.

lorenzennio commented 4 months ago

How would we deal with fixed constrained parameters then?

Would you rather correct for this in the fit() function directly, or pass fix_pars through to model_utils.unconstrained_parameter_count as an optional argument? For the latter, one could implement a simple correction, having access to fix_pars.

alexander-held commented 4 months ago

Good point, sounds like we should pass the information through to handle constrained parameters correctly.

lorenzennio commented 4 months ago

What do you think of this fix?

def unconstrained_parameter_count(model: pyhf.pdf.Model, fix_pars: Optional[List[bool]] = None,) -> int:
    """Returns the number of unconstrained, non-constant parameters in a model.

    The number is the sum of all independent parameters in a fit. A shapefactor that
    affects multiple bins enters the count once for each independent bin. Parameters
    that are set to constant are not included in the count.

    Args:
        model (pyhf.pdf.Model): model to count parameters for
        fix_pars (Optional[List[bool]], optional): list of booleans specifying which
            parameters are held constant, defaults to None (use ``pyhf`` suggestion)

    Returns:
        int: number of unconstrained parameters
    """
    n_pars = 0
    # custom fixed parameters overrule suggested fixed parameters
    if fix_pars is None:
        fix_pars = model.config.suggested_fixed()

    for parname in model.config.par_order:
        if not model.config.param_set(parname).constrained:
        # only consider non-constant parameters
            par_slice = model.config.par_slice(parname)
            n_pars += fix_pars[par_slice].count(False)

    return n_pars

Can PR this, if you are happy.

alexander-held commented 4 months ago

Thanks for opening the PR! Looking at the original implementation, I was trying to remember why I wrote it in that specific way. I cannot think of any obvious issues with what you have though. I'll have a closer look at the PR.

lorenzennio commented 4 months ago

No worries, let me know if you want me to change anything.