scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Goodness-of-fit with custom auxiliary data #377

Closed. alexander-held closed this issue 1 year ago.

alexander-held commented 1 year ago

The goodness-of-fit calculation currently evaluates the constraint term for the saturated model via the suggested parameter inits for a model: https://github.com/scikit-hep/cabinetry/blob/4181499fb1f30fa4f94d2215a8837a5ec2e3db3f/src/cabinetry/fit/__init__.py#L384-L388

For the default auxiliary data (model.config.auxdata), these inits correspond to the maximum of the constraint term. With custom auxiliary data this is no longer the case, and different parameter values need to be used.
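For intuition, here is a toy sketch with plain scipy (not the cabinetry/pyhf API; the Gaussian stand-in for the constraint term is an assumption made for illustration): a normsys-style constraint is a unit-width Gaussian centered on the auxiliary data, so the suggested init sits at the maximum only for the default auxiliary data.

```python
import numpy as np
from scipy.stats import norm

# Toy illustration (plain scipy, not the cabinetry/pyhf API): a normsys-style
# constraint term is a unit-width Gaussian centered on the auxiliary data.
def constraint_logpdf(aux, par):
    return norm.logpdf(aux, loc=par, scale=1.0)

init = 0.0  # suggested init for the constrained parameter

# default auxdata (0.0): the init sits at the maximum of the constraint term
assert constraint_logpdf(0.0, init) > constraint_logpdf(0.0, init + 0.1)

# custom auxdata (0.5): the maximum moves to par = 0.5, away from the init
grid = np.linspace(-2.0, 2.0, 401)
best = grid[np.argmax(constraint_logpdf(0.5, grid))]
print(best)  # close to 0.5
```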

An inefficient way to obtain those is via a small fit:

import numpy as np
import scipy.optimize

objective = lambda values: -model.constraint_logpdf(aux_data, np.asarray(values))
pars = scipy.optimize.minimize(objective, model.config.suggested_init()).x

A better approach would be to skip the fit and construct the parameters directly from the auxdata (and inits for free-floating parameters):

i_aux = 0
pars = []  # parameters that will maximize constraint term
for parname in model.config.par_order:
    if parname in model.config.auxdata_order:
        for _ in range(model.config.param_set(parname).n_parameters):
            # pick up auxdata
            pars += [aux_data[i_aux]]
            i_aux += 1
    else:
        # no aux for parameter -> pick up init (free-floating parameters)
        pars += model.config.param_set(parname).suggested_init
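The direct construction can be sanity-checked against the small fit on a toy example. Everything below is a plain-scipy stand-in with invented values, not the pyhf API: one free-floating parameter without a constraint term and one parameter with a unit-width Gaussian constraint.

```python
import numpy as np
import scipy.optimize
from scipy.stats import norm

# Toy stand-in for the constraint model (plain scipy, not the pyhf API):
# parameter 0 is free-floating (no constraint term, e.g. a POI) and
# parameter 1 has a unit-width Gaussian constraint.
aux_data = [0.7]    # custom auxiliary data for the constrained parameter
inits = [1.0, 0.0]  # suggested inits: [free-floating, constrained]

def neg_constraint_logpdf(values):
    # only parameter 1 enters the constraint term
    return -norm.logpdf(aux_data[0], loc=values[1], scale=1.0)

# fit-based result: leaves the free-floating parameter at its init
fit_pars = scipy.optimize.minimize(neg_constraint_logpdf, inits).x

# direct construction: auxdata for constrained pars, inits for the rest
direct_pars = np.asarray([inits[0], aux_data[0]])

print(fit_pars, direct_pars)  # both close to [1.0, 0.7]
```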

The implementation above is missing the scaling for Poisson terms (model.constraint_model.constraints_poisson.batched_factors).
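As a rough sketch of what that scaling does (the factor and auxiliary data values below are invented for illustration, and the toy uses plain scipy rather than the pyhf constraint model): a Poisson-constrained auxiliary measurement behaves like aux ~ Poisson(gamma * factor), so the constraint term peaks at gamma = aux / factor rather than at the auxiliary data value itself.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Toy Poisson constraint (plain scipy, values invented for illustration):
# the auxiliary measurement is aux ~ Poisson(gamma * factor), so the
# constraint term is maximized at gamma = aux / factor, not at aux.
factor = 100.0
aux = 95  # custom auxiliary data (an observed count)

result = minimize_scalar(
    lambda gamma: -poisson.logpmf(aux, gamma * factor),
    bounds=(0.1, 2.0),
    method="bounded",
)
print(result.x, aux / factor)  # both close to 0.95
```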

alexander-held commented 1 year ago

Minimal setup to test implementations (needs testing with all modifier types):

import pyhf
import numpy as np
import scipy.optimize

model = pyhf.simplemodels.correlated_background(
    signal=[12.0, 11.0],
    bkg=[50.0, 52.0],
    bkg_up=[51.0, 53.0],
    bkg_down=[49.0, 52.0],
)
pdf = model.make_pdf(pyhf.tensorlib.astensor(model.config.suggested_init()))
samples = pdf.sample((2,))

for sample in samples:
    _, aux_data = model.fullpdf_tv.split(pyhf.tensorlib.astensor(sample))
    print(f"aux: {aux_data}")

    objective = lambda values: -model.constraint_logpdf(aux_data, np.asarray(values))
    best_pars = scipy.optimize.minimize(objective, model.config.suggested_init()).x
    constraint_ll = pyhf.tensorlib.to_numpy(
        model.constraint_logpdf(aux_data, best_pars)
    )
    print(f"pars maximizing constraint term: {best_pars}")
    print(f"constraint log-likelihood at maximum: {constraint_ll}")

alexander-held commented 1 year ago

More complete with scaling for Poisson terms: https://gist.github.com/alexander-held/0c5730d6e2d0193d55f799107d3ce286