scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
285 stars 84 forks source link

Multiple staterror modifiers per sample and channel, round trips and duplicate modifiers #1830

Open alexander-held opened 2 years ago

alexander-held commented 2 years ago

Summary

In the context of #760 I was wondering what would happen in a setup that exploits the fact that pyhf allows customizing parameter names for staterror modifiers to assign multiple such modifiers to the same sample in the same channel.

Within just pyhf, this seems to work fine, and there will be multiple parameters in the fit. When converting to xml+ROOT, two StatError modifiers do show up in the xml. When converting back to JSON, the staterror modifier names get standardized, resulting in two fields with the same modifier name. Only one of them is used when building the model, so the roundtrip fails.

OS / Environment

n/a

Steps to Reproduce

Default workspace:

{
    "channels": [
        {
            "name": "ch",
            "samples": [
                {
                    "data": [1000.0],
                    "modifiers": [
                        {"data": null, "name": "mu_sig", "type": "normfactor"},
                        {"data": [10.0], "name": "staterror_ch_1", "type": "staterror"},
                        {"data": [20.0], "name": "staterror_ch_2", "type": "staterror"}
                    ],
                    "name": "signal"
                }
            ]
        }
    ],
    "measurements": [{"config": {"parameters": [], "poi": "mu_sig"}, "name": "meas"}],
    "observations": [{"data": [1000], "name": "ch"}],
    "version": "1.0.0"
}

This contains two parameters controlling the staterror modifiers (+ mu_sig, which is of no relevance here).

Convert to xml+ROOT via pyhf json2xml, resulting in the following config/FitConfig_ch.xml:

<!DOCTYPE Channel SYSTEM '../HistFactorySchema.dtd'>

<Channel Name="ch" InputFile="data/data.root">
  <Data HistoName="histch_data" InputFile="data/data.root" />
  <Sample Name="signal" HistoName="histch_signal" InputFile="data/data.root" NormalizeByTheory="False">
    <NormFactor Name="mu_sig" Val="1" Low="0" High="10" />
    <StatError Activate="True" HistoName="histch_signal_staterror_ch_1" />
    <StatError Activate="True" HistoName="histch_signal_staterror_ch_2" />
    </Sample>
  </Channel>

I do not know whether this is technically valid HistFactory, but it converts fine with hist2workspace. I did not check whether one of the modifiers gets dropped, or if they get merged.

Convert back to JSON with pyhf xml2json:

{
    "channels": [
        {
            "name": "ch",
            "samples": [
                {
                    "data": [1000.0],
                    "modifiers": [
                        {"data": null, "name": "mu_sig", "type": "normfactor"},
                        {"data": [10.0], "name": "staterror_ch", "type": "staterror"},
                        {"data": [20.0], "name": "staterror_ch", "type": "staterror"}
                    ],
                    "name": "signal"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "auxdata": [1.0],
                        "bounds": [[1.0, 1.0]],
                        "inits": [1.0],
                        "name": "lumi",
                        "sigmas": [0.0]
                    },
                    {"bounds": [[0.0, 10.0]], "inits": [1.0], "name": "mu_sig"}
                ],
                "poi": "mu_sig"
            },
            "name": "meas"
        }
    ],
    "observations": [{"data": [1000.0], "name": "ch"}],
    "version": "1.0.0"
}

The modifiers are still both there, but since the name is now the same for both, the first one is dropped from the model and only the second modifier enters. See e.g. pyhf inspect, which will show a single staterror modifier for this version, and two for the original.

File Upload (optional)

No response

Expected Results

Unclear to me what the expected behavior should be:

This was example was written as a targeted setup to test behavior, and may not be all that relevant in practice. shapesys modifiers can be used for this effect to achieve the same behavior (assuming customizable constraint term, as in https://github.com/scikit-hep/pyhf/pull/1829). It may make sense to only allow a single staterror modifier per sample and channel to avoid this issue altogether.

Actual Results

see above, roundtrip fails

pyhf Version

pyhf, version 0.6.4

Code of Conduct

kratsg commented 12 months ago

This will be changed as part of the HS3 migration, with documentation warning under staterror similar to what is currently done for shapesys. "Can only have one staterror per channel. If you want multiple staterrors, use shapefactor"

matthewfeickert commented 12 months ago

It is @cburgard's recommendation (and I'm fine/agree here) that having the multiple staterror should cause an error. So https://github.com/scikit-hep/pyhf/issues/1830#issuecomment-1846882345 is follow the idea that we should make this an error and then document this as we change towards using shapefactor.

alexander-held commented 12 months ago

Raising an error sounds very reasonable, great that this can move forward as part of HS3.