scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
281 stars 83 forks source link

Modifying parts of an existing model #1557

Open alexander-held opened 3 years ago

alexander-held commented 3 years ago

Question

I noticed that it is possible to modify config.channels of an existing Model. That leads to some functionality taking into account the change, and other functionality breaking. It seems unreasonable to expect that a modified model still works, and it is simple enough to build a new model when needed. Given that users may accidentally modify the model, what do you think about making the relevant properties read-only?

Below is an example to demonstrate the behavior with a generic two-channel workspace:

spec = {
    "channels": [
        {
            "name": "region_1",
            "samples": [
                {
                    "data": [25, 5],
                    "modifiers": [
                        {
                            "data": None,
                            "name": "Signal strength",
                            "type": "normfactor",
                        },
                    ],
                    "name": "Signal",
                }
            ],
        },
        {
            "name": "region_2",
            "samples": [
                {
                    "data": [8],
                    "modifiers": [
                        {
                            "data": None,
                            "name": "Signal strength",
                            "type": "normfactor",
                        },
                    ],
                    "name": "Signal",
                }
            ],
        },
    ],
    "measurements": [
        {"config": {"parameters": [], "poi": "Signal strength"}, "name": "example"}
    ],
    "observations": [
        {"data": [35, 8], "name": "region_1"},
        {"data": [10], "name": "region_2"},
    ],
    "version": "1.0.0",
}

import pyhf, pyhf.infer

ws = pyhf.Workspace(spec)
model = ws.model()

data = ws.data(model, with_aux=False)
print(data)
print(pyhf.infer.mle.fit(data, model))

model.config.channels.pop()

data = ws.data(model, with_aux=False)
print(data)  # still works, but only shows data for one channel
print(pyhf.infer.mle.fit(data, model))  # breaks

Running this results in

[35, 8, 10]
[1.39473815]
[35, 8]
Eval failed for data [35.0, 8.0] pars: [1.0]
Traceback (most recent call last):
  File "[...]/pyhf/src/pyhf/pdf.py", line 822, in logpdf
    raise exceptions.InvalidPdfData(
pyhf.exceptions.InvalidPdfData: eval failed as data has len 2 but 3 was expected

with a longer traceback I am skipping here. The data extraction respects the fact that a channel was removed, but the fit fails.

Users may come across something like this if they want a subset of the channels and accidentally write the result back into model.config. I have not tested the modification of other aspects of model.config.

Relevant Issues and Pull Requests

none I am aware of

kratsg commented 3 years ago

How would you suggest making it read-only? The only way I see this being possible is if we hand copies in return e.g.

@property
def channels(self):
  return [*self.channels]
alexander-held commented 3 years ago

That is the option I thought of as well. Alternatively you could also of course keep the current approach where users just need to ensure they do not modify the model config, but that is likely not a common occurrence.