pyhf / pyhf-benchmark

Benchmarking of hardware acceleration of pyhf

Apache License 2.0

Load data summary #7

Open coolalexzb opened 4 years ago

coolalexzb commented 4 years ago

I want to summarize the ways we load data. We need this to make the process fully automatic.

Situation 1: background + signal, e.g. BkgOnly.json + patchset.json

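    import json
    import pyhf

    # json.load reads from an open file; json.loads would try to parse the file name itself
    with open('patchset.json') as patchset_file:
        patchset = pyhf.PatchSet(json.load(patchset_file))
    with open('BkgOnly.json') as workspace_file:
        workspace = pyhf.workspace.Workspace(json.load(workspace_file))
    # model_point must be defined beforehand (a patch name or its parameter values)
    model = workspace.model(
        measurement_name=None,
        patches=[patchset[model_point]],  # patches is a list of JSON patches
        modifier_settings={
            "normsys": {"interpcode": "code4"},
            "histosys": {"interpcode": "code4p"},
        },
    )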

Situation 2: workspace.json, e.g. HVTWZ_3500.json

    with open('HVTWZ_3500.json') as workspace_file:
        workspace = pyhf.workspace.Workspace(json.load(workspace_file))
    model = workspace.model()

Situation 3: BkgOnly.json

    with open('BkgOnly.json') as workspace_file:
        workspace = pyhf.workspace.Workspace(json.load(workspace_file))
    model = workspace.model()

If there are any problems with the examples I wrote, or any situations I missed, feel free to add notes here. Thank you!

matthewfeickert commented 4 years ago

All that is needed to create a model is a valid spec (see the pyhf.pdf.Model API), and a valid spec is anything that passes pyhf inspect. So while using pyhf.workspace.Workspace is a very natural and normal way for us to do things, it is not explicitly required.

For further reference, the pyhf model JSON Schema is linked to in the first reference in the Likelihood Specification bibliography (v1.0.0 of the schema, that is).

pyhf.workspace.Workspace doesn't accept a file path though, so all the examples above that pass in a string aren't valid.
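To make the distinction concrete, here is a minimal standard-library sketch. The file contents are a placeholder, not a valid pyhf workspace, and the temporary file merely stands in for BkgOnly.json. It shows that `json.loads` parses a string of JSON text, so handing it a file name fails, whereas `json.load` reads from an open file and returns the parsed object that would then be passed on to pyhf.

```python
import json
import os
import tempfile

# Placeholder content standing in for a workspace file; NOT a valid pyhf spec.
placeholder_spec = {"channels": [], "observations": [], "measurements": [], "version": "1.0.0"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "BkgOnly.json")
    with open(path, "w") as spec_file:
        json.dump(placeholder_spec, spec_file)

    # json.loads treats its argument as JSON text, so a file name is rejected.
    try:
        json.loads(path)
        parsed_from_name = True
    except json.JSONDecodeError:
        parsed_from_name = False

    # json.load reads from the open file and returns the parsed dict.
    with open(path) as spec_file:
        spec = json.load(spec_file)
```

The resulting `spec` dict is the kind of object to hand to `pyhf.workspace.Workspace`, assuming the file holds a valid workspace.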

coolalexzb commented 4 years ago

Oh, I made an error in Situations 2 and 3. The input should be parsed JSON, not a file path. I have corrected them.

coolalexzb commented 4 years ago

If we initialize the workspace as in Situations 2 and 3 (where there is no patchset.json), can we still get the data with workspace.data(model) in later steps? Is there any situation that I did not take into consideration? @matthewfeickert

matthewfeickert commented 4 years ago

> If we initialize the workspace as in Situations 2 and 3 (where there is no patchset.json), can we still get the data with workspace.data(model) in later steps?

The patchset just gives the signal contribution to the statistical model. The data (observations + auxiliary data) are not part of the model and so pyhf.workspace.Workspace.data() doesn't care if the model object does or doesn't have a signal component.

This is maybe made a bit more clear by just looking at the source for pyhf.workspace.Workspace.data which is very short.
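As a rough illustration of why the signal does not matter here, the following is a pure-Python sketch of the idea, not the pyhf source: the observations are concatenated in the model's channel order, and the auxiliary data are optionally appended. The function `collect_data`, the channel names, and all numbers are hypothetical.

```python
def collect_data(observations, channel_order, auxdata, include_auxdata=True):
    """Hypothetical stand-in: flatten per-channel observations, then append auxdata."""
    data = []
    for channel in channel_order:
        data.extend(observations[channel])
    if include_auxdata:
        data.extend(auxdata)
    return data


# Toy inputs, made up for illustration only.
observations = {"SR": [10.0, 12.0], "CR": [100.0]}
channel_order = ["CR", "SR"]
auxdata = [1.0, 0.0]

full_data = collect_data(observations, channel_order, auxdata)
main_only = collect_data(observations, channel_order, auxdata, include_auxdata=False)
```

Nothing in these inputs refers to individual samples, signal or background, so the same call works whether or not a patch was applied when building the model.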

> Is there any situation that I did not take into consideration?

I think this is fine. :+1: There are other ways to make models, but for what we're doing this covers all the cases we care about.