scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Extended workspace schema for plotting #1007

Open alexander-held opened 3 years ago

alexander-held commented 3 years ago

Question

The current pyhf schema disallows additional channel properties beyond name and samples:

https://github.com/scikit-hep/pyhf/blob/d8be1d121777d52babe018c453d05588d9122395/src/pyhf/schemas/1.0.0/defs.json#L66-L74

What do you think about relaxing this requirement so that additional properties can be written into a channel? With one added property for the name of the variable the channel is binned in and another for an array of the bins, it would be possible to, for example, visualize data/MC distributions and the effects of systematic uncertainties with respect to nominal, using only the extended workspace as input.
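To make the proposal concrete, here is a minimal sketch of what such an extended channel might look like, written as a Python dict. The key names `variable` and `bin_edges`, the channel contents, and the helper `split_channel` are all hypothetical and are not part of the pyhf schema; the sketch only shows that the extra properties can be separated cleanly from the part relevant to inference.

```python
import copy

# Hypothetical extended channel. "variable" and "bin_edges" are illustrative
# names for the proposed extra properties; they are NOT in the pyhf 1.0.0 schema.
extended_channel = {
    "name": "signal_region",
    "samples": [
        {
            "name": "background",
            "data": [50.0, 60.0, 40.0],
            "modifiers": [],
        }
    ],
    "variable": "m_jj [GeV]",
    "bin_edges": [200.0, 400.0, 600.0, 800.0],
}

# The only channel properties the current schema allows, per the issue text.
SCHEMA_KEYS = {"name", "samples"}


def split_channel(channel):
    """Split a channel into the schema-allowed part and the plotting extras."""
    core = {k: copy.deepcopy(v) for k, v in channel.items() if k in SCHEMA_KEYS}
    cosmetics = {
        k: copy.deepcopy(v) for k, v in channel.items() if k not in SCHEMA_KEYS
    }
    return core, cosmetics


core, cosmetics = split_channel(extended_channel)

# Consistency check under the assumption that bins are stored as edges:
# N bins need N + 1 edges.
n_bins = len(core["samples"][0]["data"])
assert len(cosmetics["bin_edges"]) == n_bins + 1
```

Under this assumption, a plotting tool would read `cosmetics`, while only `core` would ever be passed on to the statistical model.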

This extra information has no relevance to the statistical inference, so the plots could also be made without these cosmetics, which is already possible at the moment.

It might make more sense to separately define a schema for plots. It is also not clear to me whether best-fit results would naturally fit into the workspace (to allow visualizing post-fit distributions), since the workspace feels like a "pre-fit" object. The majority of the information in the workspace is relevant to plots though, so any new format for specifying plots for data/MC distributions would duplicate a lot of information.

Relevant Issues and Pull Requests

None that I'm aware of.

kratsg commented 3 years ago

Personally, I'd prefer a separate JSON file. Likelihoods should be seen as a pre-fit configuration. Plots were always intended to be separated out from pyhf, so plot configurations should be a separate file that can still reference the likelihood (e.g. via a digest/shasum).
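As a rough sketch of that idea, a separate plot configuration could store a SHA-256 digest of the workspace it belongs to. Everything below is an assumption made for illustration: the field names (`workspace_sha256`, `channels`, `variable`, `bin_edges`), the helper functions, and the choice to hash a key-sorted JSON serialization. It uses only the Python standard library and no pyhf API.

```python
import hashlib
import json


def workspace_digest(workspace):
    """SHA-256 of a canonical (key-sorted) JSON serialization of the workspace."""
    canonical = json.dumps(workspace, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Toy stand-in for a workspace; a real one would hold the full likelihood spec.
workspace = {
    "channels": [
        {
            "name": "signal_region",
            "samples": [
                {"name": "background", "data": [50.0, 60.0], "modifiers": []}
            ],
        }
    ],
}

# Hypothetical plot configuration kept in its own file, pointing back at the
# workspace by digest instead of duplicating its contents.
plot_config = {
    "workspace_sha256": workspace_digest(workspace),
    "channels": {
        "signal_region": {
            "variable": "m_jj [GeV]",
            "bin_edges": [200.0, 400.0, 600.0],
        }
    },
}


def config_matches_workspace(config, ws):
    """Check that the plot configuration was written for this exact workspace."""
    return config["workspace_sha256"] == workspace_digest(ws)


assert config_matches_workspace(plot_config, workspace)
```

Hashing a key-sorted serialization makes the digest independent of key order, so the reference still matches if the workspace file is rewritten with its keys in a different order, while any change to the yields or modifiers invalidates it.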