root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

[RF] RooFit - Pythonic interaction with the RooWorkspace #13185

Open guitargeek opened 1 year ago

guitargeek commented 1 year ago

Pythonic interaction with the RooWorkspace

This issue tracks progress on the GSoC project "Pythonic interaction with the RooWorkspace": https://hepsoftwarefoundation.org/gsoc/2023/proposal_RooFit-RooWorkspacePythonization.html

This project was assigned to @yashnator.

Milestones and TODOs

Merged PRs

  1. https://github.com/root-project/root/pull/12911
  2. https://github.com/root-project/root/pull/12994
  3. https://github.com/root-project/root/pull/13152
  4. https://github.com/root-project/root/pull/13150

guitargeek commented 1 year ago

One of the project goals is to support setting up the workspace for likelihood fits purely from Python dictionaries, without using RooFit objects or JSON string literals.

A good target for this is the creation of HistFactory models, which can be done by importing a full HS3 JSON document, as described in this tutorial: https://root.cern/doc/master/rf515__hfJSON_8py.html

With the PRs merged so far, creating HistFactory pdfs from dictionaries already works. The dataset specification, however, still has to go through string literals, as shown in this simplified version of the tutorial:

# Simplified version of the HistFactory JSON IO tutorial:
# https://root.cern/doc/master/rf515__hfJSON_8py.html
# You can also find it in the tutorials/roofit folder of the ROOT repo.

import ROOT

# Python dictionary specifying the model pdf
model_channel1 = {
    "axes": [{"name": "obs_x_channel1", "max": 2.0, "min": 1.0, "nbins": 2}],
    "samples": [
        {
            "data": {"contents": [20, 10]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst1", "type": "normsys"},
                {"name": "mu", "type": "normfactor"},
            ],
            "name": "signal",
        },
        {
            "data": {"contents": [100, 0], "errors": [5, 0]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst2", "type": "normsys"},
                {"name": "mcstat", "type": "staterror"},
            ],
            "name": "background1",
        },
        {
            "data": {"contents": [0, 100], "errors": [0, 10]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst3", "type": "normsys"},
                {"name": "mcstat", "type": "staterror"},
            ],
            "name": "background2",
        },
    ],
    "type": "histfactory_dist",
}

# Python dictionary specifying the binned dataset
observed_channel1 = {
    "axes": [{"name": "obs_x_channel1", "nbins": 2, "min": 1, "max": 2}],
    "contents": [122, 112],
    "type": "binned",
}

# Creating an empty workspace
ws = ROOT.RooWorkspace("workspace")

# Importing the HistFactory pdf from a dictionary specification already works!
ws["model_channel1"] = model_channel1

# It would be nice if the user could also specify the datasets like this, so
# that no string literals are needed to set up the likelihood analysis (note
# that this doesn't work yet):
#
#     ws["observed_channel1"] = observed_channel1

# Right now, the only way to import a dataset via the JSON IO is to read a full
# HS3 JSON:
ROOT.RooJSONFactoryWSTool(ws).importJSONfromString(
    """
{
    "distributions": [
    ],
    "data": [
        {
            "name": "observed_channel1",
            "axes": [
                {
                    "name": "obs_x_channel1",
                    "nbins": 2,
                    "min": 1,
                    "max": 2
                }
            ],
            "contents": [122, 112],
            "type": "binned"
        }
    ]
}
"""
)

# Both the model_channel1 and the observed_channel1 should be in the workspace now.
ws.Print()

pdf = ws["model_channel1"]
data = ws["observed_channel1"]

# Fit the model pdf to the data to see if things work
result = pdf.fitTo(data, Save=True, PrintLevel=-1)
result.Print()

This workflow should be supported without string literals, which means that creating binned datasets from dictionaries should be supported as well.
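Until direct assignment of dataset dictionaries is supported, one possible stopgap is to generate the HS3 JSON string from the dictionary instead of writing it by hand. The following is only a minimal sketch of that idea: the helper name dataset_to_hs3_json is hypothetical (it is not part of ROOT), the consistency check on the bin contents is an added assumption, and the document layout simply mirrors the string literal used in the script above.

```python
import json
from math import prod


def dataset_to_hs3_json(name, spec):
    """Wrap one binned-dataset dictionary into a minimal HS3-style JSON string.

    Hypothetical helper for illustration only; not part of ROOT.
    """
    if spec.get("type") != "binned":
        raise ValueError("only 'binned' datasets are handled in this sketch")

    # Assumed sanity check: the number of bin contents should equal the
    # product of the per-axis bin counts.
    n_bins = prod(axis["nbins"] for axis in spec["axes"])
    if len(spec["contents"]) != n_bins:
        raise ValueError(
            f"expected {n_bins} bin contents, got {len(spec['contents'])}"
        )

    # Same layout as the string literal in the script above: an empty list
    # of distributions plus a single named dataset entry.
    document = {"distributions": [], "data": [{"name": name, **spec}]}
    return json.dumps(document)


observed_channel1 = {
    "axes": [{"name": "obs_x_channel1", "nbins": 2, "min": 1, "max": 2}],
    "contents": [122, 112],
    "type": "binned",
}

hs3_string = dataset_to_hs3_json("observed_channel1", observed_channel1)

# With ROOT available, the generated string could then be passed to the same
# call that the script above uses for the hand-written literal:
#     ROOT.RooJSONFactoryWSTool(ws).importJSONfromString(hs3_string)
```

This keeps the dataset specification in a plain Python dictionary, but it still routes through a JSON string internally, so it does not replace the requested `ws["observed_channel1"] = observed_channel1` support.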