[RF] RooFit - Pythonic interaction with the RooWorkspace


This issue tracks the progress on the GSoC project on the Pythonic interaction with the RooWorkspace: https://hepsoftwarefoundation.org/gsoc/2023/proposal_RooFit-RooWorkspacePythonization.html

This project was assigned to @yashnator.

Milestones and TODOs

[x] Pythonic way to use the RooWorkspace factory language using __setitem__ on the workspace (#12911)
[x] Enable creation of pdfs, functions, and variables from Python dictionaries passed to RooWorkspace.__setitem__ (#12994)
[x] Implement automatic loading of JSON IO keys (#13152)
[ ] Support creation of binned datasets from dictionaries as described in this comment
[ ] Move all logic except for dict to string conversion to the C++ side
[ ] Avoid using nlohmann-json directly, but use RooFit's JSONInterface instead
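
To illustrate the idea behind the first two milestones, here is a minimal toy sketch of a `__setitem__` that dispatches on the type of the assigned value. It does not use ROOT and is not the actual pythonization code: the class `ToyWorkspace` and its attributes are hypothetical, and the assumption is only that strings are treated as factory-language expressions while dictionaries are serialized to JSON before being handed to the C++ side.

```python
import json


class ToyWorkspace:
    """Hypothetical stand-in for a workspace; NOT RooWorkspace itself."""

    def __init__(self):
        # Strings that would be forwarded to the factory language.
        self.factory_expressions = {}
        # JSON strings that would be forwarded to the JSON IO.
        self.json_elements = {}

    def __setitem__(self, name, value):
        if isinstance(value, str):
            # Assumption: a string is interpreted as a factory expression.
            self.factory_expressions[name] = value
        elif isinstance(value, dict):
            # Assumption: a dictionary is converted to a JSON string, which is
            # the only step that has to remain on the Python side.
            self.json_elements[name] = json.dumps(value)
        else:
            raise TypeError(f"Cannot create '{name}' from {type(value).__name__}")


toy_ws = ToyWorkspace()
toy_ws["from_string"] = "placeholder factory expression"
toy_ws["from_dict"] = {"type": "binned", "contents": [122, 112]}
```

Keeping only the dict-to-string conversion in Python, as in this toy, is in line with the open TODO of moving all other logic to the C++ side.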

Merged PRs

One of the project goals is to support setting up the workspace for likelihood fits purely from Python dictionaries, without using RooFit objects or JSON string literals.

One good target for this is the creation of HistFactory models, which can currently be done by importing a full HS3 JSON, as described in this tutorial: https://root.cern/doc/master/rf515__hfJSON_8py.html

With the PRs that have been merged so far, creating the HistFactory pdfs from dictionaries already works. However, the dataset specification must still be passed as a JSON string literal, as shown in this simplified version of the tutorial:

# Simplified version of the HistFactory JSON IO tutorial:
# https://root.cern/doc/master/rf515__hfJSON_8py.html
# You can also find it in the tutorials/roofit folder of the ROOT repo.

import ROOT

# Python dictionary specifying the model pdf
model_channel1 = {
    "axes": [{"name": "obs_x_channel1", "max": 2.0, "min": 1.0, "nbins": 2}],
    "samples": [
        {
            "data": {"contents": [20, 10]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst1", "type": "normsys"},
                {"name": "mu", "type": "normfactor"},
            ],
            "name": "signal",
        },
        {
            "data": {"contents": [100, 0], "errors": [5, 0]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst2", "type": "normsys"},
                {"name": "mcstat", "type": "staterror"},
            ],
            "name": "background1",
        },
        {
            "data": {"contents": [0, 100], "errors": [0, 10]},
            "modifiers": [
                {"data": {"hi": 1.05, "lo": 0.95}, "name": "syst3", "type": "normsys"},
                {"name": "mcstat", "type": "staterror"},
            ],
            "name": "background2",
        },
    ],
    "type": "histfactory_dist",
}

# Python dictionary specifying the binned dataset
observed_channel1 = {
    "axes": [{"name": "obs_x_channel1", "nbins": 2, "min": 1, "max": 2}],
    "contents": [122, 112],
    "type": "binned",
}

# Creating an empty workspace
ws = ROOT.RooWorkspace("workspace")

# Importing the HistFactory pdf from a dictionary specification already works!
ws["model_channel1"] = model_channel1

# It would be nice if the user can also specify the datasets like this, such
# that no string literals are necessary to specify everything necessary for the
# likelihood analysis (note this doesn't work yet):
#
#     ws["observed_channel1"] = observed_channel1

# Right now, the only way to import a dataset via the JSON IO is to read a full
# HS3 JSON:
ROOT.RooJSONFactoryWSTool(ws).importJSONfromString(
    """
{
    "distributions": [
    ],
    "data": [
        {
            "name": "observed_channel1",
            "axes": [
                {
                    "name": "obs_x_channel1",
                    "nbins": 2,
                    "min": 1,
                    "max": 2
                }
            ],
            "contents": [122, 112],
            "type": "binned"
        }
    ]
}
"""
)

# Both the model_channel1 and the observed_channel1 should be in the workspace now.
ws.Print()

pdf = ws["model_channel1"]
data = ws["observed_channel1"]

# Fit the model pdf to the data to see if things work
result = pdf.fitTo(data, Save=True, PrintLevel=-1)
result.Print()

This workflow should be supported without string literals, meaning it would be good to also support the creation of binned datasets from dictionaries.
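
Until that is supported, one possible stop-gap is to build the full HS3 JSON string from the dictionary with Python's standard `json` module, so that no JSON has to be written by hand. The sketch below is only an illustration: the helper `hs3_json_with_dataset` is a hypothetical name and not part of ROOT, and it assumes that the dataset entry only needs the additional "name" key, as in the string literal of the script above.

```python
import json


def hs3_json_with_dataset(name, dataset):
    """Wrap one dataset dictionary in a minimal HS3-style JSON document.

    Hypothetical helper, not part of ROOT. The layout mirrors the string
    literal used in the script above: an empty "distributions" list and a
    "data" list whose single entry carries the dataset name.
    """
    document = {
        "distributions": [],
        "data": [{"name": name, **dataset}],
    }
    return json.dumps(document, indent=4)


# Same binned dataset specification as in the script above.
observed_channel1 = {
    "axes": [{"name": "obs_x_channel1", "nbins": 2, "min": 1, "max": 2}],
    "contents": [122, 112],
    "type": "binned",
}

json_string = hs3_json_with_dataset("observed_channel1", observed_channel1)
print(json_string)
```

The resulting `json_string` could then be passed to `ROOT.RooJSONFactoryWSTool(ws).importJSONfromString(json_string)` in place of the hand-written literal. This only hides the string from the user; it does not replace the proposed `ws["observed_channel1"] = observed_channel1` syntax.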
