root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

[RF] Known bugs in RooFit JSON tool #9372

Closed guitargeek closed 2 years ago

guitargeek commented 2 years ago

PR https://github.com/root-project/root/pull/8944 was a good first step towards getting the RooWorkspace to JSON converter to work for typical HistFactory models, but this simple example shows that there are still significant bugs.

These bugs need to be fixed in the release, so that we can promote the JSON converter as a new feature.

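#include "RooArgList.h"
#include "RooCategory.h"
#include "RooGaussian.h"
#include "RooGlobalFunc.h" // RooFit::RooConst
#include "RooProdPdf.h"
#include "RooRealVar.h"
#include "RooSimultaneous.h"
#include "RooWorkspace.h"
#include "RooFitHS3/RooJSONFactoryWSTool.h"
#include "TSystem.h" // gSystem

#include <string>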

void mySim()
{
   using namespace RooFit;

   // Import keys and factory expressions files for the RooJSONFactoryWSTool.
   std::string rootetcPath = gSystem->Getenv("ROOTSYS");
   RooJSONFactoryWSTool::loadExportKeys(
           rootetcPath + "/etc/root/RooFitHS3_wsexportkeys.json");
   RooJSONFactoryWSTool::loadFactoryExpressions(
           rootetcPath + "/etc/root/RooFitHS3_wsfactoryexpressions.json"
   );

   // Create a test model: RooSimultaneous with Gaussian in one component, and
   // product of two Gaussians in the other.
   RooRealVar x("x", "x", -8, 8);
   RooRealVar mean("mean", "mean", 0, -8, 8);
   RooRealVar sigma("sigma", "sigma", 0.3, 0.1, 10);
   RooGaussian g1("g1", "g1", x, mean, sigma);
   RooGaussian g2("g2", "g2", x, mean, RooConst(0.3));
   RooProdPdf model("model", "model", RooArgList{g1, g2});
   RooGaussian model_ctl("model_ctl", "model_ctl", x, mean, sigma);
   RooCategory sample("sample", "sample", {{"physics", 0}, {"control", 1}});
   RooSimultaneous simPdf("simPdf", "simultaneous pdf", sample);
   simPdf.addPdf(model, "physics");
   simPdf.addPdf(model_ctl, "control");

   // Export to JSON
   {
       RooWorkspace ws{"workspace"};
       ws.import(simPdf);
       RooJSONFactoryWSTool tool{ws};
       tool.exportJSON("simPdf.json");
       // Output can be pretty-printed with `python -m json.tool simPdf.json`
   }

   // Import JSON
   RooWorkspace ws{"workspace"};
   RooJSONFactoryWSTool tool{ws};
   tool.importJSON("simPdf.json");
   // At the moment this will fail, because the Gaussians in the product are
   // missing in the JSON dump!
}

The JSON dump will look like this:

{
    "functions": {},
    "pdfs": {
        "model": {
            "factors": [
                "g1",
                "g2"
            ],
            "name": "model",
            "tags": [
                "SnapShot_ExtRefClone"
            ],
            "type": "pdfprod"
        },
        "model_ctl": {
            "mean": "mean",
            "sigma": "sigma",
            "tags": [
                "SnapShot_ExtRefClone"
            ],
            "type": "Gaussian",
            "x": "x"
        },
        "simPdf": {
            "channels": {
                "model": {
                    "factors": [
                        "g1",
                        "g2"
                    ],
                    "name": "model",
                    "tags": [
                        "SnapShot_ExtRefClone"
                    ],
                    "type": "pdfprod"
                },
                "model_ctl": {
                    "mean": "mean",
                    "sigma": "sigma",
                    "tags": [
                        "SnapShot_ExtRefClone"
                    ],
                    "type": "Gaussian",
                    "x": "x"
                }
            },
            "tags": [
                "toplevel"
            ],
            "type": "simultaneous"
        }
    }
}

Here are the problems that need to be fixed:

  1. Parameter definitions (in particular their ranges) are missing.
  2. model and model_ctl are duplicated (they appear both at the pdf level and as children of the RooSimultaneous). They should only be defined at the top level, while the RooSimultaneous only refers to them by name, e.g.:
        "simPdf": {
            "channels": {
                "control": "model_ctl",
                "physics": "model"
            },
            "tags": [
                "toplevel"
            ],
            "type": "simultaneous"
        }
  3. The pdfs g1 and g2 are missing in the JSON.
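
Problems 1 and 3 both come down to names that the dump refers to but never defines. The following is a minimal, self-contained sketch of a check for such dangling references. It does not use ROOT or the RooJSONFactoryWSTool API: `PdfEntry`, `Dump`, `findDanglingRefs` and `dumpFromIssue` are hypothetical names introduced only for illustration, and the dump is reduced by hand to names and references.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry of the "pdfs" block:
// only the type and the names the entry refers to are kept.
struct PdfEntry {
   std::string type;
   std::vector<std::string> refs;
};

// Hypothetical stand-in for the whole JSON dump.
struct Dump {
   std::set<std::string> variables;      // names that have their own definition
   std::map<std::string, PdfEntry> pdfs; // top-level "pdfs" block
};

// Collect every referenced name that is defined neither as a variable
// nor as a top-level pdf.
std::set<std::string> findDanglingRefs(const Dump &dump)
{
   std::set<std::string> dangling;
   for (const auto &item : dump.pdfs) {
      for (const auto &ref : item.second.refs) {
         if (dump.variables.count(ref) == 0 && dump.pdfs.count(ref) == 0)
            dangling.insert(ref);
      }
   }
   return dangling;
}

// The dump shown in this issue, reduced to names and references.
// It contains no variable definitions at all.
Dump dumpFromIssue()
{
   Dump dump;
   dump.pdfs["model"] = PdfEntry{"pdfprod", {"g1", "g2"}};
   dump.pdfs["model_ctl"] = PdfEntry{"Gaussian", {"mean", "sigma", "x"}};
   dump.pdfs["simPdf"] = PdfEntry{"simultaneous", {"model", "model_ctl"}};
   return dump;
}
```

For the dump above, `findDanglingRefs(dumpFromIssue())` returns `g1`, `g2`, `mean`, `sigma` and `x`. Once the variables and the two missing Gaussians are added to the same structure, the result is empty, which is the property a round-trip test would rely on before attempting the import.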

These problems need to be fixed and a unit test should be written that verifies that the model above can be successfully serialized and de-serialized.

guitargeek commented 2 years ago

@cburgard, your help would be greatly appreciated here to ensure that the fixes are correctly implemented without breaking support for HistFactory models.

github-actions[bot] commented 2 years ago

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely, :robot: