umd-lhcb / plot_scripts

LHCb analysis plotting and table-making utilities
0 stars 0 forks source link

plot_scripts

Repository for LHCb analysis plotting and table-making utilities. The package allows to easily define the format of plots and load the appropriate ntuple with selected cuts and weights.

Cuts and weights can be provided with strings similar to those used by ROOT, eg mu_P/1000 > 3 && mu_PT/1000 > 0.5. Arithmetic and logical operators, parentheses, and vector operations are implemented. Other features such as functions, eg log() or abs(), may come in the future. But they can be provided directly with NamedFunc functions, described below. These all rely on defining the branch structure beforehand in txt/variables.

Originally created by Adam Dishaw at https://github.com/richstu/ra4_draw.

Setup and overview

Compilation requires c++11 and ROOT 6. To compile it and run a minimal example, type in your terminal

./compile.py && ./run/core/minimal_example.exe

The && simply concatenates commands if the first was successful. You will obtain a cute little plot plots/FitVar_Mmiss2d1000000__sigontop_lin.pdf based on a small ntuple committed with the project. Some times when the files in the src/core and inc/core folders are changed, and ./compile.py clean is needed.

In general, the scripts in this repo rely on the ntuples that are in the lhcb-ntuples-gen project, downloaded with git annex. See the RD(*) wiki for installation instructions. The figures included in this README were generated with src/rdx/example_plots_tables.cxx which uses this 576 MB ntuple. To run this and all the scripts that depend on the RD(*) ntuples you will need to set up a soft link named ntuples pointing at wherever you placed yoru lhcb-ntuples-gen/ntuples folder. For instance, if lhcb-ntuples-gen is in the same folder as plot_scripts

ln -s ../lhcb-ntuples-gen/ntuples
./compile.py && ./run/rdx/example_plots_tables.exe

Let us look at the code in src/core/minimal_example.cxx

  // Defining plot styles
  PlotOpt lin_lumi("txt/plot_styles.txt", "LHCbPaper");
  lin_lumi.Title(TitleType::data).Bottom(BottomType::ratio).Stack(StackType::signal_on_top);
  vector<PlotOpt> plottypes = {lin_lumi};
  Palette colors("txt/colors.txt", "default");

  // Defining processes (plot components)
  string ntuple = "various/ntuples/Dst--20_04_20--cutflow_mc--cocktail--2016--md--dv45-subset.root";
  vector<shared_ptr<Process> > procs;
  procs.push_back(Process::MakeShared<Baby_run2>("Data (actually MC)",Process::Type::data, colors("data"),
                                                 set<string>({ntuple}), "1"));
  procs.push_back(Process::MakeShared<Baby_run2>("MC: q^{2} < 7 GeV^{2}", Process::Type::background, colors("blue"),
                                                 set<string>({ntuple}), "FitVar_q2/1000000<7"));
  procs.push_back(Process::MakeShared<Baby_run2>("MC: q^{2} > 7 GeV^{2}", Process::Type::background, colors("green"),
                                                 set<string>({ntuple}), "FitVar_q2/1000000>7"));

  // Making plots. Missing mass plot is set to GeV^2 by dividing by 1e6
  PlotMaker pm;
  pm.Push<Hist1D>(Axis(75, -5, 10,"FitVar_Mmiss2/1000000", "m_{miss}^{2} [GeV^{2}]"), "1", procs, plottypes);
  pm.MakePlots(1); // The "1" is the luminosity scaling

The main components are

This plot uses an ntuple of type Baby_run2, whose branches and types are defined in txt/variables/run2. This file can be generated for new ntuples using the ntpdump script.

1D plots

The image below shows three examples that illustrate the main ways of stacking the different components in the plot as well as some of the possible formatting options

Stacking option plots

Main PlotOpt options (plot styles)

The plot styles are the first things to be defined. As shown in the minimal example above, you set the style by creating a PlotOpt object typically based on one of the standard formats defined in txt/plot_styles.txt. These set the font types and sizes for titles, labels, and legends, offsets, and canvas dimensions.

You typically define one PlotOpt object, and make others based on that one. You then put them into a vector and pass it to the PlotMaker, which then creates one plot per style. For instance,

  PlotOpt lin_lumi("txt/plot_styles.txt", "LHCbPaper");
  lin_lumi.Title(TitleType::data).Bottom(BottomType::pull).YAxis(YAxisType::linear).Stack(StackType::signal_on_top);
  PlotOpt log_lumi = lin_lumi().YAxis(YAxisType::log).Bottom(BottomType::ratio).Overflow(OverflowType::none);
  PlotOpt lin_shapes = lin_lumi().Stack(StackType::shapes).Title(TitleType::info).Bottom(BottomType::off);

  vector<PlotOpt> plottypes = {lin_lumi, log_lumi, lin_shapes};

The full list of options is in inc/core/plot_opt.hpp. Key options defined with enum in the c++ code are:

Main Process options (plot components)

Each Process object is pushed into a vector to define a plot component. They are of one of the types define by the tree structures in txt/variables. The full list of options is in inc/core/process.hpp. The input arguments are

For instance, here we use the same ntuple to define three components: muons at high p, muons at high pT, and all muons:

  string repofolder = "ntuples/";
  string run2bare = "0.9.0-cutflow/Dst-cutflow_mc/Dst--20_06_05--cutflow_mc--bare--MC_2016_Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping26NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root";
  auto mup_high = Process::MakeShared<Baby_run2_bare>("p^{reco}(#mu) > 100 GeV", Process::Type::data, kBlack,
                                                      set<string>({repofolder+run2bare}), "mu_P>100000");
  auto mupt_high = Process::MakeShared<Baby_run2_bare>("p^{reco}_{T}(#mu) > 8 GeV", Process::Type::data, kBlue,
                                                       set<string>({repofolder+run2bare}), "mu_PT>8000");
  auto all_mu = Process::MakeShared<Baby_run2_bare>("MC", Process::Type::background, kBlack,
                                                    set<string>({repofolder+run2bare}), "1");
  mup_high->SetMarkerStyle(20);  mup_high->SetMarkerSize(0.4);
  mupt_high->SetMarkerStyle(21); mupt_high->SetMarkerSize(0.4);

  vector<shared_ptr<Process> > procs_mu = {mup_high, mupt_high, all_mu};

Main Hist1D options

The Hist1D objects are pushed into a PlotMaker and generate one plot per plot style defined by the vector of PlotOpt. Input arguments for the standard constructor are:

Hist1D(const Axis &xaxis, const NamedFunc &cut, const std::vector<std::shared_ptr<Process> > &processes,
       const std::vector<PlotOpt> &plot_options = {PlotOpt()}, const std::vector<NamedFunc> &weights = {"1"});

The full list of options is in inc/core/hist1d.hpp. Some of these options are illustrated below

Additional option plots

  pm.Push<Hist1D>(Axis(75, -5, 10,"FitVar_Mmiss2/1000000", "m_{miss}^{2} [GeV^{2}]"), "FitVar_q2/1000000>8",
                  procs_mm, plottypes).Tag("example").TopRight("#font[82]{TopRight} label").RatioTitle("Fake data","MC");
  pm.Push<Hist1D>(Axis(100, 0, 500,"spi_PT", "p_{T}^{reco}(#pi_{slow}) [MeV]",{300}), "spi_TRUEPT>0",
                  procs_comp_spi,plottypes_spi).TopRight("13 TeV").RightLabel({"Slow pion p_{T}"});
  pm.Push<Hist1D>(Axis(50, -0.5, 0.5,"(spi_TRUEPT-spi_PT)/spi_PT", "(p_{T}^{true}(#pi_{slow}) - p_{T}^{reco}(#pi_{slow}))/p_{T}^{reco}(#pi_{slow})",{-25/300., 50/300.}),
                  "1", procs_low_spi, plottypes_spi).TopRight("13 TeV").LeftLabel({"Slow pion", "p_{T} resolution"});

Some of the key options that can be changed as it is being pushed to PlotMaker are:

Inner workings and NamedFunc

When this project is compiled, src/core/generate_baby.cxx is compiled and executed first. This script reads all the tree structures from txt/variables and produces c++ classes named Baby_<filename> with functions calling each branch of each tree. This is done in an efficient way so that only branches needed for that event are loaded from disk. Even if a branch is used multiple times it is only read from disk once.

PlotMaker loops over each ntuple file just once, even if that file is used in multiple processes and multiple plots, so it is reasonable efficient. However, something may be wrong with the implemenation because time does increase with the number of plots faster than one would expect from CPU limitations. Perhaps NamedFunc are memory inefficient.

Cuts and weights are stored in NamedFunc. This is a flexible class that accepts strings in its constructor similar to the string used in ROOT, eg mu_P/1000 > 3 && mu_PT/1000 > 0.5. This string is parsed before looping over the events in the ntuples, so the loop itself is very fast. Arithmetic and logical operators, parentheses, and vector operations are implemented. Other features such as functions, eg log() or abs(), may come in the future.

One of the main features of NamedFunc is that you can mix the strings with custom c++ functions. For instance, the example below applies different trigger cuts depending on the name of the ntuple file, and makes a plot with a q2 > 8 cut given by the string (which is transformed to a NamedFunc) and the trigger cut given by the NamedFunc:

  //////// NamedFunc to apply different trigger to Run 1 and Run 2
  NamedFunc trigger("trigger",[&](const Baby &b){
    set<std::string> files = b.FileNames();
    TString firstfile = *files.cbegin();
    bool L0 = b.mu_L0Global_TIS() && (b.b0_L0Global_TIS() || b.dst_L0HadronDecision_TOS());
    if(firstfile.Contains("2011") || firstfile.Contains("2012"))
      return L0 && (b.Kplus_Hlt1TrackAllL0Decision_TOS() || b.piminus0_Hlt1TrackAllL0Decision_TOS())
        && b.D0_Hlt2CharmHadD02HH_D02KPiDecision_TOS();
    else
      return L0 && (b.k_Hlt1TrackMVALooseDecision_TOS() || b.pi_Hlt1TrackMVALooseDecision_TOS()
                    || b.d0_Hlt1TwoTrackMVADecision_TOS()) && b.d0_Hlt2XcMuXForTauB2XcMuDecision_Dec();
  });

  pm_mm.Push<Hist1D>(Axis(75, -5, 10,"FitVar_Mmiss2/1000000", "m_{miss}^{2} [GeV^{2}]"), 
                     "FitVar_q2/1000000>8" && trigger, procs_mm, plottypes);

Tables and pie charts

Both tables and pie charts and produced with the Table class. A simple table to optimize the BDTiso cut in the full would look like

  pm_mm.Push<Table>("optimization", vector<TableRow>{
      TableRow("All events","1", 0,2, "1"),
        TableRow("BDT$_\\text{iso}<0.10$",  "b0_ISOLATION_BDT < 0.10",0,0, "1"),
        TableRow("BDT$_\\text{iso}<0.15$",  "b0_ISOLATION_BDT < 0.15",0,0, "1"),
        TableRow("BDT$_\\text{iso}<0.20$",  "b0_ISOLATION_BDT < 0.20",0,1, "1"),
        TableRow("$m_\\text{miss}^2 > 3\\text{ GeV}^2$, BDT$_\\text{iso}<0.10$",
                 "FitVar_Mmiss2/1000000 > 3 && b0_ISOLATION_BDT < 0.10",0,0, "1"),
        TableRow("$m_\\text{miss}^2 > 3\\text{ GeV}^2$, BDT$_\\text{iso}<0.15$",
                 "FitVar_Mmiss2/1000000 > 3 && b0_ISOLATION_BDT < 0.15",0,0, "1"),
        TableRow("$m_\\text{miss}^2 > 3\\text{ GeV}^2$, BDT$_\\text{iso}<0.20$",
                 "FitVar_Mmiss2/1000000 > 3 && b0_ISOLATION_BDT < 0.20",0,0, "1"),
        }, procs_mm, true, true).TotColumn("None");//do_fom, do_unc

Optimization table

The constructor of Table is

  Table(const std::string &name, const std::vector<TableRow> &rows, const std::vector<std::shared_ptr<Process> > &processes,
        bool do_unc=false, bool do_fom=false, bool do_eff=false, bool print_table=true, bool print_pie=false, bool print_titlepie=true);

Two additional member functions of Table are useful:

An example of a cutflow table that prints the pie charts is

  ///////// Automatically appending cutflow cuts
  vector<NamedFunc> cuts = {"1", trigger, "FitVar_Mmiss2/1000000 > 3", "FitVar_q2/1000000 > 7", "b0_ISOLATION_BDT < 0.15"};
  vector<string> rownames = {"All events", "Trigger", "$m_\\text{miss}^2 > 3\\text{ GeV}^2$", "$q^2 > 7\\text{ GeV}^2$",
                             "BDT$_\\text{iso}<0.15$"};
  vector<TableRow> table_rows;
  NamedFunc fullcut = "1";
  for(size_t ind = 0; ind < cuts.size(); ind++) {
    string title = (ind==0 ? rownames[ind] : "+ " + rownames[ind]);
    int lines = (ind==0 ? 1 : 0);
    fullcut = fullcut && cuts[ind];
    table_rows.push_back(TableRow(title,fullcut, 0,lines, "1"));
  }
  pm_mm.Push<Table>("pie", table_rows, procs_mm, true, true, true, true, true, true); //do_fom, do_unc, do_eff, print_table,print_pie,print_titlepie

Cutflow table

Pie charts

2D plots

Many of the options used to build a Hist2D are the same as those described above for the Hist1D, except that Hist2D takes two Axis and that you should define the marker properties on the processes. Here is an example

  pm_mu.Push<Hist2D>(Axis(55, -0.6, 0.5, "mu_TRUEP_X/mu_TRUEP_Z", "p_{x}^{true}/p_{z}^{true}(#mu)", {-0.38, 0.38}),
                  Axis(38, -0.38, 0.38, "mu_TRUEP_Y/mu_TRUEP_Z", "p_{y}^{true}/p_{z}^{true}(#mu)", {-0.28, 0.28}),
                  "1", procs_mu, scattertype).TopRight("");

Scatter plot

The optional underlying histogram is typically used to show how the background is expected to be distributed. The markers on the top are often the actual data. BEWARE, the plot is saved by default as .pdf file, so the location of each marker is kept in the file. If you plot many events in the scatter plot, the file can be huge and unmanageable. This is not an issue for the underlying histogram.

Event scans

This functionality emulates ROOT's TTree::Scan but with the added power of user-defined functions via NamedFunc, the ability to quickly print all events to a file, and the flexibility of having it in the same package as your plots and tables.

For example, the code below prints to a file the run and event numbers as well as the MC IDs of various particles for all events with missing mass greater than 8 GeV^2 that are not truthmatched to signal, normalization, or D**.

  pm_mm.Push<EventScan>("eventscan", !is_dsptau && !is_dspmu && !is_dss && "FitVar_Mmiss2/1000000 > 8", 
                        vector<NamedFunc>{"runNumber", "eventNumber", "mu_MC_MOTHER_ID", "d0_MC_MOTHER_ID",
                                            "d0_MC_GD_MOTHER_ID", "d0_MC_GD_GD_MOTHER_ID"}, procs_mm).Precision(10);

The Precision option controls the digits of precision for float variables and the width of the columns. The output file contains lists the values of the selected variables for all 287 events that pass the selection:

      Row        runNumber      eventNumber  mu_MC_MOTHER_ID  d0_MC_MOTHER_ID d0_MC_GD_MOTHER_ d0_MC_GD_GD_MOTH
        0          5616918            70792              221             -413              511              513
        1          5616835             3433              521             -413           -10413             -511
        2          5616942            84858              313              413              511              513
        3          5616942            84858              313              413              511              513
        4          5617022           132703              211             -413              511              513
      ...