umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

Validate global and skim cuts for wrong-sign (comb. bkg.) templates #96

Closed: yipengsun closed this issue 2 years ago

yipengsun commented 2 years ago

This is more complicated than I thought, so I decided to start a new issue. I'll use the top post to track the main progress.

yipengsun commented 2 years ago

Note the definitions of wrong-sign-related variables in Phoebe's AddB.C:

    IDprod = (double)muplus_ID*D0_ID;
    DstIDprod = (double)D0_ID*piminus_ID;
yipengsun commented 2 years ago

The h_comb histogram is filled as:

  if((DstIDprod > 0 && IDprod < 0 &&  ((muPID < 1.) || (muPID > 0. && totWeight == 1.)) && isData > 0. && TMath::Abs(totWeight) <= 1.) && flagDstSB==0.) 
  {
    //if(muPID==1) totWeight*=(1.175-0.35*(Y_M-2116.)/(5280.-2116.));
    //cout << totWeight << endl;
    /*if(muPID==1)*/ totWeight=(1.29-Y_M*4.5e-5);
    //if(muPID < 1) totWeight*=-1.*totWeight_jack/0.08;
    double totWeightsmearpi=(1.29-Y_Msmearpi*4.5e-5);
    double totWeightsmearpi_eCut=(1.29-Y_Msmearpi_eCut*4.5e-5);
    double totWeightsmearpi0=(1.29-Y_Msmearpi0*4.5e-5);
    double totWeightsmearpi0_eCut=(1.29-Y_Msmearpi0_eCut*4.5e-5);
    double totWeightsmearK=(1.29-Y_MsmearK*4.5e-5);
    double totWeightsmearK_eCut=(1.29-Y_MsmearK_eCut*4.5e-5);
    double totWeightsmearK0=(1.29-Y_MsmearK0*4.5e-5);
    double totWeightsmearK0_eCut=(1.29-Y_MsmearK0_eCut*4.5e-5);
    double wp=1;
    double wm=1;
    //if(muPID < 1 && !TMath::IsNaN(totWeight_jack)) totWeight=-1.*totWeight_jack/0.08;
    if(rwtcut && nomAcc) h_comb_rwt->Fill(rwt_x,rwt_y,rwt_z,totWeight);
    //h_comb->Fill(m_nu1,El,q2,totWeight);h_DOCA_comb->Fill(DOCAVAR,totWeight);
    if(muPID > 0)
    {
      h_comb_nosmear->Fill(m_nu1,El,q2,totWeight);
      h_comb->Fill(m_nu1,El,q2,totWeight);
      evtUsed=true;
    }
    else if (!TMath::IsNaN(totWeight_jack))
    {
      evtUsed=true;
      h_comb->Fill(m_nu1,El,q2,-1*(1-piprob-kprob)*totWeight*totWeight_jack/0.08);
      h_comb_nosmear->Fill(m_nu1,El,q2,-1*totWeight*totWeight_jack/0.08);
      if(use_uBDT)
      {
        h_comb->Fill(m_nu1smearpi,Elsmearpi,q2smearpi,-1*piprob*totWeightsmearpi*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK,ElsmearK,q2smearK,-1*kprob*totWeightsmearK*totWeight_jack/0.08);
      }
      if(use_uBDTeCut)
      {
        h_comb->Fill(m_nu1smearpi_eCut,Elsmearpi_eCut,q2smearpi_eCut,-1*piprob*totWeightsmearpi_eCut*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK_eCut,ElsmearK_eCut,q2smearK_eCut,-1*kprob*totWeightsmearK_eCut*totWeight_jack/0.08);
      }
      else if(use_notuBDT)
      {
        h_comb->Fill(m_nu1smearpi0,Elsmearpi0,q2smearpi0,-1*piprob*totWeightsmearpi0*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK0,ElsmearK0,q2smearK0,-1*kprob*totWeightsmearK0*totWeight_jack/0.08);
      }
      else if(use_notuBDTeCut)
      {
        h_comb->Fill(m_nu1smearpi0_eCut,Elsmearpi0_eCut,q2smearpi0_eCut,-1*piprob*totWeightsmearpi0_eCut*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK0_eCut,ElsmearK0_eCut,q2smearK0_eCut,-1*kprob*totWeightsmearK0_eCut*totWeight_jack/0.08);
      }
      else
      {
        h_comb->Fill(m_nu1smearpi,Elsmearpi,q2smearpi,-1*0.5*piprob*totWeightsmearpi*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearpi0,Elsmearpi0,q2smearpi0,-1*0.5*piprob*totWeightsmearpi0*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK0,ElsmearK0,q2smearK0,-1*0.5*kprob*totWeightsmearK0*totWeight_jack/0.08);
        h_comb->Fill(m_nu1smearK,ElsmearK,q2smearK,-1*0.5*kprob*totWeightsmearK*totWeight_jack/0.08);
      }
    }
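To summarize the logic above: every fill weight is the same linear function of the B candidate mass, and misID candidates enter with a *negative* weight scaled by the jackknife weight over 0.08. A minimal Python sketch of that sign convention (the function names and the `misid_prob` parameter are mine; the real code splits the misID piece across smeared pi/K fills, which I omit here):

```python
def comb_weight(y_m):
    """Linear comb. bkg. weight vs. the B candidate mass Y_M,
    the same form used for the nominal and all smeared masses."""
    return 1.29 - y_m * 4.5e-5

def fill_weight(y_m, mu_pid_ok, tot_weight_jack=None, misid_prob=0.0):
    """Sketch of the sign convention in the h_comb filling code:
    good-PID muons enter with a positive weight; misID candidates
    are SUBTRACTED via a negative weight * totWeight_jack / 0.08."""
    w = comb_weight(y_m)
    if mu_pid_ok:
        return w
    # Negative weight: this REMOVES the misID contribution
    return -1.0 * (1.0 - misid_prob) * w * tot_weight_jack / 0.08
```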
yipengsun commented 2 years ago

Note the following:

The h_data global cuts:

  if((DstIDprod > 0 && IDprod > 0 && muPID > 0. && flagDstSB==0.) && isData > 0. /*&& flagGhost < 1. && DstOk > 0.*/) 

The h_comb global cuts:

  if((DstIDprod > 0 && IDprod < 0 &&  ((muPID < 1.) || (muPID > 0. && totWeight == 1.)) && isData > 0. && TMath::Abs(totWeight) <= 1.) && flagDstSB==0.) 

Also note that I haven't found any evidence that SKIM CUTS are different between data and comb. bkg.

I have the following questions:

  1. According to ANA, sec. 86, p. 82, Phoebe chose to REMOVE contributions from misID Mu in the comb. bkg. But looking at the code above, it seems that she is ADDING contributions?

    Well, the weights ARE NEGATIVE!

  2. If we disregard the totWeight_jack-related things, then h_comb and h_data indeed have very similar, if not identical, cuts. But by applying only these cuts + fit variable range cuts, I already get fewer events than reported:

    After applying isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 57,133
        After applying ISO skim cut: 14,238 (+3,019, +21.2%)
        After applying 1OS skim cut: 1,850 (-7,883, -426.1%)
        After applying 2OS skim cut: 2,222 (-7,452, -335.4%)
        After applying  DD skim cut: 10,273 (-329, -3.2%)
  3. The reference numbers are:

    DST_COMB_REF_NUMS = {
        'ISO': 11219,
        '1OS': 9733,
        '2OS': 9674,
        'DD': 10602,
    }

    These are extracted from h_comb templates for D*. Am I using the right one? It's suspicious that ISO and 2OS numbers are very similar.
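For the record, the deltas and percentages in the skim-cut tables above are consistent with taking observed minus reference, with the percentage relative to the OBSERVED count (which is why deficits can exceed -100%). A sketch of that arithmetic (`report` is my name for it):

```python
# Reference yields extracted from Phoebe's h_comb templates for D*
DST_COMB_REF_NUMS = {
    'ISO': 11219,
    '1OS': 9733,
    '2OS': 9674,
    'DD': 10602,
}

def report(observed, ref):
    """Delta w.r.t. the reference; percentage relative to the
    observed count, matching the numbers quoted in the tables."""
    diff = observed - ref
    return diff, 100.0 * diff / observed
```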

yipengsun commented 2 years ago

I'm trying to check this from a different perspective: Recreating ANA, fig. 8: ref_comb_bkg

Note that I'm using Phoebe's latest step-1.5 ntuples (both D0 and D*), which contains ALL data and comb. bkg.

I'm applying ISO skim cuts on top of all global offline cuts (minus the B mass requirement).

More explanations:

(Plots: D0 and D* B mass, data vs. comb. bkg., linear and log scale.)

yipengsun commented 2 years ago

Here are the inputs that I used:

yipengsun commented 2 years ago

@Svende after more thought, I think I have a better understanding of IDprod and DstIDprod, tabulated below:

|            | DstIDprod > 0 | DstIDprod < 0              |
| ---------- | ------------- | -------------------------- |
| IDprod > 0 | normal data   | wrong-sign slow Pi         |
| IDprod < 0 | wrong-sign Mu | forbidden at DaVinci level |
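The classification follows directly from the two sign products defined in AddB.C. A sketch (the `classify` helper is mine; the IDs follow the usual PDG sign convention where particle and antiparticle have opposite-sign IDs):

```python
def classify(mu_id, d0_id, spi_id):
    """Classify a candidate by the sign products
    IDprod = mu_ID * D0_ID and DstIDprod = D0_ID * piminus_ID."""
    id_prod = mu_id * d0_id
    dst_id_prod = d0_id * spi_id
    if dst_id_prod > 0:
        return 'normal data' if id_prod > 0 else 'wrong-sign Mu'
    return 'wrong-sign slow Pi' if id_prod > 0 else 'forbidden at DaVinci level'
```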
Svende commented 2 years ago

Yes, that makes sense. I would rather call the normal data OS and the wrong-sign SS, or whatever is more intuitive. I think of IDprod as a variable that selects B combinatorial, i.e. WS mu (SS mu), and DstIDprod as one that selects D* combinatorial such as D0 pi- (SS pi), or what you call WS slow pion above.

yipengsun commented 2 years ago

Trying to work on the wrong-sign Pi sample. Extracted reference numbers from the h_doug histos in Phoebe's latest public templates. Still missing a lot of events, using GetEntries():

Working on Dst wrong-sign slow Pi...
Cuts we are about to apply:
    isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
    ISO: 17,997
    1OS: 9,972
    2OS: 9,994
     DD: 11,843
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 119,355
    After applying ISO skim cut: 57,321 (+39,324, +68.6%)
    After applying 1OS skim cut: 3,430 (-6,542, -190.7%)
    After applying 2OS skim cut: 3,501 (-6,493, -185.5%)
    After applying  DD skim cut: 16,809 (+4,966, +29.5%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 110,215
    After applying ISO skim cut: 53,248 (+35,251, +66.2%)
    After applying 1OS skim cut: 3,100 (-6,872, -221.7%)
    After applying 2OS skim cut: 3,200 (-6,794, -212.3%)
    After applying  DD skim cut: 15,387 (+3,544, +23.0%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 91,920
    After applying ISO skim cut: 47,019 (+29,022, +61.7%)
    After applying 1OS skim cut: 2,663 (-7,309, -274.5%)
    After applying 2OS skim cut: 2,398 (-7,596, -316.8%)
    After applying  DD skim cut: 11,878 (+35, +0.3%)
After applying dxy < 7.0 && Y_M < 5280.0: 91,736
    After applying ISO skim cut: 46,963 (+28,966, +61.7%)
    After applying 1OS skim cut: 2,658 (-7,314, -275.2%)
    After applying 2OS skim cut: 2,395 (-7,599, -317.3%)
    After applying  DD skim cut: 11,838 (-5, -0.0%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 16,211
    After applying ISO skim cut: 8,192 (-9,805, -119.7%)
    After applying 1OS skim cut: 450 (-9,522, -2116.0%)
    After applying 2OS skim cut: 432 (-9,562, -2213.4%)
    After applying  DD skim cut: 2,178 (-9,665, -443.8%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 16,211
    After applying ISO skim cut: 8,192 (-9,805, -119.7%)
    After applying 1OS skim cut: 450 (-9,522, -2116.0%)
    After applying 2OS skim cut: 432 (-9,562, -2213.4%)
    After applying  DD skim cut: 2,178 (-9,665, -443.8%)

Perhaps Phoebe also removed the misID contributions from the h_doug templates? I'll try GetIntegral and see if that works.

yipengsun commented 2 years ago

@manuelfs When I do ComputeIntegral, I get the following output:

// _file0 is the ISO D* sample
root [1] auto histo = (TH1*)_file0->Get("h_doug")

// GetEntries is fine
root [3] histo->GetEntries()
(double) 17997.000

// Looks like the histogram is already normalized
root [7] histo->ComputeIntegral()
(double) 1.0000000

I think Phoebe has normalized the histograms, so we can't know the effective number (w/ misID contributions removed) from the histo directly?

The resulting integral is normalized to 1. If the routine is called with the onlyPositive flag set, an error will be produced in case of negative bin content and a NaN value returned.

Don't know why in this case the result is 1.0, as the onlyPositive defaults to false.

The return is defined as:

 return fIntegral[nbins];

So the return value should be the total integral! I don't know why it's 1 here, while the Integral method (which calls DoIntegral internally) returns a sensible number.

Note to myself: GetIntegral computes the integral of each bin and returns the pointer of that array.
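My reading of the behavior, as a pure-Python mimic (an illustration of the bookkeeping, NOT ROOT's actual implementation): if the cumulative array is normalized before the return, then `fIntegral[nbins]` is exactly 1, which would explain the observation, while `Integral()` sums the raw bin contents:

```python
def compute_integral(bin_contents):
    """Mimic of TH1::ComputeIntegral: cumulative integral over bins,
    normalized so the last entry is 1 (used e.g. by GetRandom)."""
    cumulative = []
    total = 0.0
    for c in bin_contents:
        total += c
        cumulative.append(total)
    if total != 0:
        cumulative = [x / total for x in cumulative]
    # If normalization happens first, the returned last entry is 1
    return cumulative, cumulative[-1]

def integral(bin_contents):
    """Mimic of TH1::Integral: plain sum of raw bin contents."""
    return sum(bin_contents)
```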

yipengsun commented 2 years ago

Oh, but histo->Integral() gives a meaningful number: 10197.4. But that is larger than my final number, 8192. Maybe I'm missing cuts? No wait, it's the other way around: maybe I'm already applying too many cuts?

yipengsun commented 2 years ago

Switching to use Integral() numbers as reference, for Wrong-sign slow Pi:

Working on Dst wrong-sign slow Pi...
Cuts we are about to apply:
    isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
    ISO: 10,197.4
    1OS: 481
    2OS: 589
     DD: 2,064
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 119,355
    After applying ISO skim cut: 57,321 (+47,123.6, +82.2%)
    After applying 1OS skim cut: 3,430 (+2,949, +86.0%)
    After applying 2OS skim cut: 3,501 (+2,912, +83.2%)
    After applying  DD skim cut: 16,809 (+14,745, +87.7%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 110,215
    After applying ISO skim cut: 53,248 (+43,050.6, +80.8%)
    After applying 1OS skim cut: 3,100 (+2,619, +84.5%)
    After applying 2OS skim cut: 3,200 (+2,611, +81.6%)
    After applying  DD skim cut: 15,387 (+13,323, +86.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 91,920
    After applying ISO skim cut: 47,019 (+36,821.6, +78.3%)
    After applying 1OS skim cut: 2,663 (+2,182, +81.9%)
    After applying 2OS skim cut: 2,398 (+1,809, +75.4%)
    After applying  DD skim cut: 11,878 (+9,814, +82.6%)
After applying dxy < 7.0 && Y_M < 5280.0: 91,736
    After applying ISO skim cut: 46,963 (+36,765.6, +78.3%)
    After applying 1OS skim cut: 2,658 (+2,177, +81.9%)
    After applying 2OS skim cut: 2,395 (+1,806, +75.4%)
    After applying  DD skim cut: 11,838 (+9,774, +82.6%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 16,211
    After applying ISO skim cut: 8,192 (-2,005.4, -24.5%)
    After applying 1OS skim cut: 450 (-31, -6.9%)
    After applying 2OS skim cut: 432 (-157, -36.3%)
    After applying  DD skim cut: 2,178 (+114, +5.2%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 16,211
    After applying ISO skim cut: 8,192 (-2,005.4, -24.5%)
    After applying 1OS skim cut: 450 (-31, -6.9%)
    After applying 2OS skim cut: 432 (-157, -36.3%)
    After applying  DD skim cut: 2,178 (+114, +5.2%)
yipengsun commented 2 years ago

For Wrong-sign Mu, with Integral():

Working on Dst wrong-sign Mu...
Cuts we are about to apply:
    isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
    ISO: 3,783.8578
    1OS: 472.144
    2OS: 355.61965
     DD: 2,325.2639
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 57,133
    After applying ISO skim cut: 14,238 (+10,454.1422, +73.4%)
    After applying 1OS skim cut: 1,850 (+1,377.856, +74.5%)
    After applying 2OS skim cut: 2,222 (+1,866.38035, +84.0%)
    After applying  DD skim cut: 10,273 (+7,947.7361, +77.4%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 53,130
    After applying ISO skim cut: 13,283 (+9,499.1422, +71.5%)
    After applying 1OS skim cut: 1,723 (+1,250.856, +72.6%)
    After applying 2OS skim cut: 2,072 (+1,716.38035, +82.8%)
    After applying  DD skim cut: 9,501 (+7,175.7361, +75.5%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 25,161
    After applying ISO skim cut: 6,682 (+2,898.1422, +43.4%)
    After applying 1OS skim cut: 828 (+355.856, +43.0%)
    After applying 2OS skim cut: 912 (+556.38035, +61.0%)
    After applying  DD skim cut: 4,207 (+1,881.7361, +44.7%)
After applying dxy < 7.0 && Y_M < 5280.0: 24,970
    After applying ISO skim cut: 6,618 (+2,834.1422, +42.8%)
    After applying 1OS skim cut: 821 (+348.856, +42.5%)
    After applying 2OS skim cut: 908 (+552.38035, +60.8%)
    After applying  DD skim cut: 4,182 (+1,856.7361, +44.4%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 15,786
    After applying ISO skim cut: 4,446 (+662.1422, +14.9%)
    After applying 1OS skim cut: 531 (+58.856, +11.1%)
    After applying 2OS skim cut: 456 (+100.38035, +22.0%)
    After applying  DD skim cut: 2,555 (+229.7361, +9.0%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 15,786
    After applying ISO skim cut: 4,446 (+662.1422, +14.9%)
    After applying 1OS skim cut: 531 (+58.856, +11.1%)
    After applying 2OS skim cut: 456 (+100.38035, +22.0%)
    After applying  DD skim cut: 2,555 (+229.7361, +9.0%)

This looks like a sensible result: our numbers are always larger than Phoebe's, because we are NOT removing misID contributions. It also looks like the misID represents ~10%-20% of the comb. bkg.

manuelfs commented 2 years ago

According to Phoebe, the reason why the wrong-sign pi numbers are off is because "there is a scaling applied to the templates to match the normalization to the result of fitting the deltaM spectrum".

But she confirms that the wrong-sign samples have the same global cuts as the right-sign samples just with the sign changes. Since we already validated the global cuts with the right-sign samples, and the sign changes are clear from redoHistos, we consider the wrong-sign cuts validated.

Now we just need to implement them in our workflow.

yipengsun commented 2 years ago

@manuelfs I've compared the wrong-sign Mu and Pi between our DV ntuple and Phoebe's step-1.5 (looking at 2011 MD only), and the results are consistent.

I consider this validated. I've also updated the documentation here.

If you are also happy with the doc, feel free to close this issue.

BTW, we can't do 2016 wrong-sign ntuples yet, because the 2016 production was done w/ the cutflow_data mode, which lacks the required wrong-sign trees. A reproduction of 2016 data is needed.

yipengsun commented 2 years ago

Also updated the YAML file for Run 2 postprocessing. This is actually trivial, because the wrong-sign samples are in different trees, so the cuts are TRULY identical to the data ones (different trees have different sign requirements, so Phoebe's DstIDprod and IDprod cuts are already baked in).

yipengsun commented 2 years ago

For the comparison with the loosest cuts: I noticed that in OUR Run 1 ntuple, the Mu has a PID cut of mu_PIDmu > 2.0 at DaVinci level, so I have to apply this cut.

What gets omitted is the mu_PIDe < 1.0 cut.

The raw numbers are compatible, but the intersection is notably smaller than either of the raw numbers:

For wrong-sign Mu:

# From Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Mu--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree
Num of events: 12884, Num of IDs: 12219, Num of UIDs: 11613
Num of duplicated IDs: 606, Num of duplicated events: 665, duplicate rate: 5.16%

# From our step-1
> uiddump -n gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Mu--22_01_17--std--data--2011--md.root -t tree
Num of events: 12497, Num of IDs: 11796, Num of UIDs: 11162
Num of duplicated IDs: 634, Num of duplicated events: 701, duplicate rate: 5.61%

# Intersection between the 2
> uidcommon -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Mu--22_01_13--mix--all--2011-2012--md-mu--phoebe.root -N gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Mu--22_01_13--std--data--2011--md.root -t tree -T tree
Total common IDs: 9338

For wrong-sign slow Pi:

# From Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree
Num of events: 20446, Num of IDs: 19409, Num of UIDs: 18436
Num of duplicated IDs: 973, Num of duplicated events: 1037, duplicate rate: 5.07%

# From our step-1
> uiddump -n gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Pi--22_01_17--std--data--2011--md.root -t tree
Num of events: 22308, Num of IDs: 21165, Num of UIDs: 20091
Num of duplicated IDs: 1074, Num of duplicated events: 1143, duplicate rate: 5.12%

# Intersection between the 2
> uidcommon -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -N gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree -T tree
Total common IDs: 19409

I think a plausible explanation is: this is a combinatorial background, so DaVinci is reconstructing events that satisfy our cuts with RANDOM particles (the ordering may have changed), so the agreement is not as good as for the correct-sign tree.

@manuelfs

yipengsun commented 2 years ago

No, the RANDOMNESS doesn't really make sense, as we should be running over the same input files.

What could be the case is that different DaVinci versions look over different subsets of particles in the same event, so some events are reco'ed as a comb. in one DaVinci version but not the other.

yipengsun commented 2 years ago

I've updated the wrong-sign slow Pi number above. That number looks more consistent, so my hypothesis above that different DaVinci versions look at different subsets of available particles doesn't hold.

I am now confused by this. The YAML is at: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/postprocess/ref-rdx-run1/ref-rdx-run1-Dst.yml

The only difference between the 2 wrong-sign trees in Phoebe's case is literally requiring different signs. In our case, we use 2 different input trees.

yipengsun commented 2 years ago

I reproduced the normal data (correct-sign) numbers, with the following caveats:

  1. The correct-sign validation was done with D0; here the wrong-sign is done with D*.
  2. I applied the D0 mass window cut and the B meson mass window cut for D0. I did NOT apply these cuts for D*.

Perhaps I should apply the mass window cuts for D* and try again. Although I think those tight offline mass window cuts are NOT applied in either Phoebe's step-1.5 or our step-1, so my previous comparisons are valid, but they are NOT consistent w/ the D0 observations.

yipengsun commented 2 years ago

I decided to REMOVE the mass window cuts for D0 and try again:

# Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-D0-comp/ntuple/D0_data_2011_md--22_01_18--mix--all--2011-2012--md-mu--phoebe.root -t tree                        
Num of events: 549374, Num of IDs: 547649, Num of UIDs: 545929
Num of duplicated IDs: 1720, Num of duplicated events: 1725, duplicate rate: 0.31%

# Our step-1
> uiddump -n gen/rdx-ntuple-run1-data-D0-comp/ntuple/D0--22_01_18--std--data--2011--md.root -t tree
Num of events: 546429, Num of IDs: 544678, Num of UIDs: 542933
Num of duplicated IDs: 1745, Num of duplicated events: 1751, duplicate rate: 0.32%

# Intersection
> uidcommon -n gen/rdx-ntuple-run1-data-D0-comp/ntuple/D0--22_01_18--std--data--2011--md.root -N gen/ref-rdx-ntuple-run1-data-D0-comp/ntuple/D0_data_2011_md--22_01_18--mix--all--2011-2012--md-mu--phoebe.root -t tree -T tree
Total common IDs: 543601

I'd say these numbers are still pretty compatible.

yipengsun commented 2 years ago

I'll look into the D* normal data first to see if the numbers are compatible with the D* wrong-sign numbers. If the normal data is consistent, I suspect our DaVinci script may not be fully consistent w/ Phoebe's.

yipengsun commented 2 years ago

@manuelfs I updated the D0 numbers, including wrong-sign samples at https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md

The interesting thing is: if I don't apply the mu_pid_e < 1.0 cut, then there's some noticeable disagreement between Phoebe's numbers and ours, in both correct- and wrong-sign samples.

Applying that cut, the difference is gone.

I checked both Phoebe's AddD0B_temp.C and OUR DaVinci Mu PID cut: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/aad8528216a9b2a201082b4b5c3db8bdb94dda43/run1-rdx/reco_Dst_D0.py#L371

I believe my reduced PID cuts are consistent. This means that mu_pid_e or mu_is_mu changed significantly BETWEEN DaVinci versions!

yipengsun commented 2 years ago

Correction: I'll list the nominal and reduced Mu PID cuts:

yipengsun commented 2 years ago

I updated the documentation to make the distinction between Nominal PID and Reduced PID clear.

manuelfs commented 2 years ago

I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.

Also, your Reduced PID yields are smaller than the nominal, so it has to have more cuts.

yipengsun commented 2 years ago

> I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.

Yeah you are right. Still thinking about why.

> Also, your Reduced PID yields are smaller than the nominal, so it has to have more cuts.

I accidentally swapped the two: for this study, the DEFAULT is the Reduced set, which confused me. I'll update the doc.

yipengsun commented 2 years ago

> I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.

The first thought that jumped into my mind is: there's some inconsistency between the cut I applied and the one Phoebe applied. But I already checked those files to make sure the ONLY Mu PID cut applied is mu_pid_mu > 2.0. I linked the files in a previous post, and the YAML file is also checked:

https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/aad8528216a9b2a201082b4b5c3db8bdb94dda43/postprocess/ref-rdx-run1/ref-rdx-run1-D0.yml#L250-L251

Well, could it be that mu_pid_mu already has a different meaning, so at the DaVinci step the output is already quite different? And we recovered the old number because mu_pid_e is still effective in rejecting some of the Mu that leaked through the mu_pid_mu cut?

yipengsun commented 2 years ago

Aside from known differences in PID and ISO BDT, there's a difference in mass window.

For the RIGHT-SIGN 2011 MD samples, we have a wider mass window compared to Phoebe's. This is based on the step-1.5 line ntuples.

(Plots: d0_m, dst_m)

The offline mass-window cuts for the D* trees are:

Bool_t FLAG_SEL_D0_MASS(Double_t d0_m, Double_t d0_m_ref = 1864.83) {
  return ABS(d0_m - d0_m_ref) < 23.4;
}

Bool_t FLAG_SEL_DST_MASS(Double_t dst_m, Double_t d0_m) {
  auto dst_ref_deltam = ABS(dst_m - d0_m - 145.454);
  return dst_ref_deltam < 2.0;
}
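For quick cross-checks against the ntuples, a direct Python transcription of the two flags above (names kept from the C++; values are the ones in the snippet, not independently re-derived):

```python
def flag_sel_d0_mass(d0_m, d0_m_ref=1864.83):
    """Offline D0 mass window: |m(D0) - m_ref| < 23.4 MeV."""
    return abs(d0_m - d0_m_ref) < 23.4

def flag_sel_dst_mass(dst_m, d0_m):
    """Offline D* window on the D*-D0 mass difference:
    |m(D*) - m(D0) - 145.454| < 2.0 MeV."""
    return abs(dst_m - d0_m - 145.454) < 2.0
```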
yipengsun commented 2 years ago

@manuelfs I've used finer binning and added vertical lines to indicate the approximate mass-window boundaries (the boundaries are exact for D0 but approximate for D*, because for D* the cut is on the difference between the D* and D0 masses, which varies event-by-event).

(Plots: d0_m, dst_d0_delta_m, dst_m)

yipengsun commented 2 years ago

@manuelfs I've updated the doc https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md

and summarized the latest numbers in a more consistent manner in this talk: https://github.com/yipengsun/talks/releases/download/0.31/220202.pdf

I think this really shows that the additional Mu PID cuts play an important role in making the final numbers consistent.

If you are happy about the result, feel free to close this issue.

manuelfs commented 2 years ago

Thank you Yipeng, the new tables are very nice, and the numbers do look consistent.

Let's close this after the group meeting on Wednesday if there are no further comments.

yipengsun commented 2 years ago

We see that for WS Pi, the chi2/dof distribution is indeed different between DV versions. We have improved vertexing, so we are removing slightly more events.

(Plot: dst_chi2ndof_ws_pi)