Closed yipengsun closed 2 years ago
Note the definitions of wrong-sign-related variables in Phoebe's AddB.C:
IDprod = (double)muplus_ID*D0_ID;
DstIDprod = (double)D0_ID*piminus_ID;
The h_comb histograms are filled as:
if((DstIDprod > 0 && IDprod < 0 && ((muPID < 1.) || (muPID > 0. && totWeight == 1.)) && isData > 0. && TMath::Abs(totWeight) <= 1.) && flagDstSB==0.)
{
//if(muPID==1) totWeight*=(1.175-0.35*(Y_M-2116.)/(5280.-2116.));
//cout << totWeight << endl;
/*if(muPID==1)*/ totWeight=(1.29-Y_M*4.5e-5);
//if(muPID < 1) totWeight*=-1.*totWeight_jack/0.08;
double totWeightsmearpi=(1.29-Y_Msmearpi*4.5e-5);
double totWeightsmearpi_eCut=(1.29-Y_Msmearpi_eCut*4.5e-5);
double totWeightsmearpi0=(1.29-Y_Msmearpi0*4.5e-5);
double totWeightsmearpi0_eCut=(1.29-Y_Msmearpi0_eCut*4.5e-5);
double totWeightsmearK=(1.29-Y_MsmearK*4.5e-5);
double totWeightsmearK_eCut=(1.29-Y_MsmearK_eCut*4.5e-5);
double totWeightsmearK0=(1.29-Y_MsmearK0*4.5e-5);
double totWeightsmearK0_eCut=(1.29-Y_MsmearK0_eCut*4.5e-5);
double wp=1;
double wm=1;
//if(muPID < 1 && !TMath::IsNaN(totWeight_jack)) totWeight=-1.*totWeight_jack/0.08;
if(rwtcut && nomAcc) h_comb_rwt->Fill(rwt_x,rwt_y,rwt_z,totWeight);
//h_comb->Fill(m_nu1,El,q2,totWeight);h_DOCA_comb->Fill(DOCAVAR,totWeight);
if(muPID > 0)
{
h_comb_nosmear->Fill(m_nu1,El,q2,totWeight);
h_comb->Fill(m_nu1,El,q2,totWeight);
evtUsed=true;
}
else if (!TMath::IsNaN(totWeight_jack))
{
evtUsed=true;
h_comb->Fill(m_nu1,El,q2,-1*(1-piprob-kprob)*totWeight*totWeight_jack/0.08);
h_comb_nosmear->Fill(m_nu1,El,q2,-1*totWeight*totWeight_jack/0.08);
if(use_uBDT)
{
h_comb->Fill(m_nu1smearpi,Elsmearpi,q2smearpi,-1*piprob*totWeightsmearpi*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK,ElsmearK,q2smearK,-1*kprob*totWeightsmearK*totWeight_jack/0.08);
}
if(use_uBDTeCut)
{
h_comb->Fill(m_nu1smearpi_eCut,Elsmearpi_eCut,q2smearpi_eCut,-1*piprob*totWeightsmearpi_eCut*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK_eCut,ElsmearK_eCut,q2smearK_eCut,-1*kprob*totWeightsmearK_eCut*totWeight_jack/0.08);
}
else if(use_notuBDT)
{
h_comb->Fill(m_nu1smearpi0,Elsmearpi0,q2smearpi0,-1*piprob*totWeightsmearpi0*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK0,ElsmearK0,q2smearK0,-1*kprob*totWeightsmearK0*totWeight_jack/0.08);
}
else if(use_notuBDTeCut)
{
h_comb->Fill(m_nu1smearpi0_eCut,Elsmearpi0_eCut,q2smearpi0_eCut,-1*piprob*totWeightsmearpi0_eCut*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK0_eCut,ElsmearK0_eCut,q2smearK0_eCut,-1*kprob*totWeightsmearK0_eCut*totWeight_jack/0.08);
}
else
{
h_comb->Fill(m_nu1smearpi,Elsmearpi,q2smearpi,-1*0.5*piprob*totWeightsmearpi*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearpi0,Elsmearpi0,q2smearpi0,-1*0.5*piprob*totWeightsmearpi0*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK0,ElsmearK0,q2smearK0,-1*0.5*kprob*totWeightsmearK0*totWeight_jack/0.08);
h_comb->Fill(m_nu1smearK,ElsmearK,q2smearK,-1*0.5*kprob*totWeightsmearK*totWeight_jack/0.08);
}
}
Note the following:
The h_data global cuts:
if((DstIDprod > 0 && IDprod > 0 && muPID > 0. && flagDstSB==0.) && isData > 0. /*&& flagGhost < 1. && DstOk > 0.*/)
The h_comb global cuts:
if((DstIDprod > 0 && IDprod < 0 && ((muPID < 1.) || (muPID > 0. && totWeight == 1.)) && isData > 0. && TMath::Abs(totWeight) <= 1.) && flagDstSB==0.)
Also note that I haven't found any evidence that SKIM CUTS are different between data and comb. bkg.
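To make the comparison between the two global cuts easier to read, here is a minimal sketch that ports both conditions to Python predicates. The event is assumed to be a plain dict keyed by the branch names used in the snippets above; that container, and the function names, are hypothetical and not part of AddB.C.

```python
def passes_h_data(evt):
    """Right-sign selection, mirroring the quoted h_data condition."""
    return (evt["DstIDprod"] > 0 and evt["IDprod"] > 0
            and evt["muPID"] > 0 and evt["flagDstSB"] == 0
            and evt["isData"] > 0)


def passes_h_comb(evt):
    """Wrong-sign Mu selection, mirroring the quoted h_comb condition."""
    # h_comb also accepts muons failing PID (muPID < 1); these are the
    # events later filled with negative weights.
    mu_ok = evt["muPID"] < 1 or (evt["muPID"] > 0 and evt["totWeight"] == 1)
    return (evt["DstIDprod"] > 0 and evt["IDprod"] < 0 and mu_ok
            and evt["isData"] > 0 and abs(evt["totWeight"]) <= 1
            and evt["flagDstSB"] == 0)


# Toy event (made-up values) that differs from a right-sign one only by IDprod.
toy = {"DstIDprod": 1, "IDprod": -1, "muPID": 1, "totWeight": 1.0,
       "isData": 1, "flagDstSB": 0}
print(passes_h_data(toy), passes_h_comb(toy))
```

Apart from the sign of IDprod, the only extra terms in h_comb are the ones involving muPID < 1 and totWeight.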
I have the following questions:
According to ANA, sec. 86, p. 82, Phoebe chose to REMOVE contributions from misID Mu in the comb. bkg. But looking at the code above, it seems that she is ADDING contributions?
Well, the weights ARE NEGATIVE!
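The sign can be checked with a small sketch of the weight used for the unsmeared h_comb fill, transcribed from the snippet above. The function names are mine, and the meaning of the 0.08 divisor is not stated in the quoted code, so it is kept as an unexplained constant.

```python
import math

MISID_DIVISOR = 0.08  # hard-coded in the quoted snippet; meaning not stated there


def mass_weight(y_m):
    """totWeight as reassigned inside the h_comb block."""
    return 1.29 - y_m * 4.5e-5


def comb_fill_weight(mu_pid, y_m, tot_weight_jack, piprob=0.0, kprob=0.0):
    """Weight of the unsmeared h_comb fill, or None if the event is skipped."""
    w = mass_weight(y_m)
    if mu_pid > 0:
        return w  # muon passes PID: positive weight
    if math.isnan(tot_weight_jack):
        return None  # neither branch of the quoted if/else fills
    # muon fails PID: the fill carries a leading -1, i.e. it is subtracted
    return -1 * (1 - piprob - kprob) * w * tot_weight_jack / MISID_DIVISOR


print(comb_fill_weight(1, 5000.0, float("nan")))
print(comb_fill_weight(0, 5000.0, 0.04))
```

For any positive totWeight_jack and Y_M in the usual range, the second call returns a negative number, so filling with it removes misID content from the template.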
If we disregard the totWeight_jack related things, then h_comb and h_data indeed have very similar, if not identical, cuts. But by applying only these cuts + fit variable range cuts, I already have fewer events than reported:
After applying isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 57,133
After applying ISO skim cut: 14,238 (+3,019, +21.2%)
After applying 1OS skim cut: 1,850 (-7,883, -426.1%)
After applying 2OS skim cut: 2,222 (-7,452, -335.4%)
After applying DD skim cut: 10,273 (-329, -3.2%)
The reference numbers are:
DST_COMB_REF_NUMS = {
'ISO': 11219,
'1OS': 9733,
'2OS': 9674,
'DD': 10602,
}
These are extracted from h_comb templates for D*. Am I using the right one? It's suspicious that ISO and 2OS numbers are very similar.
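For reading the numbers in parentheses: they appear to be (ours - reference) and that difference as a percentage of OUR count, which is why the percentages can go far below -100%. A minimal sketch that reproduces the format, using the reference dict above and the counts from the cut flow (the helper name compare is mine):

```python
DST_COMB_REF_NUMS = {"ISO": 11219, "1OS": 9733, "2OS": 9674, "DD": 10602}
OURS = {"ISO": 14238, "1OS": 1850, "2OS": 2222, "DD": 10273}


def compare(ours, ref):
    """Return (ours - ref, that difference as a percentage of ours)."""
    diff = ours - ref
    return diff, 100.0 * diff / ours


for skim, ref in DST_COMB_REF_NUMS.items():
    diff, pct = compare(OURS[skim], ref)
    print(f"After applying {skim} skim cut: {OURS[skim]:,} ({diff:+,}, {pct:+.1f}%)")
```

A positive difference means we keep more events than the reference; a negative one means we keep fewer.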
I'm trying to check this from a different perspective: recreating ANA, fig. 8.
Note that I'm using Phoebe's latest step-1.5 ntuples (both D0 and D*), which contain ALL data and comb. bkg.
I'm applying ISO skim cuts, on top of all global offline cuts (minus the B mass requirement).
More explanations:
"all": ISO & global, nothing else
"all" & B normal mass cut (< 5200 for D0, < 5280 for D*) & fit variables in our normal range
"all" & B sideband mass cut (> 5400 for both), NO fit variable requirement
Here are the inputs that I used:
@Svende after more thought, I think I have a better understanding of IDprod and DstIDprod, tabulated below:
| - | DstIDprod > 0 | DstIDprod < 0 |
|---|---|---|
| IDprod > 0 | normal data | wrong-sign slow Pi |
| IDprod < 0 | wrong-sign Mu | forbidden at DaVinci level |
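The table can be written as a small lookup. This is only a sketch: the function name is mine, and the example uses the standard PDG codes (mu- = 13, D0 = 421, pi+ = 211) together with the IDprod and DstIDprod definitions quoted at the top of this issue.

```python
def sign_category(id_prod, dst_id_prod):
    """Map the signs of IDprod and DstIDprod to the sample they select."""
    if id_prod == 0 or dst_id_prod == 0:
        raise ValueError("ID products are expected to be non-zero")
    if id_prod > 0 and dst_id_prod > 0:
        return "normal data"
    if id_prod > 0 and dst_id_prod < 0:
        return "wrong-sign slow Pi"
    if id_prod < 0 and dst_id_prod > 0:
        return "wrong-sign Mu"
    return "forbidden at DaVinci level"


# D*+ (-> D0 pi+) combined with a mu-: both products are positive.
MU_ID, D0_ID, PI_ID = 13, 421, 211
print(sign_category(MU_ID * D0_ID, D0_ID * PI_ID))    # right-sign
print(sign_category(-MU_ID * D0_ID, D0_ID * PI_ID))   # flip the Mu charge
print(sign_category(MU_ID * D0_ID, D0_ID * -PI_ID))   # flip the slow Pi charge
```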
Yes, that makes sense. I would rather call the normal data OS and use SS instead of wrong-sign, whichever is more intuitive. I would think of IDprod as a variable to select B combinatorial as WS mu (SS mu), and DstIDprod to select D* combinatorial such as D0 pi- (SS pi), or what you call WS slow pion above.
Trying to work on the wrong-sign Pi sample. Extracted reference numbers from the h_doug histos from Phoebe's latest public templates. Still missing a lot of events, using GetEntries():
Working on Dst wrong-sign slow Pi...
Cuts we are about to apply:
isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
ISO: 17,997
1OS: 9,972
2OS: 9,994
DD: 11,843
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 119,355
After applying ISO skim cut: 57,321 (+39,324, +68.6%)
After applying 1OS skim cut: 3,430 (-6,542, -190.7%)
After applying 2OS skim cut: 3,501 (-6,493, -185.5%)
After applying DD skim cut: 16,809 (+4,966, +29.5%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 110,215
After applying ISO skim cut: 53,248 (+35,251, +66.2%)
After applying 1OS skim cut: 3,100 (-6,872, -221.7%)
After applying 2OS skim cut: 3,200 (-6,794, -212.3%)
After applying DD skim cut: 15,387 (+3,544, +23.0%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 91,920
After applying ISO skim cut: 47,019 (+29,022, +61.7%)
After applying 1OS skim cut: 2,663 (-7,309, -274.5%)
After applying 2OS skim cut: 2,398 (-7,596, -316.8%)
After applying DD skim cut: 11,878 (+35, +0.3%)
After applying dxy < 7.0 && Y_M < 5280.0: 91,736
After applying ISO skim cut: 46,963 (+28,966, +61.7%)
After applying 1OS skim cut: 2,658 (-7,314, -275.2%)
After applying 2OS skim cut: 2,395 (-7,599, -317.3%)
After applying DD skim cut: 11,838 (-5, -0.0%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 16,211
After applying ISO skim cut: 8,192 (-9,805, -119.7%)
After applying 1OS skim cut: 450 (-9,522, -2116.0%)
After applying 2OS skim cut: 432 (-9,562, -2213.4%)
After applying DD skim cut: 2,178 (-9,665, -443.8%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 16,211
After applying ISO skim cut: 8,192 (-9,805, -119.7%)
After applying 1OS skim cut: 450 (-9,522, -2116.0%)
After applying 2OS skim cut: 432 (-9,562, -2213.4%)
After applying DD skim cut: 2,178 (-9,665, -443.8%)
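For reference, the structure of the log above (cuts applied cumulatively, with each skim counted separately on top of every stage) can be sketched as follows. This is not the actual script; the toy events, the is_iso field, and the function name are made up for illustration.

```python
def cutflow(events, cuts, skims):
    """Apply cuts cumulatively; at each stage also count events per skim.

    cuts:  list of (label, predicate), applied in order, cumulatively
    skims: dict of name -> predicate, each applied on top of the current stage only
    """
    rows = []
    surviving = list(events)
    for label, cut in cuts:
        surviving = [e for e in surviving if cut(e)]
        per_skim = {name: sum(1 for e in surviving if skim(e))
                    for name, skim in skims.items()}
        rows.append((label, len(surviving), per_skim))
    return rows


# Toy input with made-up values.
toy_events = [
    {"isData": 1, "Y_M": 5000.0, "is_iso": True},
    {"isData": 1, "Y_M": 5300.0, "is_iso": True},
    {"isData": 1, "Y_M": 5100.0, "is_iso": False},
    {"isData": 0, "Y_M": 5000.0, "is_iso": True},
]
cuts = [("isData > 0", lambda e: e["isData"] > 0),
        ("Y_M < 5280.0", lambda e: e["Y_M"] < 5280.0)]
skims = {"ISO": lambda e: e["is_iso"]}

for label, n, per_skim in cutflow(toy_events, cuts, skims):
    print(f"After applying {label}: {n:,}")
    for name, n_skim in per_skim.items():
        print(f"After applying {name} skim cut: {n_skim:,}")
```

Because the skims are not cumulative with each other, the per-skim numbers at one stage do not need to add up to the stage total.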
Perhaps Phoebe also removed the misID contributions from the h_doug templates? I'll try GetIntegral and see if that works.
@manuelfs When I do ComputeIntegral, I get the following output:
// _file0 is the ISO D* sample
root [1] auto histo = (TH1*)_file0->Get("h_doug")
// GetEntries is fine
root [3] histo->GetEntries()
(double) 17997.000
// Looks like the histogram is already normalized
root [7] histo->ComputeIntegral()
(double) 1.0000000
I think Phoebe has normalized the histograms, so we can't know the effective number (w/ misID contributions removed) from the histo directly?
The resulting integral is normalized to 1. If the routine is called with the onlyPositive flag set, an error will be produced in case of negative bin content and a NaN value returned.
Don't know why in this case the result is 1.0, as onlyPositive defaults to false.
The return is defined as:
return fIntegral[nbins];
So the return value should be the total integral! I don't know why it's 1 here, while the Integral method (which calls DoIntegral internally) returns a sensible number.
Note to myself: GetIntegral computes the integral of each bin and returns a pointer to that array.
Oh, but histo->Integral() gives a meaningful number: 10197.4. That is larger than my final number, 8192. Maybe I'm missing cuts? No, it's the other way around: maybe I'm already applying too many cuts?
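A toy illustration of why the three numbers differ. This is plain Python, not ROOT: the class only mimics the behaviour described by the quoted documentation (entries count fills, the integral sums bin contents, and the cumulative array is normalized to 1 before its last element is returned). That reading of ComputeIntegral is my assumption.

```python
class ToyHist:
    """Minimal weighted histogram; NOT ROOT's TH1, just an illustration."""

    def __init__(self, nbins):
        self.contents = [0.0] * nbins
        self.entries = 0

    def fill(self, ibin, weight=1.0):
        self.contents[ibin] += weight
        self.entries += 1  # one per fill, regardless of the weight

    def integral(self):
        return sum(self.contents)  # sum of weights

    def compute_integral(self):
        total = self.integral()
        if total == 0:
            raise ValueError("integral is zero")
        running, cumulative = 0.0, []
        for content in self.contents:
            running += content
            cumulative.append(running / total)  # normalized to 1
        return cumulative[-1]  # last element of a normalized cumulative array


h = ToyHist(3)
h.fill(0, 1.0)
h.fill(1, 1.0)
h.fill(1, -0.5)  # a negative-weight fill, as for the misID subtraction
print(h.entries, h.integral(), h.compute_integral())
```

With negative or non-unit weights the number of entries overestimates the effective yield, so the sum of weights is the more meaningful reference.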
Switching to use Integral() numbers as reference, for wrong-sign slow Pi:
Working on Dst wrong-sign slow Pi...
Cuts we are about to apply:
isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
ISO: 10,197.4
1OS: 481
2OS: 589
DD: 2,064
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod < 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 119,355
After applying ISO skim cut: 57,321 (+47,123.6, +82.2%)
After applying 1OS skim cut: 3,430 (+2,949, +86.0%)
After applying 2OS skim cut: 3,501 (+2,912, +83.2%)
After applying DD skim cut: 16,809 (+14,745, +87.7%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 110,215
After applying ISO skim cut: 53,248 (+43,050.6, +80.8%)
After applying 1OS skim cut: 3,100 (+2,619, +84.5%)
After applying 2OS skim cut: 3,200 (+2,611, +81.6%)
After applying DD skim cut: 15,387 (+13,323, +86.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 91,920
After applying ISO skim cut: 47,019 (+36,821.6, +78.3%)
After applying 1OS skim cut: 2,663 (+2,182, +81.9%)
After applying 2OS skim cut: 2,398 (+1,809, +75.4%)
After applying DD skim cut: 11,878 (+9,814, +82.6%)
After applying dxy < 7.0 && Y_M < 5280.0: 91,736
After applying ISO skim cut: 46,963 (+36,765.6, +78.3%)
After applying 1OS skim cut: 2,658 (+2,177, +81.9%)
After applying 2OS skim cut: 2,395 (+1,806, +75.4%)
After applying DD skim cut: 11,838 (+9,774, +82.6%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 16,211
After applying ISO skim cut: 8,192 (-2,005.4, -24.5%)
After applying 1OS skim cut: 450 (-31, -6.9%)
After applying 2OS skim cut: 432 (-157, -36.3%)
After applying DD skim cut: 2,178 (+114, +5.2%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 16,211
After applying ISO skim cut: 8,192 (-2,005.4, -24.5%)
After applying 1OS skim cut: 450 (-31, -6.9%)
After applying 2OS skim cut: 432 (-157, -36.3%)
After applying DD skim cut: 2,178 (+114, +5.2%)
For wrong-sign Mu, with Integral():
Working on Dst wrong-sign Mu...
Cuts we are about to apply:
isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
ISO: 3,783.8578
1OS: 472.144
2OS: 355.61965
DD: 2,325.2639
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod < 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 57,133
After applying ISO skim cut: 14,238 (+10,454.1422, +73.4%)
After applying 1OS skim cut: 1,850 (+1,377.856, +74.5%)
After applying 2OS skim cut: 2,222 (+1,866.38035, +84.0%)
After applying DD skim cut: 10,273 (+7,947.7361, +77.4%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 53,130
After applying ISO skim cut: 13,283 (+9,499.1422, +71.5%)
After applying 1OS skim cut: 1,723 (+1,250.856, +72.6%)
After applying 2OS skim cut: 2,072 (+1,716.38035, +82.8%)
After applying DD skim cut: 9,501 (+7,175.7361, +75.5%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 25,161
After applying ISO skim cut: 6,682 (+2,898.1422, +43.4%)
After applying 1OS skim cut: 828 (+355.856, +43.0%)
After applying 2OS skim cut: 912 (+556.38035, +61.0%)
After applying DD skim cut: 4,207 (+1,881.7361, +44.7%)
After applying dxy < 7.0 && Y_M < 5280.0: 24,970
After applying ISO skim cut: 6,618 (+2,834.1422, +42.8%)
After applying 1OS skim cut: 821 (+348.856, +42.5%)
After applying 2OS skim cut: 908 (+552.38035, +60.8%)
After applying DD skim cut: 4,182 (+1,856.7361, +44.4%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 15,786
After applying ISO skim cut: 4,446 (+662.1422, +14.9%)
After applying 1OS skim cut: 531 (+58.856, +11.1%)
After applying 2OS skim cut: 456 (+100.38035, +22.0%)
After applying DD skim cut: 2,555 (+229.7361, +9.0%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 15,786
After applying ISO skim cut: 4,446 (+662.1422, +14.9%)
After applying 1OS skim cut: 531 (+58.856, +11.1%)
After applying 2OS skim cut: 456 (+100.38035, +22.0%)
After applying DD skim cut: 2,555 (+229.7361, +9.0%)
This looks like a sensible result: Our numbers are always larger than Phoebe's, because we are NOT removing misID contributions. Also it looks like the misID represents ~10%-20% of the comb. bkg.
According to Phoebe, the reason why the wrong-sign pi numbers are off is because "there is a scaling applied to the templates to match the normalization to the result of fitting the deltaM spectrum".
But she confirms that the wrong-sign samples have the same global cuts as the right-sign samples, just with the sign changes. Since we already validated the global cuts with the right-sign samples, and the sign changes are clear from redoHistos, we consider the wrong-sign cuts validated.
Now we just need to implement them in our workflow.
@manuelfs I've compared the wrong-sign Mu and Pi between our DV ntuple and Phoebe's step-1.5 (looking at 2011 MD only), and the results are consistent.
I consider this validated. I've also updated the documentation.
If you are also happy with the doc, feel free to close this issue.
BTW, we can't do 2016 wrong-sign ntuples yet, because the 2016 production was done w/ cutflow_data mode, which lacks the required wrong-sign trees. A reproduction of 2016 data is needed.
Also updated the YAML file for run 2 postprocessing. This is actually trivial, because the wrong-sign samples are in different trees, so the cuts are TRULY identical to the data ones (different trees have different sign requirements, so Phoebe's DstIDprod and IDprod cuts are already baked in).
For the comparison with the loosest cuts: I noticed that in OUR Run 1 ntuple, the Mu has a PID cut of mu_PIDmu > 2.0 at DaVinci level, so I have to apply this cut.
What gets omitted is the mu_PIDe < 1.0 cut.
The raw numbers are compatible, but the intersection is notably smaller than either of the raw numbers:
For wrong-sign Mu:
# From Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Mu--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree
Num of events: 12884, Num of IDs: 12219, Num of UIDs: 11613
Num of duplicated IDs: 606, Num of duplicated events: 665, duplicate rate: 5.16%
# From our step-1
> uiddump -n gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Mu--22_01_17--std--data--2011--md.root -t tree
Num of events: 12497, Num of IDs: 11796, Num of UIDs: 11162
Num of duplicated IDs: 634, Num of duplicated events: 701, duplicate rate: 5.61%
# Intersection between the 2
> uidcommon -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Mu--22_01_13--mix--all--2011-2012--md-mu--phoebe.root -N gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Mu--22_01_13--std--data--2011--md.root -t tree -T tree
Total common IDs: 9338
For wrong-sign slow Pi:
# From Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree
Num of events: 20446, Num of IDs: 19409, Num of UIDs: 18436
Num of duplicated IDs: 973, Num of duplicated events: 1037, duplicate rate: 5.07%
# From our step-1
> uiddump -n gen/rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_ws_Pi--22_01_17--std--data--2011--md.root -t tree
Num of events: 22308, Num of IDs: 21165, Num of UIDs: 20091
Num of duplicated IDs: 1074, Num of duplicated events: 1143, duplicate rate: 5.12%
# Intersection between the 2
> uidcommon -n gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -N gen/ref-rdx-ntuple-run1-data-Dst-comp/ntuple/Dst_data_2011_md_ws_Pi--22_01_17--mix--all--2011-2012--md-mu--phoebe.root -t tree -T tree
Total common IDs: 19409
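For reading the uiddump and uidcommon output: judging from the printed numbers (e.g. 12884 - 12219 = 665 duplicated events, 12219 - 11613 = 606 duplicated IDs, 665 / 12884 = 5.16%), the quantities seem to be defined as in the sketch below. This is my reconstruction, not the tools' code, and representing an event ID as a (runNumber, eventNumber) pair is an assumption.

```python
from collections import Counter


def uid_stats(ids):
    """Duplicate statistics for a list of event IDs, one ID per ntuple entry."""
    counts = Counter(ids)
    n_events = len(ids)                                    # entries in the tree
    n_ids = len(counts)                                    # distinct IDs
    n_uids = sum(1 for c in counts.values() if c == 1)     # IDs seen exactly once
    return {
        "events": n_events,
        "ids": n_ids,
        "uids": n_uids,
        "dup_ids": n_ids - n_uids,
        "dup_events": n_events - n_ids,
        "dup_rate": (n_events - n_ids) / n_events,
    }


def common_ids(ids_a, ids_b):
    """Number of distinct IDs present in both samples."""
    return len(set(ids_a) & set(ids_b))


# Toy IDs (made up): one ID appears twice, another three times.
sample = [(1, 1), (1, 1), (1, 2), (2, 5), (2, 5), (2, 5)]
other = [(1, 2), (9, 9)]
print(uid_stats(sample))
print(common_ids(sample, other), common_ids(sample, sample))
```

Intersecting a sample with itself returns its own number of distinct IDs, so the two inputs have to be different files for the common-ID count to say anything about agreement.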
I think a plausible explanation is: this is a combinatoric background, so DaVinci is reconstructing the events that satisfy our cuts with RANDOM particles (the ordering may have changed), so the agreement is not as good as for the correct-sign tree.
@manuelfs
No, the RANDOMNESS doesn't really make sense, as we should be running over the same input files.
What could be the case is that different DaVinci versions look at different subsets of particles in the same event, so some events are reco'ed as a comb. in one DaVinci version, but not in the other.
I've updated the wrong-sign slow Pi number above. That number looks more consistent, so my hypothesis above that different DaVinci versions look at different subsets of available particles doesn't hold.
I am now confused by this. The YAML is at: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/postprocess/ref-rdx-run1/ref-rdx-run1-Dst.yml
The only difference between the 2 wrong-sign trees in Phoebe's case is literally requiring different signs. In our case it's using 2 different input trees.
I reproduced the normal data (correct-sign) numbers, with the following caveats:
The normal data is with D0. Here the wrong-sign is with D*.
I applied the D0 mass window cut and B meson mass window cut for D0. I did NOT apply this cut for D*.
Perhaps I should apply the mass window cuts for D* and try again. Although I think those tight, offline mass window cuts are NOT applied in either Phoebe's step-1.5 or our step-1, so my previous comparisons are valid, but are NOT consistent w/ the D0 observations.
I decided to REMOVE the mass window cuts for D0, and try again:
# Phoebe's step-1.5
> uiddump -n gen/ref-rdx-ntuple-run1-data-D0-comp/ntuple/D0_data_2011_md--22_01_18--mix--all--2011-2012--md-mu--phoebe.root -t tree
Num of events: 549374, Num of IDs: 547649, Num of UIDs: 545929
Num of duplicated IDs: 1720, Num of duplicated events: 1725, duplicate rate: 0.31%
# Our step-1
> uiddump -n gen/rdx-ntuple-run1-data-D0-comp/ntuple/D0--22_01_18--std--data--2011--md.root -t tree
Num of events: 546429, Num of IDs: 544678, Num of UIDs: 542933
Num of duplicated IDs: 1745, Num of duplicated events: 1751, duplicate rate: 0.32%
# Intersection
> uidcommon -n gen/rdx-ntuple-run1-data-D0-comp/ntuple/D0--22_01_18--std--data--2011--md.root -N gen/ref-rdx-ntuple-run1-data-D0-comp/ntuple/D0_data_2011_md--22_01_18--mix--all--2011-2012--md-mu--phoebe.root -t tree -T tree
Total common IDs: 543601
I'd say these numbers are still pretty compatible.
I'll look into the D* normal data first to see if the numbers are compatible with the D* wrong-sign numbers. If the normal data is consistent, I suspect our DaVinci script may not be fully consistent w/ Phoebe's.
@manuelfs I updated the D0 numbers, including wrong-sign samples, at https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md
The interesting thing is: if I don't apply the mu_pid_e < 1.0 cut, then there's some noticeable disagreement between Phoebe's and ours, in both correct- and wrong-sign samples.
Applying that cut, the difference is gone.
I checked both Phoebe's AddD0B_temp.C and OUR DaVinci Mu PID cut:
https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/aad8528216a9b2a201082b4b5c3db8bdb94dda43/run1-rdx/reco_Dst_D0.py#L371
I believe my reduced PID cuts are consistent. This means that mu_pid_e or mu_is_mu has a significant change BETWEEN DaVinci versions!
Correction: I'll list the nominal and reduced Mu PID cuts:
Nominal: mu_is_mu && mu_pid_mu > 2.0 && mu_pid_e < 1.0
Reduced: mu_pid_mu > 2.0
I updated the documentation to make the distinction between Nominal PID and Reduced PID clear.
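A minimal sketch of the two selections, reading the two expressions listed above as the nominal and reduced cuts, in that order. The dict-based muon and the function names are hypothetical; the branch names are the ones used in this thread.

```python
def reduced_mu_pid(mu):
    """Reduced PID: only the cut already applied at DaVinci level."""
    return mu["mu_pid_mu"] > 2.0


def nominal_mu_pid(mu):
    """Nominal PID: the reduced cut plus isMuon and the electron veto."""
    return bool(mu["mu_is_mu"]) and mu["mu_pid_mu"] > 2.0 and mu["mu_pid_e"] < 1.0


# Toy muons with made-up values.
toy_muons = [
    {"mu_is_mu": True, "mu_pid_mu": 3.0, "mu_pid_e": 0.2},
    {"mu_is_mu": True, "mu_pid_mu": 3.0, "mu_pid_e": 1.5},  # fails only mu_pid_e
    {"mu_is_mu": False, "mu_pid_mu": 2.5, "mu_pid_e": 0.0},  # fails only mu_is_mu
    {"mu_is_mu": True, "mu_pid_mu": 1.0, "mu_pid_e": 0.0},  # fails mu_pid_mu
]
n_reduced = sum(reduced_mu_pid(m) for m in toy_muons)
n_nominal = sum(nominal_mu_pid(m) for m in toy_muons)
print(n_reduced, n_nominal)
```

Every muon passing the nominal cut also passes the reduced one, so the nominal yield can never exceed the reduced yield on the same input.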
I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.
Also, your Reduced PID yields are smaller than the nominal, so it has to have more cuts.
> I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.
Yeah you are right. Still thinking about why.
> Also, your Reduced PID yields are smaller than the nominal, so it has to have more cuts.
I accidentally swapped between the two. Because for the study, the DEFAULT is the Reduced. I confused myself there. I'll update the doc.
> I'm still confused. You say that when you apply the mu_pid_e < 1.0 cut the difference is gone, but if mu_pid_e was different in different DV releases, the difference in yields would appear after applying that cut.
The first thought that jumped into my mind is: there's some inconsistency between the cut I applied and the one Phoebe applied. But I checked those files already to make sure the only Mu PID cut applied is the mu_pid_mu > 2.0 cut. I linked the files in a previous post, and the YAML file is also checked:
Well, could it be that mu_pid_mu already has a different meaning, so at the DaVinci step the output is already quite different? And we recovered the old number because mu_pid_e is still effective in rejecting some of the Mu that leaked through the mu_pid_mu cut?
Aside from known differences in PID and ISO BDT, there's a difference in mass window.
For the RIGHT-SIGN 2011 MD samples, we have a wider mass window compared to Phoebe's. This is based on the step-1.5 line ntuples.
The offline mass-window cuts for the D* trees are:
Bool_t FLAG_SEL_D0_MASS(Double_t d0_m, Double_t d0_m_ref = 1864.83) {
return ABS(d0_m - d0_m_ref) < 23.4;
}
Bool_t FLAG_SEL_DST_MASS(Double_t dst_m, Double_t d0_m) {
auto dst_ref_deltam = ABS(dst_m - d0_m - 145.454);
return dst_ref_deltam < 2.0;
}
@manuelfs I've used finer binning and added vertical lines to indicate the approximate mass window boundaries (the boundaries are exact for D0, approximate for D*, because for D* the cut is on the difference between the D* and D0 masses, which varies event-by-event).
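To see why the D* boundary moves event by event, here is a Python port of the two flags above plus a helper that turns the delta-m cut into an allowed D* mass range for a given D0 mass. The helper dst_mass_window is mine, and the unit is assumed to be MeV.

```python
D0_M_REF = 1864.83       # reference D0 mass used in FLAG_SEL_D0_MASS
D0_HALF_WIDTH = 23.4
DELTA_M_REF = 145.454    # reference D* - D0 mass difference
DELTA_M_HALF_WIDTH = 2.0


def flag_sel_d0_mass(d0_m, d0_m_ref=D0_M_REF):
    return abs(d0_m - d0_m_ref) < D0_HALF_WIDTH


def flag_sel_dst_mass(dst_m, d0_m):
    return abs(dst_m - d0_m - DELTA_M_REF) < DELTA_M_HALF_WIDTH


def dst_mass_window(d0_m):
    """D* mass range accepted by flag_sel_dst_mass for this event's D0 mass."""
    centre = d0_m + DELTA_M_REF
    return centre - DELTA_M_HALF_WIDTH, centre + DELTA_M_HALF_WIDTH


# The same D* mass passes or fails depending on the D0 mass of the event.
print(dst_mass_window(1864.83), flag_sel_dst_mass(2010.0, 1864.83))
print(dst_mass_window(1870.00), flag_sel_dst_mass(2010.0, 1870.00))
```

The D0 window is a fixed interval, while the D* window is centred on d0_m + 145.454, so a vertical line drawn on the D* mass plot can only be approximate.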
@manuelfs I've updated the doc https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md and summarized the latest numbers in a more consistent manner in this talk: https://github.com/yipengsun/talks/releases/download/0.31/220202.pdf
I think this really shows that the additional Mu PID cuts play an important role in making the final numbers consistent.
If you are happy about the result, feel free to close this issue.
Thank you Yipeng, the new tables are very nice, and the numbers do look consistent.
Let's close this after the group meeting on Wednesday if there are no further comments.
We see that for WS Pi, the chi2/dof distribution is indeed different between DV versions. We have improved vertexing so we are removing slightly more events.
This is more complicated than I thought, so I decided to start a new issue. I'll use the top post to track the main progress.