Closed yipengsun closed 2 years ago
Here are all the branches in Phoebe's step-2 ntuple:
ntp1:
AntiISOnum: int32_t
BDTmu: float
B_MPTb: double
B_TRUEP: double
B_TRUEPT: double
B_TRUEP_Z: double
B_XY_ERR: double
Btype: int32_t
Chi: double
Chi2: double
CosFlightReco: double
D0IP: double
D0IPCHI2: double
D0_DIRA_OWNPV: double
D0_FD: double
D0_M: double
D0_P: double
D0_PT: double
DLLe: double
DLLmu: double
DMUDIRA: double
DeltaChi2: double
DstIDprod: double
DstOk: double
Dst_2010_minus_MC_MOTHER_ID: int32_t
Dst_2010_minus_MC_MOTHER_KEY: int32_t
Dst_ENDVERTEX_CHI2: double
Dst_ID: int32_t
Dst_M: double
Dst_MC_MOTHER_ND: int32_t
Dst_P: double
Dst_PT: double
Dst_TRUEID: int32_t
Dst_mom: int32_t
Dst_mom_m: double
Dststtype: int32_t
El: double
El2: double
ElR: double
Elaltm: double
Elaltp: double
ElotherVtx: double
Elsmear: double
ElsmearG: double
ElsmearK: double
ElsmearK0: double
ElsmearK0_eCut: double
ElsmearK0_eSel: double
ElsmearK_eCut: double
ElsmearK_eSel: double
Elsmearpi: double
Elsmearpi0: double
Elsmearpi0_eCut: double
Elsmearpi0_eSel: double
Elsmearpi_eCut: double
Elsmearpi_eSel: double
Elt: double
Etaut: double
FFweight: double
FFweightALT: double
FFweightu1m: double
FFweightu1p: double
FFweightu2m: double
FFweightu2p: double
FFweightu3m: double
FFweightu3p: double
FFweightu4m: double
FFweightu4p: double
FFweightu5m: double
FFweightu5p: double
FFweightu6m: double
FFweightu6p: double
FFweightu7m: double
FFweightu7p: double
FFweightu8m: double
FFweightu8p: double
FFweightu9m: double
FFweightu9p: double
FFweightuAm: double
FFweightuAp: double
FFweightv1m: double
FFweightv1p: double
FFweightv2m: double
FFweightv2p: double
FFweightv3m: double
FFweightv3p: double
FFweightv4m: double
FFweightv4p: double
FFweightvv1m: double
FFweightvv1p: double
FFweightvv2m: double
FFweightvv2p: double
FFweightvv3m: double
FFweightvv3p: double
GhostProb: double
GsmearAngle: double
HADDEC: bool
HADTIS: bool
HLTTCK: UInt_t
Hlt1: bool
Hlt1K: bool
Hlt1TAL0K: bool
Hlt1TAL0pi: bool
Hlt1pi: bool
Hlt2: bool
IDprod: double
ISOnum: int32_t
JustDst: double
KPID: double
KPIDerror: double
KPIDweight: double
KPIDweight2: double
K_P: double
K_PT: double
Kplus_P: double
Kplus_rho: double
L0: bool
L0DUTCK: UInt_t
MHD: double
MUDEC: bool
MUTIS: bool
NNmu: double
NShared: int32_t
Nbody: int32_t
Oldpi2mu_nonuBDTeCut: double
Oldpi2mu_nonuBDTeSel: double
Oldpi2mu_uBDTeCut: double
Oldpi2mu_uBDTeSel: double
P1: double
PK2h: double
PV_XY_ERR: double
Pe2h: double
PestZ: double
Polarity: int16_t
Pp2h: double
Ppi2h: double
Pu2h: double
YTIS: bool
YTOS: bool
Y_BKGCAT: int32_t
Y_BKGCAT_OLD: int32_t
Y_DIRA_OWNPV: double
Y_DISCARDMu_CHI2: double
Y_ENDVERTEX_CHI2: double
Y_ETA: double
Y_FDCHI2_OWNPV: double
Y_FD_OWNPV: double
Y_IPCHI2_OWNPV: double
Y_IP_OWNPV: double
Y_M: double
Y_MMERR: double
Y_MsmearK: double
Y_MsmearK0: double
Y_MsmearK0_eCut: double
Y_MsmearK0_eSel: double
Y_MsmearK_eCut: double
Y_MsmearK_eSel: double
Y_Msmearpi: double
Y_Msmearpi0: double
Y_Msmearpi0_eCut: double
Y_Msmearpi0_eSel: double
Y_Msmearpi_eCut: double
Y_Msmearpi_eSel: double
Y_P: double
Y_PT: double
Y_SIGMA_IP: double
Y_myDOCA: double
Y_myDOCAchi2: double
altM: double
badSlowPi: double
badSlowPiTau: double
cmult100: int32_t
cmult60: int32_t
cpt60: double
dxy: double
dxy_err: double
e2mu: double
e2mu_nonuBDTeCut: double
e2mu_nonuBDTeSel: double
e2mu_uBDT: double
e2mu_uBDTeCut: double
e2mu_uBDTeSel: double
e2notmu: double
eratio_uBDT: double
eventNumber: ULong64_t
f_k: double
f_p: double
f_pi: double
flag2011: bool
flagBadMu: double
flagBadSoln: double
flagBmu: double
flagComb: double
flagD0mu: float
flagDoubleD: double
flagDstSB: double
flagGhost: double
flagTauonicD: double
flagtaumu: double
higherD0hel: double
iBin: int32_t
iBinK: int32_t
iBinpi: int32_t
isData: double
ishigher: bool
iso: double
iso2: double
iso_BDT: double
iso_BDT2: double
iso_BDT3: double
iso_CHARGE: float
iso_CHARGE2: float
iso_CHARGE3: float
iso_CHI2: double
iso_DeltaM: double
iso_NNk: float
iso_NNk2: float
iso_NNk3: float
iso_NNkw: double
iso_NNkw2: double
iso_NNkw3: double
iso_NNp: float
iso_NNp2: float
iso_NNp3: float
iso_P: float
iso_P2: float
iso_P3: float
iso_PE: float
iso_PE2: float
iso_PT: float
iso_PT2: float
iso_PT3: float
iso_Type: float
iso_Type2: float
iso_Type3: float
iso_clonevar: double
k2k: double
k2mu: double
k2mu_nonuBDTeCut: double
k2mu_nonuBDTeSel: double
k2mu_uBDT: double
k2mu_uBDTeCut: double
k2mu_uBDTeSel: double
k2notmu: double
k2pi: double
kWeight: double
kWeightErr: double
keepme: bool
logDOCA: double
mDD: double
mDDnew: double
mX_DD: double
mXnew_DD: double
m_corr: double
m_nu1: double
m_nu1altm: double
m_nu1altp: double
m_nu1otherVtx: double
m_nu1smear: double
m_nu1smearG: double
m_nu1smearK: double
m_nu1smearK0: double
m_nu1smearK0_eCut: double
m_nu1smearK0_eSel: double
m_nu1smearK_eCut: double
m_nu1smearK_eSel: double
m_nu1smearpi: double
m_nu1smearpi0: double
m_nu1smearpi0_eCut: double
m_nu1smearpi0_eSel: double
m_nu1smearpi_eCut: double
m_nu1smearpi_eSel: double
m_nu2: double
m_nuG: double
m_nuR: double
m_nuT: double
matchChi2: double
mcWeight: double
mm_DD: double
mm_mom: double
momWeight: double
muHAD: bool
muIP: double
muIPCHI2: double
muPID: float
muPIDerror: double
muPIDweight: double
muPIDweight_nonuBDT: double
muPIDweight_nonuBDTeCut: double
muPIDweight_nonuBDTeSel: double
muPIDweight_uBDT: double
muPIDweight_uBDTeCut: double
muPIDweight_uBDTeSel: double
muTOS: bool
muVeto: bool
mu_CosTheta: double
mu_ETA: double
mu_P: double
mu_PT: double
mu_PTb: double
mu_PTsmearK: double
mu_PTsmearK0: double
mu_PTsmearK0_eCut: double
mu_PTsmearK0_eSel: double
mu_PTsmearK_eCut: double
mu_PTsmearK_eSel: double
mu_PTsmearpi: double
mu_PTsmearpi0: double
mu_PTsmearpi0_eCut: double
mu_PTsmearpi0_eSel: double
mu_PTsmearpi_eCut: double
mu_PTsmearpi_eSel: double
mu_PsmearK: double
mu_PsmearK0: double
mu_PsmearK0_eCut: double
mu_PsmearK0_eSel: double
mu_PsmearK_eCut: double
mu_PsmearK_eSel: double
mu_Psmearpi: double
mu_Psmearpi0: double
mu_Psmearpi0_eCut: double
mu_Psmearpi0_eSel: double
mu_Psmearpi_eCut: double
mu_Psmearpi_eSel: double
mu_has: bool
mu_is: bool
mu_isT: bool
muplus_MC_MOTHER_ID: int32_t
muplus_MC_MOTHER_KEY: int32_t
muplus_MC_MOTHER_ND: int32_t
muplus_TRUEID: int32_t
muplus_rho: double
nISO: int32_t
nSPDhits: double
nTracks: double
noChi2: int32_t
noDChi2: int32_t
p2mu: double
p2mu_nonuBDTeCut: double
p2mu_nonuBDTeSel: double
p2mu_uBDT: double
p2mu_uBDTeCut: double
p2mu_uBDTeSel: double
p2notmu: double
p2p: double
pWeight: double
pWeightErr: double
pi2k: double
pi2mu: double
pi2mu_nonuBDTeCut: double
pi2mu_nonuBDTeSel: double
pi2mu_uBDT: double
pi2mu_uBDTeCut: double
pi2mu_uBDTeSel: double
pi2notmu: double
pi2pi: double
piPID: double
piPIDerror: double
piPIDweight: double
piPIDweight2: double
piWeight: double
piWeightErr: double
pi_P: double
pi_PT: double
piminus0_P: double
piminus_TRACK_Type: int32_t
piminus_rho: double
pislow_GhostProb: double
pislow_IP: double
pislow_IPCHI2: double
pislow_P: double
pislow_PT: double
pislow_ProbNNk: double
pislow_ismu: bool
pislow_muAcc: bool
pislow_rho: double
q2: double
q2R: double
q2altm: double
q2altp: double
q2otherVtx: double
q2smear: double
q2smearG: double
q2smearK: double
q2smearK0: double
q2smearK0_eCut: double
q2smearK0_eSel: double
q2smearK_eCut: double
q2smearK_eSel: double
q2smearpi: double
q2smearpi0: double
q2smearpi0_eCut: double
q2smearpi0_eSel: double
q2smearpi_eCut: double
q2smearpi_eSel: double
q2t: double
q2tD: double
reweighting_68: float
reweighting_69_gen2: float
reweighting_69_gen2_pt2: float
reweighting_69_pt2: float
reweighting_89: float
reweighting_89_gen2: float
reweighting_89_gen2_pt2: float
reweighting_89_pt2: float
reweighting_JpsiK09_v1: float
reweighting_JpsiK09_v2: float
runNumber: UInt_t
selcounter: UInt_t
simpleDstst: double
tantheta: double
tanthetaotherVtx: double
thetaD: double
thetaFlight: double
thetaFlightT: double
thetaL: double
totWeight: double
totWeight2: double
totWeight2_uBDT: double
totWeight_uBDT: double
transverseRes: double
transverseResDmu: double
u2mu: double
u2mu_nonuBDTeCut: double
u2mu_nonuBDTeSel: double
u2mu_uBDT: double
u2mu_uBDTeCut: double
u2mu_uBDTeSel: double
u2notmu: double
wCorr: double
wDkin: double
wSPD: double
wTRIG: double
weightD: double
weightTRKeff: double
weightnTRK: double
wt: double
It looks like the global cuts for `D*` are slightly tighter in our extracted cuts. I applied OUR global cuts to Phoebe's step-2 ntuple, then applied just the basic ISO cut, and I get:
> uiddump -n Dst_data--21_10_14--mix--all--2011-2012--md-mu--phoebe.root -t tree -c "iso_bdt1 < 0.15 & mu_ubdt > 0.25"
Num of events: 416353, Num of IDs: 416353, Num of UIDs: 416353
Num of duplicated IDs: 0, Num of duplicated events: 0, duplicate rate: 0.00%
whereas in the template the number of events should be 421224.
Note that the ntuple is generated with:
make ref-rdx-ntuple-run1-data-Dst
And the global cuts for `D0` are slightly looser:
> uiddump -n D0_data--21_10_14--mix--all--2011-2012--md-mu--phoebe.root -t tree -c is_iso
Num of events: 1770274, Num of IDs: 1769666, Num of UIDs: 1769058
Num of duplicated IDs: 608, Num of duplicated events: 608, duplicate rate: 0.03%
where in the template the number of events should be 1734133.
The 1OS and 2OS numbers are not very consistent. I think this warrants further investigation. My plan is to run Phoebe's selection code with minimal modification and compare numbers (I already tried this yesterday afternoon without success; I just need to try harder).
I'm trying to estimate the Run 1/Run 2 ratio of template event counts more precisely.
So, a better estimation would be:
(1.11+2.07) / (1.56/2*2*1.13) = 1.80
To arrive at the observed ~2.6 ratio, we need an efficiency of ~0.8 for run1/run2, that is, the run 1 cut is less efficient on run 2 data. Actually, from an even older cutflow study on real data, the efficiency is about 0.73, so this more or less adds up, and the efficiency from the cocktail study may not be very believable.
I'm not sure you got the right luminosity numbers: 1.11 for 2011 and 1.56 for 2016, where did you get those from? In the report I see 1.11 and 1.66 for delivered luminosity. In any case, what we really need is the QA-ed luminosity, which is a subset of the recorded luminosity, and I don't see that one in the report. Svende had found those numbers for Run 2 (from Dirac, I think)
As for the efficiency change, the 40% comes from the data cutflow we presented to the semileptonic group, which should include similar step 2 cuts?
@yipengsun Can you provide links in this issue to the table comparing yields and the code/original ntuples you used to generate them?
You mean the 2011 vs 2016 yield comparison? I think that is part of the `rdx_cutflow` workflow:
@manuelfs I'll use this issue to discuss the preliminary results I found for various comparisons between our and Phoebe's ntuples, and record the definitive version of the study in https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md.
Looking at Phoebe's and our 2011 MD `D*` ntuple:
# Phoebe's
> uiddump -n Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree
Num of events: 217936, Num of IDs: 208846, Num of UIDs: 200406
Num of duplicated IDs: 8440, Num of duplicated events: 9090, duplicate rate: 4.17%
# Us (produced in Oct 2021)
> uiddump -n Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -t TupleB0/DecayTree
Num of events: 229552, Num of IDs: 216987, Num of UIDs: 205508
Num of duplicated IDs: 11479, Num of duplicated events: 12565, duplicate rate: 5.47%
# Find common candidates
> uidcommon -n Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree -N ../../0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -T TupleB0/DecayTree
Total common IDs: 194325
We actually don't have Phoebe's 2011 `D0` ntuple annexed. I'll go to Phoebe's CERNbox and annex the ntuple.
Actually, I don't think Phoebe has put her 2011 `D0` ntuple on her CERNbox (I don't see any shared folder from her on CERNbox). @manuelfs Can you annex the 2011 MD `D0` ntuple from your external USB drive under the folder `ntuples/ref-rdx-run1/D0-std`?
Phoebe's latest step-2 ntuples also include fit templates, so I guess we now have more consistent templates to compare to.
I'm not familiar w/ Phoebe's template naming convention, and I can't locate her `D*` ISO template in the latest ntuple. I tried `h_data`, but that contains 500k entries and can't be just the ISO sample. I'll use our existing numbers instead.
Applying the same cuts on Phoebe's latest ntuples, I don't see any improvements in the data template size. It's likely that we are not applying the same cuts as Phoebe does.
From ntuple: Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
ISO: 416,353
1OS: 23,153
2OS: 8,681
DD: 30,357
From ntuple: D0_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
ISO: 1,769,303
1OS: 224,390
2OS: 53,084
DD: 216,153
Also, Phoebe's current implementation of keeping only 1 candidate:
currTry=0;
do{
selcounter=0;
fChain->GetTree()->GetEntry(entry+ranking[currTry]);
currTry++;
if(debugSingle) cerr << eventNumber << '\t' << totCandidates << '\t' << currTry << endl;
where `ranking` is sorted by a pseudo-random sequence. I think there she just writes the final candidates sorted, but she's keeping all final events.
OK, in her `redo_Histo`, she requires a single candidate with the following:
singleCand=(AntiISOnum==0);
I removed some `K, Pi` momentum cuts because Phoebe doesn't have them anymore, added the `DiscardMu_CHI2` cut back (not mentioned in the note), and tweaked the fit variable cut ranges.
I also disabled my single-candidate selection and now fully use Phoebe's.
I still can't figure out the discrepancy. I'd like to ask Phoebe the following questions:
For the single-candidate selection, there is `ISOnum == 0` for ISO skims and `AntiISOnum == 0` for all other skims. Does it mean that for the DD, 1OS, 2OS samples, a single candidate may fulfill multiple templates? Or does it still functionally require a single candidate globally? (Here's Phoebe's usage of `ISOnum` and `AntiISOnum`)
Edit: Since DD, 1OS, 2OS are mutually exclusive, Phoebe's implementation should be equivalent to globally keeping only 1 candidate, and the duplication of `eventNumber+runNumber` should be very small, but that is not the case in her latest ntuples.
Some note: I checked Phoebe's merged ntuples, and apparently the duplication rate is non-negligible:
> uiddump -n Dst--21_10_21--mix--all--2011-2012--md-mu--phoebe.root -t ntp1
Num of events: 6452737, Num of IDs: 5826243, Num of UIDs: 5352881
Num of duplicated IDs: 473362, Num of duplicated events: 626494, duplicate rate: 9.71%
`uiddump` checks for duplicated `runNumber-eventNumber` combos.
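The counters printed by `uiddump` can be reproduced from the numbers above, assuming (this is inferred from the output, not from the tool's source) that an "ID" is a distinct `(runNumber, eventNumber)` pair and a "UID" is a pair that occurs exactly once. For example, 6452737 - 5826243 = 626494 duplicated events, and 626494 / 6452737 = 9.71%. A minimal Python sketch under that assumption, where `uid_stats` is a hypothetical helper and not part of `uiddump`:

```python
from collections import Counter


def uid_stats(ids):
    """Summarize duplication among (runNumber, eventNumber) pairs.

    `ids` holds one pair per candidate (tree entry). The definitions
    below are inferred from the printed uiddump output, not from its code.
    """
    counts = Counter(ids)
    n_events = len(ids)    # total entries in the tree
    n_ids = len(counts)    # distinct (run, event) pairs
    n_uids = sum(1 for c in counts.values() if c == 1)  # pairs seen once
    dup_events = n_events - n_ids
    return {
        "events": n_events,
        "ids": n_ids,
        "uids": n_uids,
        "dup_ids": n_ids - n_uids,
        "dup_events": dup_events,
        "dup_rate": dup_events / n_events if n_events else 0.0,
    }


# Toy input: (1, 10) has three candidates, (1, 11) one, (2, 10) two.
toy = [(1, 10), (1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
stats = uid_stats(toy)
print(stats)
```

With these definitions a non-zero duplicate rate means that some events still carry more than one candidate, which is what the merged ntuple shows.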
For real data, do the ISO, DD, 1OS, 2OS samples differ only by some isolation-related cuts? Can you check our skim cuts and see if they are consistent w/ yours?
Our skim cuts are defined here; only the `FLAG_*` functions are relevant.
Note that `add_flags` is defined as `mu_ubdt > 0.25 && ISOnum==0` for the ISO skim, and `mu_ubdt > 0.25 && AntiISOnum==0` for the remaining skims.
If you really want to see where these are applied, you can take a look at our selection YAML, but this shouldn't be needed.
We are trying to implement global cuts + skim cuts, where the only differences between ISO, DD, 1OS, 2OS are the skim cuts. Do you have similar global cuts in the latest ntuple you shared? Can you point out the lines that define such global cuts in your code (for the real data, not MC) so we can double-check ours?
I found a selection flag in Phoebe's code here but this is way too loose:
(selcounter & (4096 * 64 - 1)) == (4096 * 64 - 1)
If I just apply the cut above AND ISO cuts, I get ~700k candidates in the ISO sample alone.
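For reference, `4096 * 64 - 1` equals `2**18 - 1`, so the flag above requires the 18 lowest bits of `selcounter` to all be set and ignores any higher bits. What each bit encodes is not stated in this thread; the sketch below only illustrates the bit arithmetic, and `passes_selcounter` is a hypothetical name:

```python
MASK = 4096 * 64 - 1  # 262143 == 2**18 - 1, i.e. the 18 lowest bits set


def passes_selcounter(selcounter: int) -> bool:
    """True when every one of the 18 lowest bits of selcounter is set."""
    return (selcounter & MASK) == MASK


all_set = MASK                    # all 18 low bits set
one_missing = MASK & ~(1 << 5)    # same, with bit 5 cleared
extra_high = MASK | (1 << 20)     # a higher bit set in addition

print(bin(MASK))
print(passes_selcounter(all_set))      # passes
print(passes_selcounter(one_missing))  # fails: one low bit is missing
print(passes_selcounter(extra_high))   # passes: higher bits are ignored
```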
These are the latest stats for the `D*` skims after applying all the changes described above:
> print_skim_size.py gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
From ntuple: gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
ISO: 419,630 (421,224)
1OS: 23,187 (19,692)
2OS: 8,610 (8,403)
DD: 30,387 (30,948)
where the numbers in `()` are from Phoebe's fit template.
Recovered the `TRIGGER_HLT1 && PT > 1700` cut from https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1535-1570
Also some of the known global cuts: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/AddB.C#L2953-2974
Phoebe also mentioned that there's a `D*` side-band cut: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L2736-2741
Checking on Phoebe's latest templates from her gitlab repo:
The `proctuples` folder is pinned to a specific commit. The last update was about 1 week ago.
- `proctuples/BCandHistos_Dst.root`: 420,646 [414,565]
- `proctuples/1OS/BCandHistos_Dst.root`: 19,666 [22,926]
- `proctuples/2OS/BCandHistos_Dst.root`: 8,389 [8,464]
- `proctuples/DD/BCandHistos_Dst.root`: 30,918 [29,796]

where the numbers in `[]` are OUR numbers.
To better understand the efficiency of each cut, I created a specialized cutflow script that applies the cuts step by step. After each step, I apply the ISO and DD skim cuts so that we can see the impact of the cuts defined in that step.
The script is located here. To use it:
# First get Phoebe's latest D* ntuple
git annex get ntuples/ref-rdx-run1/Dst-mix/Dst--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
# Now go to the folder of the script
cd studies/cutflow-sync_with_phoebe
./cutflow-sync_with_phoebe.py
The output is the following:
Before applying any cut: 6,452,737
After applying isData && DstIDprod > 0 && IDprod > 0 && -2.0 <= m_nu1 && m_nu1 <= 10.9 && 0.0 <= GEV(El) && GEV(El) <= 2.65 && -0.4 <= GEV2(q2) && GEV2(q2) <= 12.6: 1,369,870
After applying ISO skim cut: 699,952
After applying DD skim cut: 119,502
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2: 1,315,034
After applying ISO skim cut: 674,745
After applying DD skim cut: 113,762
After applying (Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0): 1,291,396
After applying ISO skim cut: 664,936
After applying DD skim cut: 111,162
After applying !muVeto && muPID > 0 && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && GhostProb < 0.5: 674,897
After applying ISO skim cut: 471,671
After applying DD skim cut: 37,727
After applying dxy < 7.0 && Y_DISCARDMu_CHI2 < 6.0 && Y_ENDVERTEX_CHI2 < 24.0 && Y_DIRA_OWNPV > 0.9995 && pislow_GhostProb < 0.25: 672,423
After applying ISO skim cut: 469,697
After applying DD skim cut: 37,616
After applying Y_M < 5280.0: 672,360
After applying ISO skim cut: 469,648
After applying DD skim cut: 37,614
After applying ABS(Dst_M-D0_M-145.454) < 2.0: 570,724
After applying ISO skim cut: 414,565
After applying DD skim cut: 29,496
After applying K_PT > 500.0 && pi_PT > 500.0 && K_PT+pi_PT > 1400.0 && D0_PT > 2000.0: 570,724
After applying ISO skim cut: 414,565
After applying DD skim cut: 29,496
Turns out the cut `ABS(Dst_M-D0_M-145.454) < 2.0` is very harsh. Maybe this is related to Phoebe's handling of the `D*` side-band?
Spotted a bug: GeV units were not handled properly when postprocessing Phoebe's step-1.5 ntuple. After fixing this bug, we have:
From ntuple: gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_22--mix--all--2011-2012--md-mu--phoebe.root
ISO: 414,565
1OS: 19,186
2OS: 7,909
DD: 29,496
The ntuple used can be generated w/ `make ref-rdx-ntuple-run1-data-Dst`.
The statistics are generated with a simple script (in the `scripts` folder):
print_skim_size.py gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
Keeping the `D*` side-band, we overshoot the number of candidates by ~6000:
After applying Y_M < 5280.0: 672,360
After applying ISO skim cut: 469,648
After applying DD skim cut: 37,614
After applying MIN(ABS(Dst_M-D0_M-145.454-9), ABS(Dst_M-D0_M-145.454)) < 2.0 : 594,641
After applying ISO skim cut: 426,851
After applying DD skim cut: 31,488
After applying K_PT > 500.0 && pi_PT > 500.0 && K_PT+pi_PT > 1400.0 && D0_PT > 2000.0: 594,641
After applying ISO skim cut: 426,851
After applying DD skim cut: 31,488
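The side-band-inclusive cut above keeps a candidate when `Dst_M - D0_M` lies within 2.0 of either 145.454 or 145.454 + 9 (presumably in MeV). Taking the `MIN` of the two distances is the same as OR-ing two separate windows. A small Python sketch that checks this equivalence on a scan of mass differences; the function names are hypothetical and the scan values are made up for illustration:

```python
DM_PEAK = 145.454   # nominal Dst_M - D0_M, value taken from the cut above
SB_SHIFT = 9.0      # offset of the side-band window, from the cut above
HALF_WIDTH = 2.0    # half-width of each window, from the cut above


def in_window_min(dm: float) -> bool:
    """MIN(ABS(dm - peak - 9), ABS(dm - peak)) < 2.0"""
    return min(abs(dm - DM_PEAK - SB_SHIFT), abs(dm - DM_PEAK)) < HALF_WIDTH


def in_window_or(dm: float) -> bool:
    """ABS(dm - peak - 9) < 2.0 || ABS(dm - peak) < 2.0"""
    return abs(dm - DM_PEAK - SB_SHIFT) < HALF_WIDTH or abs(dm - DM_PEAK) < HALF_WIDTH


# Scan 140.00 .. 160.00 in steps of 0.01 and compare the two forms.
scan = [140.0 + 0.01 * i for i in range(2001)]
mismatches = [dm for dm in scan if in_window_min(dm) != in_window_or(dm)]
print(len(mismatches))

print(in_window_min(145.0))  # inside the signal window
print(in_window_min(150.0))  # in the gap between the two windows
print(in_window_min(154.0))  # inside the side-band window
```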
Note: Debugging in progress; this is just a reminder to myself and should not be read by anyone else.
Currently I have identified these cuts and here's the output:
Before applying any cut: 6,452,737
After applying isData && DstIDprod > 0 && IDprod > 0 && IN_RANGE(m_nu1, -2.0, 10.9, true) && IN_RANGE(GEV(El), 0.1, 2.65, true) && IN_RANGE(GEV2(q2), -0.4, 12.6, true): 1,369,870
After applying ISO skim cut: 699,952
After applying 1OS skim cut: 35,250
After applying 2OS skim cut: 43,255
After applying DD skim cut: 119,502
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 1,291,396
After applying ISO skim cut: 664,936
After applying 1OS skim cut: 33,458
After applying 2OS skim cut: 40,458
After applying DD skim cut: 111,162
After applying !muVeto && muPID > 0 && DLLe < 1.0 && BDTmu > 0.25 && IN_RANGE(mu_P, 3.0e3, 100.0e3) && IN_RANGE(mu_ETA, 1.7, 5.0): 679,610
After applying ISO skim cut: 474,561
After applying 1OS skim cut: 21,300
After applying 2OS skim cut: 10,209
After applying DD skim cut: 38,148
After applying GhostProb < 0.5 && muIPCHI2 > 45.0: 674,897
After applying ISO skim cut: 471,671
After applying 1OS skim cut: 21,159
After applying 2OS skim cut: 10,091
After applying DD skim cut: 37,727
After applying dxy < 7.0 && Y_M < 5280.0: 672,360
After applying ISO skim cut: 469,648
After applying 1OS skim cut: 21,080
After applying 2OS skim cut: 10,058
After applying DD skim cut: 37,614
After applying ABS(Dst_M-D0_M-145.454-9) < 2.0 || ABS(Dst_M-D0_M-145.454) < 2.0: 594,641
After applying ISO skim cut: 426,851
After applying 1OS skim cut: 19,550
After applying 2OS skim cut: 8,424
After applying DD skim cut: 31,488
I noted that Phoebe's step-1.5 ntuples are the output of `AddB.C`, so there's no need to look into `AddB.C` for additional cuts. Instead, all cuts for the templates must be inside `redoHistos_Dst.C`. (I'm using Phoebe's CERN gitlab repo and links are already included.)
The input ntuple is from Phoebe's EOS:
/eos/user/b/bhamilto/Proctuples/BCandsMerge_Dst.root
which was updated on Thu, 21 Oct 2021.
The data templates are obtained from this commit:
I found the following GLOBAL cuts for the data fit templates:
I applied these cuts in a dedicated cutflow study script: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/6f5cb344ee47f4e5f6d5937805fc3853e122eccf/studies/cutflow-sync_with_phoebe/cutflow-sync_with_phoebe.py#L28-L49
And the output is the following:
The reference templates have the following entries:
ISO: 420,646
1OS: 19,666
2OS: 8,389
DD: 30,918
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && IN_RANGE(m_nu1, -2.0, 10.9, true) && IN_RANGE(GEV(El), 0.1, 2.65, true) && IN_RANGE(GEV2(q2), -0.4, 12.6, true): 817,151
After applying ISO skim cut: 548,018 (+127,372, +23.2%)
After applying 1OS skim cut: 25,138 (+5,472, +21.8%)
After applying 2OS skim cut: 14,809 (+6,420, +43.4%)
After applying DD skim cut: 52,379 (+21,461, +41.0%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 773,959
After applying ISO skim cut: 521,370 (+100,724, +19.3%)
After applying 1OS skim cut: 23,891 (+4,225, +17.7%)
After applying 2OS skim cut: 13,891 (+5,502, +39.6%)
After applying DD skim cut: 48,803 (+17,885, +36.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && IN_RANGE(mu_P, 3.0e3, 100.0e3) && IN_RANGE(mu_ETA, 1.7, 5.0): 679,610
After applying ISO skim cut: 474,561 (+53,915, +11.4%)
After applying 1OS skim cut: 21,300 (+1,634, +7.7%)
After applying 2OS skim cut: 10,209 (+1,820, +17.8%)
After applying DD skim cut: 38,148 (+7,230, +19.0%)
After applying dxy < 7.0 && Y_M < 5280.0: 677,041
After applying ISO skim cut: 472,518 (+51,872, +11.0%)
After applying 1OS skim cut: 21,218 (+1,552, +7.3%)
After applying 2OS skim cut: 10,175 (+1,786, +17.6%)
After applying DD skim cut: 38,032 (+7,114, +18.7%)
After applying ABS(Dst_M-D0_M-145.454) < 2.0: 574,645
After applying ISO skim cut: 417,075 (-3,571, -0.9%)
After applying 1OS skim cut: 19,310 (-356, -1.8%)
After applying 2OS skim cut: 8,006 (-383, -4.8%)
After applying DD skim cut: 29,836 (-1,082, -3.6%)
Included a screenshot in case it's more readable:
If we just focus on the ISO skim, which has the cut `ISOnum == 0 && iso_BDT < 0.15`, we are already 0.9% below Phoebe's number.
You can go to `studies/cutflow-sync_with_phoebe` and run the script inside that folder.
So I'm only applying the cuts that I already found, yet we are already 1% to 5% below Phoebe's reported numbers. Maybe our reference templates are still inconsistent? (The reference template was updated ~1w ago, whereas the step-1.5 ntuple we are working on was obtained around Thursday.)
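Judging from the printed numbers, the percentage in parentheses is the difference divided by OUR count, not by the reference count (e.g. -3,571 / 417,075 = -0.9%). A small sketch that reproduces that formatting; `compare` is a hypothetical helper, not the function used in the cutflow script:

```python
# Entries in Phoebe's reference templates, as quoted in this thread.
REFERENCE = {"ISO": 420_646, "1OS": 19_666, "2OS": 8_389, "DD": 30_918}


def compare(ours: int, ref: int) -> str:
    """Format our count with the absolute and relative difference to ref.

    The relative difference uses OUR count as denominator, which is what
    the numbers printed in this thread are consistent with.
    """
    diff = ours - ref
    pct = 100.0 * diff / ours
    return f"{ours:,} ({diff:+,}, {pct:+.1f}%)"


print("ISO:", compare(417_075, REFERENCE["ISO"]))
print("1OS:", compare(19_310, REFERENCE["1OS"]))
print("2OS:", compare(8_006, REFERENCE["2OS"]))
print("DD: ", compare(29_836, REFERENCE["DD"]))
```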
@manuelfs @Svende @afernez FYI.
Phoebe figured out that her `redoHistos` was not applying the single candidate selection due to a bug, and we were missing this cut.
The idea here is that if the data/MC weights tend to zero out certain regions of the MC, then it is philosophically most self-consistent to also remove those kinematic regions from data. So the data are run through the same machinery to remove these. I only have it for the MCMC pt2 weights and I think that's the only step Greg recommended it for.
Taking that into account, we match the ISO entries for `D*`, so we can now proceed to check the other skim cuts.
rut /eos/user/b/bhamilto/Proctuples/BCandsMerge_Dst.root
root [10] ntp1->GetEntries("isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1>-2 && m_nu1<10.9 && El>100 && El<2650 && q2>-400000 && q2 <12600000 &&L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P>3000 && mu_P<100000 && mu_ETA > 1.7 && mu_ETA<5 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2 && iso_BDT < 0.15 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)")
(long long) 420646
@yipengsun For future debugging, could you print the cuts with the ROOT interactive format? That will allow everyone to do quick independent checks
Now it should only use standard functions for the global cut. I also added the missing weight cut and printed out the global cuts to apply for easier copy-pasting.
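One way to get a copy-pasteable global cut out of the step-by-step cutflow is to AND the per-step cut strings into a single ROOT-style selection. The sketch below is only an illustration of that idea: `join_cuts` is a hypothetical helper (not the function in the cutflow script), and the step list is a shortened subset of the cuts quoted in this thread.

```python
def join_cuts(steps):
    """AND together a list of cut strings into one ROOT-style selection."""
    # Parenthesize every step so that a top-level "||" inside one step
    # cannot change meaning once the steps are joined with "&&".
    return " && ".join(f"({step})" for step in steps)


# Shortened subset of the global cuts, for illustration only.
steps = [
    "isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0",
    "L0 && (YTIS || YTOS) && Hlt1 && Hlt2",
    "!(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)",
]
selection = join_cuts(steps)
print(selection)
```

The resulting string can be pasted into `ntp1->GetEntries("...")` at the ROOT prompt, as in the interactive check quoted earlier in this thread.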
Keeping the single candidate selection cut:
Cuts we are about to apply:
isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
ISO: 420,646
1OS: 19,666
2OS: 8,389
DD: 30,918
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 817,151
After applying ISO skim cut: 548,018 (+127,372, +23.2%)
After applying 1OS skim cut: 25,138 (+5,472, +21.8%)
After applying 2OS skim cut: 14,809 (+6,420, +43.4%)
After applying DD skim cut: 52,379 (+21,461, +41.0%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 773,959
After applying ISO skim cut: 521,370 (+100,724, +19.3%)
After applying 1OS skim cut: 23,891 (+4,225, +17.7%)
After applying 2OS skim cut: 13,891 (+5,502, +39.6%)
After applying DD skim cut: 48,803 (+17,885, +36.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 679,610
After applying ISO skim cut: 474,561 (+53,915, +11.4%)
After applying 1OS skim cut: 21,300 (+1,634, +7.7%)
After applying 2OS skim cut: 10,209 (+1,820, +17.8%)
After applying DD skim cut: 38,148 (+7,230, +19.0%)
After applying dxy < 7.0 && Y_M < 5280.0: 677,041
After applying ISO skim cut: 472,518 (+51,872, +11.0%)
After applying 1OS skim cut: 21,218 (+1,552, +7.3%)
After applying 2OS skim cut: 10,175 (+1,786, +17.6%)
After applying DD skim cut: 38,032 (+7,114, +18.7%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 574,645
After applying ISO skim cut: 417,075 (-3,571, -0.9%)
After applying 1OS skim cut: 19,310 (-356, -1.8%)
After applying 2OS skim cut: 8,006 (-383, -4.8%)
After applying DD skim cut: 29,836 (-1,082, -3.6%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 573,853
After applying ISO skim cut: 416,502 (-4,144, -1.0%)
After applying 1OS skim cut: 19,284 (-382, -2.0%)
After applying 2OS skim cut: 7,993 (-396, -5.0%)
After applying DD skim cut: 29,808 (-1,110, -3.7%)
If I disable single candidate selections and apply the missing cut, indeed I fully recover the numbers reported by Phoebe's templates:
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 573,853
After applying ISO skim cut: 420,646 (+0, +0.0%)
After applying 1OS skim cut: 19,666 (+0, +0.0%)
After applying 2OS skim cut: 8,389 (+0, +0.0%)
After applying DD skim cut: 30,918 (+0, +0.0%)
I think this shows that our skim cuts are definitely consistent w/ Phoebe's.
That's great. So our current working hypothesis is that the PID is different between Run 1 and Run 2. Perhaps the factor of 2.6 is due to the cut on `uBDT`, and the 1.6 on the `DD` sample is due to applying the `iso_nnk` on top of that (the 1OS skim not being affected because it is a veto and makes less of a difference).
We can check this hypothesis with a cutflow leaving the PID cuts for last.
I made a cutflow that applies the PID cuts last, which can be generated with `workflows/rdx_cutflows.py rdx-cutflow-data-pid-last`:
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 216987 | 5349722 | - | - | - |
Offline $D^0$ cuts | 102287 | 1056832 | 47.1 | 19.8 | 0.42 |
Offline $\mu$ cuts | 96703 | 874676 | 94.5 | 82.8 | 0.88 |
Offline $D^* \mu$ combo cuts | 77498 | 658823 | 80.1 | 75.3 | 0.94 |
$K \pi$ PID | 75002 | 630664 | 96.8 | 95.7 | 0.99 |
$\mu$ PID | 74245 | 507542 | 99.0 | 80.5 | 0.81 |
$BDT_{iso} < 0.15$ | 48004 | 323746 | 64.7 | 63.8 | 0.99 |
Total eff. | - | - | 22.1 | 6.1 | 0.27 |
Yield ratio x 0.35 | 48004 | 323746 | - | - | 2.39 |
A couple of observations:
What ntuples were used for this cutflow? What is the expected ratio of yields?
Ntuples used:
The expected ratio should be ~2.6 (the number we got from the run1-2 fit template comparisons)
I forgot to apply trigger cuts to the cutflow above. Here's the fixed cutflow (still w/ the same inputs):
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 216987 | 5349722 | - | - | - |
Trigger | 203010 | 3104680 | 93.6 | 58.0 | 0.62 |
Offline $D^0$ cuts | 99358 | 688032 | 48.9 | 22.2 | 0.45 |
Offline $\mu$ cuts | 93899 | 572628 | 94.5 | 83.2 | 0.88 |
Offline $D^* \mu$ combo cuts | 75281 | 429461 | 80.2 | 75.0 | 0.94 |
$K \pi$ PID | 73571 | 414057 | 97.7 | 96.4 | 0.99 |
$\mu$ PID | 72832 | 308792 | 99.0 | 74.6 | 0.75 |
$BDT_{iso} < 0.15$ | 47060 | 190133 | 64.6 | 61.6 | 0.95 |
Total eff. | - | - | 21.7 | 3.6 | 0.16 |
Yield ratio x 0.35 | 47060 | 190133 | - | - | 1.43 |
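
The efficiency columns in the table above appear to be per-step efficiencies in percent (yield after the cut divided by yield before it), and the $\epsilon$ ratio is Run 2 over Run 1. A short sketch that recomputes them from the yields in the trigger-fixed table; the function name is hypothetical, and the "Yield ratio x 0.35" row is deliberately left out because the exact scale factor is not spelled out here:

```python
# (cut name, Run 1 yield, Run 2 yield), copied from the trigger-fixed table.
CUTFLOW = [
    ("Total events", 216_987, 5_349_722),
    ("Trigger", 203_010, 3_104_680),
    ("Offline D0 cuts", 99_358, 688_032),
    ("Offline mu cuts", 93_899, 572_628),
    ("Offline D* mu combo cuts", 75_281, 429_461),
    ("K pi PID", 73_571, 414_057),
    ("mu PID", 72_832, 308_792),
    ("BDT_iso < 0.15", 47_060, 190_133),
]


def step_efficiencies(rows):
    """Return (name, eff_run1 %, eff_run2 %, run2/run1 ratio) per cut."""
    out = []
    for (_, prev1, prev2), (name, n1, n2) in zip(rows, rows[1:]):
        eff1 = 100.0 * n1 / prev1
        eff2 = 100.0 * n2 / prev2
        out.append((name, eff1, eff2, eff2 / eff1))
    return out


effs = step_efficiencies(CUTFLOW)
for name, eff1, eff2, ratio in effs:
    print(f"{name:26s} {eff1:5.1f} {eff2:5.1f} {ratio:5.2f}")

# Total efficiency: last yield over first yield.
total1 = 100.0 * CUTFLOW[-1][1] / CUTFLOW[0][1]
total2 = 100.0 * CUTFLOW[-1][2] / CUTFLOW[0][2]
print(f"{'Total eff.':26s} {total1:5.1f} {total2:5.1f} {total2 / total1:5.2f}")
```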
Now, this number is consistent w/ the 1.4x increase (DON'T confuse this w/ the 2.6, like I did). So what's going on here? Here are a few theories:
The 2011 MD ntuple is not generated correctly
I doubt it, as I compared the number of events between ours and Phoebe's:
> uidcommon -n 0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -N ref-rdx-run1/Dst-std/Dst--20_09_16--std--data--2011--md--phoebe.root -t TupleB0/DecayTree -T YCands/DecayTree
Total common IDs: 194325
> uiddump -n 0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -t TupleB0/DecayTree
Num of events: 229552, Num of IDs: 216987, Num of UIDs: 205508
Num of duplicated IDs: 11479, Num of duplicated events: 12565, duplicate rate: 5.47%
> uiddump -n ref-rdx-run1/Dst-std/Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree
Num of events: 217936, Num of IDs: 208846, Num of UIDs: 200406
Num of duplicated IDs: 8440, Num of duplicated events: 9090, duplicate rate: 4.17%
The one-candidate-only implementation is very different between the cutflow script and the babymaker

postprocessing `is_iso` cut:
- `D*` tree: 310869 candidates, 0.47% duplication rate
- `D0` tree: 1211847 candidates, 0.06% duplication rate
- `D*` tree: 310352 candidates, 0% dupl
- `D0` tree: 1210668 candidates, 0% dupl

Note that 310352 doesn't agree w/ 190133 at all! So the second theory seems right. Now I need to think about what's going on here. The number is also very different from the previously reported 170118 from the real data.
I think I found the problem: previously, the numerator and denominator were passed in the wrong order in both the cutflow script and the babymaker (probably copy-pasted), so that:
//d0_endvtx_chi2 / d0_endvtx_ndof // correct
d0_endvtx_ndof / d0_endvtx_chi2 // previously what was actually applied
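
To illustrate why the swap matters, here is a toy sketch. The threshold value, the direction of the comparison (`<`), and the example `(chi2, ndof)` pairs are all assumptions made up for this illustration; the actual cut value is not quoted in this thread.

```python
# HYPOTHETICAL threshold, for illustration only; the real cut value is not given here.
MAX_CHI2_PER_NDOF = 4.0


def passes_correct(chi2, ndof, threshold=MAX_CHI2_PER_NDOF):
    """Intended cut: vertex chi2 per degree of freedom below the threshold."""
    return chi2 / ndof < threshold


def passes_swapped(chi2, ndof, threshold=MAX_CHI2_PER_NDOF):
    """Buggy cut: numerator and denominator swapped."""
    return ndof / chi2 < threshold


if __name__ == "__main__":
    # Made-up (chi2, ndof) pairs: a very good, a typical, and a poor vertex fit.
    for chi2, ndof in [(0.2, 1), (2.0, 1), (10.0, 1)]:
        print(chi2, ndof, passes_correct(chi2, ndof), passes_swapped(chi2, ndof))
```

Under these assumptions the swapped expression rejects the best-fit vertex and accepts the poor one, so the two versions select different candidates.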
Old cutflow table:
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 216987 | 5349722 | - | - | - |
Trigger | 203010 | 3104680 | 93.6 | 58.0 | 0.62 |
Offline $D^0$ cuts | 99358 | 688032 | 48.9 | 22.2 | 0.45 |
Offline $\mu$ cuts | 93899 | 572628 | 94.5 | 83.2 | 0.88 |
Offline $D^* \mu$ combo cuts | 75281 | 429461 | 80.2 | 75.0 | 0.94 |
$K \pi$ PID | 73571 | 414057 | 97.7 | 96.4 | 0.99 |
$\mu$ PID | 72832 | 308792 | 99.0 | 74.6 | 0.75 |
$BDT_{iso} < 0.15$ | 47060 | 190133 | 64.6 | 61.6 | 0.95 |
Total eff. | - | - | 21.7 | 3.6 | 0.16 |
Yield ratio x 0.35 | 47060 | 190133 | - | - | 1.43 |
So we kept far fewer events. I fixed that first in the babymaker YAML, without paying too much attention, and that's why I observed the differences above.
I also fixed that in the cutflow script, and made a minor fix to the Muon selection. Now the cutflow number and the babymaker number (w/o single candidate selection, and in cutflow mode) fully agree:
Cutflow table:
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 216987 | 5349722 | - | - | - |
Trigger | 203010 | 3104680 | 93.6 | 58.0 | 0.62 |
Offline $D^0$ cuts | 157096 | 1137211 | 77.4 | 36.6 | 0.47 |
Offline $\mu$ cuts | 148253 | 945709 | 94.4 | 83.2 | 0.88 |
Offline $D^* \mu$ combo cuts | 119157 | 716047 | 80.4 | 75.7 | 0.94 |
$K \pi$ PID | 116497 | 690521 | 97.8 | 96.4 | 0.99 |
$\mu$ PID | 115321 | 516350 | 99.0 | 74.8 | 0.76 |
$BDT_{iso} < 0.15$ | 74535 | 318207 | 64.6 | 61.6 | 0.95 |
Total eff. | - | - | 34.3 | 5.9 | 0.17 |
Yield ratio x 0.35 | 74535 | 318207 | - | - | 1.51 |
babymaker:
> make rdx-ntuple-run2-data-oldcut-debug
> uiddump -n gen/rdx-ntuple-run2-data-oldcut-debug/ntuple/Dst--21_11_04--cutflow_data--data--2016--md.root -t tree -c 'l0 & hlt1 & hlt2 & d0_ok & mu_ok & dstmu_ok & d0_pid_ok & mu_pid_ok & iso_bdt1 < 0.15'
Num of events: 319750, Num of IDs: 318207, Num of UIDs: 316701
Num of duplicated IDs: 1506, Num of duplicated events: 1543, duplicate rate: 0.48%
Note the number 318207.
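
The numbers printed by `uiddump` are consistent with the following bookkeeping, inferred from the output above and not from the tool's source: "IDs" are distinct event IDs, "UIDs" are IDs that appear exactly once, duplicated IDs = IDs - UIDs, duplicated events = events - IDs, and the duplicate rate is duplicated events over events. A minimal sketch under those assumptions (`dup_stats` is a made-up name):

```python
from collections import Counter


def dup_stats(ids):
    """Summarise duplication for a list holding one event ID per tree entry.

    Each ID can be any hashable value, e.g. a (runNumber, eventNumber) tuple.
    """
    counts = Counter(ids)
    n_events = len(ids)                                 # all entries
    n_ids = len(counts)                                 # distinct IDs
    n_uids = sum(1 for c in counts.values() if c == 1)  # IDs seen exactly once
    return {
        "events": n_events,
        "ids": n_ids,
        "uids": n_uids,
        "dup_ids": n_ids - n_uids,
        "dup_events": n_events - n_ids,
        "dup_rate": (n_events - n_ids) / n_events if n_events else 0.0,
    }


if __name__ == "__main__":
    toy = [(1, 1), (1, 1), (1, 2), (2, 3), (2, 3), (2, 3), (3, 4)]
    print(dup_stats(toy))
```

With the output above: 319750 - 318207 = 1543 duplicated events, 318207 - 316701 = 1506 duplicated IDs, and 1543 / 319750 is about 0.48%.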
I applied our step-2 offline cuts and skim cuts to Phoebe's 2011 MD ntuple, and compared the output to Phoebe's step-1.5 ntuples w/ skim and year/polarity cuts applied, and they mostly agree (up to +/- 1 candidate per skim). The general workflow and conclusion is documented at: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md.
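
A per-skim comparison of this kind can be sketched as a set comparison of candidate IDs. This is only an illustration, not the script behind the linked document; `compare_skims`, the skim names, and the toy IDs are hypothetical.

```python
def compare_skims(ours, reference):
    """Compare candidate-ID sets per skim.

    Both arguments map a skim name to an iterable of hashable candidate IDs.
    """
    report = {}
    for skim in sorted(set(ours) | set(reference)):
        a = set(ours.get(skim, ()))
        b = set(reference.get(skim, ()))
        report[skim] = {
            "common": len(a & b),
            "only_ours": sorted(a - b),
            "only_reference": sorted(b - a),
        }
    return report


if __name__ == "__main__":
    # Toy IDs, made up for the example.
    ours = {"iso": [(1, 1), (1, 2), (2, 5)], "1os": [(3, 1)]}
    reference = {"iso": [(1, 1), (1, 2)], "1os": [(3, 1), (3, 2)]}
    for skim, diff in compare_skims(ours, reference).items():
        print(skim, diff)
```

Listing the unmatched IDs, not only the counts, makes it easy to look up the one or two candidates that differ per skim.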
Consider `D*` cut validation done for normal templates.
I searched the following keywords in Phoebe's `20210105` version of the ANA:
I don't see any mention of additional cuts other than requiring the `Mu`/`Pi` to have the opposite sign.
I believe in Phoebe's step-1.5 ntuples, the wrong-sign samples can be distinguished from the normal sample w/ `DstIDprod` and `IDprod`: `IDprod < 0 && DstIDprod > 0` means wrong-sign `Mu`; `IDprod > 0 && DstIDprod < 0` means wrong-sign `Pi`.
There are some plots w/o the `D0`/`D*` mass window cut in the ANA note to show that the cuts select mostly real `D0`/`D*`. I tried to reproduce these plots and noticed that the `D*` plots have an additional cut. It is a DaVinci-level `D*` mass window cut:
https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/4307634453eab6031afaf7b6e779d2ea5ba260e5/run1-rdx/reco_Dst_D0.py#L477
Anyway, this doesn't affect our final result in any way, so I consider this checked.
Note the definitions of wrong-sign-related variables in Phoebe's `AddB.C`:
IDprod = (double)muplus_ID*D0_ID;
DstIDprod = (double)D0_ID*piminus_ID;
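
Putting the two definitions and the sign conditions quoted earlier together, a classification can be sketched as below. The function names are made up; the inputs are signed PDG IDs (e.g. 13 for mu-, 421 for D0, 211 for pi+). Only the two wrong-sign conditions stated in this thread are encoded, and everything else is returned as "other" without assuming more.

```python
def id_products(mu_id, d0_id, pi_id):
    """Mirror the two products quoted from AddB.C."""
    id_prod = float(mu_id) * d0_id      # IDprod = (double)muplus_ID*D0_ID
    dst_id_prod = float(d0_id) * pi_id  # DstIDprod = (double)D0_ID*piminus_ID
    return id_prod, dst_id_prod


def classify(mu_id, d0_id, pi_id):
    """Label a candidate using the two wrong-sign conditions from the thread."""
    id_prod, dst_id_prod = id_products(mu_id, d0_id, pi_id)
    if id_prod < 0 and dst_id_prod > 0:
        return "wrong-sign mu"
    if id_prod > 0 and dst_id_prod < 0:
        return "wrong-sign pi"
    return "other"  # not covered by the two conditions above


if __name__ == "__main__":
    print(classify(-13, 421, 211))  # muon charge flipped
    print(classify(13, 421, -211))  # slow pion charge flipped
    print(classify(13, 421, 211))   # neither condition fires
```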
Consider validation of the right-sign sample done.
Previously we validated the global step-2 cuts by applying them to Phoebe's step-1 ntuple and comparing the output w/ Phoebe's step-2.
Note that Phoebe's step-2 doesn't seem to contain skim booleans like `is_iso` or `is_1os`, so I don't think we have an anchor point from the ntuple directly. These flags are mostly for MC:

However, what we can do is: use Phoebe's step-2 ntuple as input, apply our skim cuts and build templates, then compare the template entries with those from Phoebe's run 1 (2011) template.