Validate global and skim cuts for D0, D* regular data templates

yipengsun commented 3 years ago

Previously we validated the global step-2 cuts by applying them to Phoebe's step-1 ntuple and comparing the output w/ Phoebe's step-2 ntuple.

Note that Phoebe's step-2 ntuple doesn't seem to contain skim booleans like is_iso or is_1os, so I don't think we have an anchor point from the ntuple directly. The flags it does contain are mostly for MC:

  flag2011: bool
  flagBadMu: double
  flagBadSoln: double
  flagBmu: double
  flagComb: double
  flagD0mu: float
  flagDoubleD: double
  flagDstSB: double
  flagGhost: double
  flagTauonicD: double
  flagtaumu: double

However, what we can do is use Phoebe's step-2 ntuple as input, apply our skim cuts, build templates, and then compare the template entries with those of Phoebe's run 1 (2011) templates.

yipengsun commented 3 years ago

Here are all the branches in Phoebe's step-2 ntuple:

ntp1:
  AntiISOnum: int32_t
  BDTmu: float
  B_MPTb: double
  B_TRUEP: double
  B_TRUEPT: double
  B_TRUEP_Z: double
  B_XY_ERR: double
  Btype: int32_t
  Chi: double
  Chi2: double
  CosFlightReco: double
  D0IP: double
  D0IPCHI2: double
  D0_DIRA_OWNPV: double
  D0_FD: double
  D0_M: double
  D0_P: double
  D0_PT: double
  DLLe: double
  DLLmu: double
  DMUDIRA: double
  DeltaChi2: double
  DstIDprod: double
  DstOk: double
  Dst_2010_minus_MC_MOTHER_ID: int32_t
  Dst_2010_minus_MC_MOTHER_KEY: int32_t
  Dst_ENDVERTEX_CHI2: double
  Dst_ID: int32_t
  Dst_M: double
  Dst_MC_MOTHER_ND: int32_t
  Dst_P: double
  Dst_PT: double
  Dst_TRUEID: int32_t
  Dst_mom: int32_t
  Dst_mom_m: double
  Dststtype: int32_t
  El: double
  El2: double
  ElR: double
  Elaltm: double
  Elaltp: double
  ElotherVtx: double
  Elsmear: double
  ElsmearG: double
  ElsmearK: double
  ElsmearK0: double
  ElsmearK0_eCut: double
  ElsmearK0_eSel: double
  ElsmearK_eCut: double
  ElsmearK_eSel: double
  Elsmearpi: double
  Elsmearpi0: double
  Elsmearpi0_eCut: double
  Elsmearpi0_eSel: double
  Elsmearpi_eCut: double
  Elsmearpi_eSel: double
  Elt: double
  Etaut: double
  FFweight: double
  FFweightALT: double
  FFweightu1m: double
  FFweightu1p: double
  FFweightu2m: double
  FFweightu2p: double
  FFweightu3m: double
  FFweightu3p: double
  FFweightu4m: double
  FFweightu4p: double
  FFweightu5m: double
  FFweightu5p: double
  FFweightu6m: double
  FFweightu6p: double
  FFweightu7m: double
  FFweightu7p: double
  FFweightu8m: double
  FFweightu8p: double
  FFweightu9m: double
  FFweightu9p: double
  FFweightuAm: double
  FFweightuAp: double
  FFweightv1m: double
  FFweightv1p: double
  FFweightv2m: double
  FFweightv2p: double
  FFweightv3m: double
  FFweightv3p: double
  FFweightv4m: double
  FFweightv4p: double
  FFweightvv1m: double
  FFweightvv1p: double
  FFweightvv2m: double
  FFweightvv2p: double
  FFweightvv3m: double
  FFweightvv3p: double
  GhostProb: double
  GsmearAngle: double
  HADDEC: bool
  HADTIS: bool
  HLTTCK: UInt_t
  Hlt1: bool
  Hlt1K: bool
  Hlt1TAL0K: bool
  Hlt1TAL0pi: bool
  Hlt1pi: bool
  Hlt2: bool
  IDprod: double
  ISOnum: int32_t
  JustDst: double
  KPID: double
  KPIDerror: double
  KPIDweight: double
  KPIDweight2: double
  K_P: double
  K_PT: double
  Kplus_P: double
  Kplus_rho: double
  L0: bool
  L0DUTCK: UInt_t
  MHD: double
  MUDEC: bool
  MUTIS: bool
  NNmu: double
  NShared: int32_t
  Nbody: int32_t
  Oldpi2mu_nonuBDTeCut: double
  Oldpi2mu_nonuBDTeSel: double
  Oldpi2mu_uBDTeCut: double
  Oldpi2mu_uBDTeSel: double
  P1: double
  PK2h: double
  PV_XY_ERR: double
  Pe2h: double
  PestZ: double
  Polarity: int16_t
  Pp2h: double
  Ppi2h: double
  Pu2h: double
  YTIS: bool
  YTOS: bool
  Y_BKGCAT: int32_t
  Y_BKGCAT_OLD: int32_t
  Y_DIRA_OWNPV: double
  Y_DISCARDMu_CHI2: double
  Y_ENDVERTEX_CHI2: double
  Y_ETA: double
  Y_FDCHI2_OWNPV: double
  Y_FD_OWNPV: double
  Y_IPCHI2_OWNPV: double
  Y_IP_OWNPV: double
  Y_M: double
  Y_MMERR: double
  Y_MsmearK: double
  Y_MsmearK0: double
  Y_MsmearK0_eCut: double
  Y_MsmearK0_eSel: double
  Y_MsmearK_eCut: double
  Y_MsmearK_eSel: double
  Y_Msmearpi: double
  Y_Msmearpi0: double
  Y_Msmearpi0_eCut: double
  Y_Msmearpi0_eSel: double
  Y_Msmearpi_eCut: double
  Y_Msmearpi_eSel: double
  Y_P: double
  Y_PT: double
  Y_SIGMA_IP: double
  Y_myDOCA: double
  Y_myDOCAchi2: double
  altM: double
  badSlowPi: double
  badSlowPiTau: double
  cmult100: int32_t
  cmult60: int32_t
  cpt60: double
  dxy: double
  dxy_err: double
  e2mu: double
  e2mu_nonuBDTeCut: double
  e2mu_nonuBDTeSel: double
  e2mu_uBDT: double
  e2mu_uBDTeCut: double
  e2mu_uBDTeSel: double
  e2notmu: double
  eratio_uBDT: double
  eventNumber: ULong64_t
  f_k: double
  f_p: double
  f_pi: double
  flag2011: bool
  flagBadMu: double
  flagBadSoln: double
  flagBmu: double
  flagComb: double
  flagD0mu: float
  flagDoubleD: double
  flagDstSB: double
  flagGhost: double
  flagTauonicD: double
  flagtaumu: double
  higherD0hel: double
  iBin: int32_t
  iBinK: int32_t
  iBinpi: int32_t
  isData: double
  ishigher: bool
  iso: double
  iso2: double
  iso_BDT: double
  iso_BDT2: double
  iso_BDT3: double
  iso_CHARGE: float
  iso_CHARGE2: float
  iso_CHARGE3: float
  iso_CHI2: double
  iso_DeltaM: double
  iso_NNk: float
  iso_NNk2: float
  iso_NNk3: float
  iso_NNkw: double
  iso_NNkw2: double
  iso_NNkw3: double
  iso_NNp: float
  iso_NNp2: float
  iso_NNp3: float
  iso_P: float
  iso_P2: float
  iso_P3: float
  iso_PE: float
  iso_PE2: float
  iso_PT: float
  iso_PT2: float
  iso_PT3: float
  iso_Type: float
  iso_Type2: float
  iso_Type3: float
  iso_clonevar: double
  k2k: double
  k2mu: double
  k2mu_nonuBDTeCut: double
  k2mu_nonuBDTeSel: double
  k2mu_uBDT: double
  k2mu_uBDTeCut: double
  k2mu_uBDTeSel: double
  k2notmu: double
  k2pi: double
  kWeight: double
  kWeightErr: double
  keepme: bool
  logDOCA: double
  mDD: double
  mDDnew: double
  mX_DD: double
  mXnew_DD: double
  m_corr: double
  m_nu1: double
  m_nu1altm: double
  m_nu1altp: double
  m_nu1otherVtx: double
  m_nu1smear: double
  m_nu1smearG: double
  m_nu1smearK: double
  m_nu1smearK0: double
  m_nu1smearK0_eCut: double
  m_nu1smearK0_eSel: double
  m_nu1smearK_eCut: double
  m_nu1smearK_eSel: double
  m_nu1smearpi: double
  m_nu1smearpi0: double
  m_nu1smearpi0_eCut: double
  m_nu1smearpi0_eSel: double
  m_nu1smearpi_eCut: double
  m_nu1smearpi_eSel: double
  m_nu2: double
  m_nuG: double
  m_nuR: double
  m_nuT: double
  matchChi2: double
  mcWeight: double
  mm_DD: double
  mm_mom: double
  momWeight: double
  muHAD: bool
  muIP: double
  muIPCHI2: double
  muPID: float
  muPIDerror: double
  muPIDweight: double
  muPIDweight_nonuBDT: double
  muPIDweight_nonuBDTeCut: double
  muPIDweight_nonuBDTeSel: double
  muPIDweight_uBDT: double
  muPIDweight_uBDTeCut: double
  muPIDweight_uBDTeSel: double
  muTOS: bool
  muVeto: bool
  mu_CosTheta: double
  mu_ETA: double
  mu_P: double
  mu_PT: double
  mu_PTb: double
  mu_PTsmearK: double
  mu_PTsmearK0: double
  mu_PTsmearK0_eCut: double
  mu_PTsmearK0_eSel: double
  mu_PTsmearK_eCut: double
  mu_PTsmearK_eSel: double
  mu_PTsmearpi: double
  mu_PTsmearpi0: double
  mu_PTsmearpi0_eCut: double
  mu_PTsmearpi0_eSel: double
  mu_PTsmearpi_eCut: double
  mu_PTsmearpi_eSel: double
  mu_PsmearK: double
  mu_PsmearK0: double
  mu_PsmearK0_eCut: double
  mu_PsmearK0_eSel: double
  mu_PsmearK_eCut: double
  mu_PsmearK_eSel: double
  mu_Psmearpi: double
  mu_Psmearpi0: double
  mu_Psmearpi0_eCut: double
  mu_Psmearpi0_eSel: double
  mu_Psmearpi_eCut: double
  mu_Psmearpi_eSel: double
  mu_has: bool
  mu_is: bool
  mu_isT: bool
  muplus_MC_MOTHER_ID: int32_t
  muplus_MC_MOTHER_KEY: int32_t
  muplus_MC_MOTHER_ND: int32_t
  muplus_TRUEID: int32_t
  muplus_rho: double
  nISO: int32_t
  nSPDhits: double
  nTracks: double
  noChi2: int32_t
  noDChi2: int32_t
  p2mu: double
  p2mu_nonuBDTeCut: double
  p2mu_nonuBDTeSel: double
  p2mu_uBDT: double
  p2mu_uBDTeCut: double
  p2mu_uBDTeSel: double
  p2notmu: double
  p2p: double
  pWeight: double
  pWeightErr: double
  pi2k: double
  pi2mu: double
  pi2mu_nonuBDTeCut: double
  pi2mu_nonuBDTeSel: double
  pi2mu_uBDT: double
  pi2mu_uBDTeCut: double
  pi2mu_uBDTeSel: double
  pi2notmu: double
  pi2pi: double
  piPID: double
  piPIDerror: double
  piPIDweight: double
  piPIDweight2: double
  piWeight: double
  piWeightErr: double
  pi_P: double
  pi_PT: double
  piminus0_P: double
  piminus_TRACK_Type: int32_t
  piminus_rho: double
  pislow_GhostProb: double
  pislow_IP: double
  pislow_IPCHI2: double
  pislow_P: double
  pislow_PT: double
  pislow_ProbNNk: double
  pislow_ismu: bool
  pislow_muAcc: bool
  pislow_rho: double
  q2: double
  q2R: double
  q2altm: double
  q2altp: double
  q2otherVtx: double
  q2smear: double
  q2smearG: double
  q2smearK: double
  q2smearK0: double
  q2smearK0_eCut: double
  q2smearK0_eSel: double
  q2smearK_eCut: double
  q2smearK_eSel: double
  q2smearpi: double
  q2smearpi0: double
  q2smearpi0_eCut: double
  q2smearpi0_eSel: double
  q2smearpi_eCut: double
  q2smearpi_eSel: double
  q2t: double
  q2tD: double
  reweighting_68: float
  reweighting_69_gen2: float
  reweighting_69_gen2_pt2: float
  reweighting_69_pt2: float
  reweighting_89: float
  reweighting_89_gen2: float
  reweighting_89_gen2_pt2: float
  reweighting_89_pt2: float
  reweighting_JpsiK09_v1: float
  reweighting_JpsiK09_v2: float
  runNumber: UInt_t
  selcounter: UInt_t
  simpleDstst: double
  tantheta: double
  tanthetaotherVtx: double
  thetaD: double
  thetaFlight: double
  thetaFlightT: double
  thetaL: double
  totWeight: double
  totWeight2: double
  totWeight2_uBDT: double
  totWeight_uBDT: double
  transverseRes: double
  transverseResDmu: double
  u2mu: double
  u2mu_nonuBDTeCut: double
  u2mu_nonuBDTeSel: double
  u2mu_uBDT: double
  u2mu_uBDTeCut: double
  u2mu_uBDTeSel: double
  u2notmu: double
  wCorr: double
  wDkin: double
  wSPD: double
  wTRIG: double
  weightD: double
  weightTRKeff: double
  weightnTRK: double
  wt: double

yipengsun commented 3 years ago

It looks like the global cuts for D* are slightly tighter in our extracted cuts. I applied OUR global cuts to Phoebe's step-2 ntuple, then applied just the basic ISO cut, and I get:

> uiddump -n Dst_data--21_10_14--mix--all--2011-2012--md-mu--phoebe.root -t tree -c "iso_bdt1 < 0.15 & mu_ubdt > 0.25"
Num of events: 416353, Num of IDs: 416353, Num of UIDs: 416353
Num of duplicated IDs: 0, Num of duplicated events: 0, duplicate rate: 0.00%

whereas the number of events in the template should be 421224.

Note that the ntuple is generated with:

make ref-rdx-ntuple-run1-data-Dst

yipengsun commented 3 years ago

And the global cuts for D0 are slightly looser:

> uiddump -n D0_data--21_10_14--mix--all--2011-2012--md-mu--phoebe.root -t tree -c is_iso
Num of events: 1770274, Num of IDs: 1769666, Num of UIDs: 1769058
Num of duplicated IDs: 608, Num of duplicated events: 608, duplicate rate: 0.03%

whereas the number of events in the template should be 1734133.

yipengsun commented 3 years ago

The 1OS and 2OS numbers are not very consistent. I think this warrants further investigation. My plan is to run Phoebe's selection code with minimal modification and compare numbers (I already tried this yesterday afternoon without success; I just need to try harder).

yipengsun commented 3 years ago

I'm trying to estimate the Run 1/Run 2 ratio of template event counts more precisely.

For luminosity, this report contains more detailed numbers by year.
Also, we can't naively multiply by 1.4 for the increase in trigger efficiency, because in step-2 we re-apply the run 1 cuts, which partially include the run 1 trigger.
- If we believe the number from the cocktail MC cutflow study, the efficiency factor is just 1.13 (this is w/o the UBDT cut)

So, a better estimate would be:

(1.11+2.07) / (1.56/2*2*1.13) = 1.80

To arrive at the observed ~2.6 ratio, we need an efficiency of ~0.8 for run1/run2 (instead of 1.13), that is, the run 1 cuts are less efficient on run 2 data. In fact, an even older cutflow study on real data gives an efficiency of about 0.73, so this more or less adds up, and the efficiency from the cocktail study may not be very reliable.
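
The arithmetic above can be written out as a small sketch. The variable names are illustrative; the numbers are only the ones quoted in this comment, and the second quantity is the efficiency factor that would reproduce the observed ~2.6 ratio if it replaced 1.13 in the same formula.

```python
# Numbers quoted in the comment above; variable names are illustrative only.
lumi_num = 1.11 + 2.07      # numerator of the estimate
lumi_den = 1.56 / 2 * 2     # denominator luminosity term, exactly as written above
eff_cocktail = 1.13         # efficiency factor from the cocktail MC cutflow (w/o UBDT cut)

expected_ratio = lumi_num / (lumi_den * eff_cocktail)

# Efficiency factor needed in place of 1.13 to reproduce the observed ratio
observed_ratio = 2.6
required_eff = lumi_num / (lumi_den * observed_ratio)

print(f"expected ratio:      {expected_ratio:.2f}")
print(f"required efficiency: {required_eff:.2f}")
```

With these inputs the required factor comes out slightly below 0.8, which is in the same ballpark as the 0.73 from the older data cutflow.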

manuelfs commented 3 years ago

I'm not sure you got the right luminosity numbers: 1.11 for 2011 and 1.56 for 2016, where did you get those from? In the report I see 1.11 and 1.66 for delivered luminosity. In any case, what we really need is the QA-ed luminosity, which is a subset of the recorded luminosity, and I don't see that one in the report. Svende had found those numbers for Run 2 (from Dirac, I think)

As for the efficiency change, the 40% comes from the data cutflow we presented to the semileptonic group, which should include similar step 2 cuts?

manuelfs commented 3 years ago

@yipengsun Can you provide links in this issue to the table comparing yields and the code/original ntuples you used to generate them?

yipengsun commented 3 years ago

You mean the 2011 vs 2016 yield comparison? I think that is part of the rdx_cutflow workflow:

https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/f66ed43bddb13287a9060eeca2171ae8903ac9cd/workflows/rdx_cutflows.py#L88

yipengsun commented 3 years ago

@manuelfs I'll use this issue to discuss the preliminary results I found for various comparisons between ours and Phoebe's ntuples, and record the definite version of the study in https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md.

yipengsun commented 3 years ago

Looking at Phoebe's and our 2011 MD D* ntuples:

# Phoebe's
> uiddump -n Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree
Num of events: 217936, Num of IDs: 208846, Num of UIDs: 200406
Num of duplicated IDs: 8440, Num of duplicated events: 9090, duplicate rate: 4.17%

# Ours (produced in Oct 2021)
> uiddump -n Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -t TupleB0/DecayTree
Num of events: 229552, Num of IDs: 216987, Num of UIDs: 205508
Num of duplicated IDs: 11479, Num of duplicated events: 12565, duplicate rate: 5.47%

# Find common candidates
> uidcommon -n Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree -N ../../0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -T TupleB0/DecayTree
Total common IDs: 194325

yipengsun commented 3 years ago

We actually don't have Phoebe's 2011 D0 ntuple annexed. I'll go to Phoebe's CERN box and annex the ntuple.

yipengsun commented 3 years ago

Actually, I don't think Phoebe has put her 2011 D0 ntuple on her CERNbox (I don't see any shared folder from her on CERNbox). @manuelfs Can you annex the 2011 MD D0 ntuple from your external USB drive under the folder

ntuples/ref-rdx-run1/D0-std

yipengsun commented 3 years ago

Phoebe's latest step-2 ntuples also include fit templates, so I guess we now have a more consistent set of templates to compare to.

yipengsun commented 3 years ago

I'm not familiar w/ Phoebe's template naming convention, and I can't locate her D* ISO template in the latest ntuple. I tried h_data but that contains 500k entries and can't be just the ISO sample. I'll use our existing numbers instead.

yipengsun commented 3 years ago

Applying the same cuts to Phoebe's latest ntuples, I don't see any improvement in the data template sizes. It's likely that we are not applying the same cuts as Phoebe.

From ntuple: Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
   ISO:      416,353
   1OS:       23,153
   2OS:        8,681
    DD:       30,357

From ntuple: D0_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
   ISO:    1,769,303
   1OS:      224,390
   2OS:       53,084
    DD:      216,153

yipengsun commented 3 years ago

Also, Phoebe's current implementation of keeping only 1 candidate:

  currTry=0;

  do{
    selcounter=0;
    fChain->GetTree()->GetEntry(entry+ranking[currTry]);
    currTry++;
    if(debugSingle) cerr << eventNumber << '\t' << totCandidates << '\t' << currTry << endl;

where ranking is sorted by a pseudo-random sequence. I think at this point she just writes out the final candidates in sorted order, but she keeps all final events.

OK, in her redo_Histo, she requires single cand with the following:

    singleCand=(AntiISOnum==0);
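
As a generic illustration of what "keeping only 1 candidate per event with a pseudo-random ranking" means, here is a minimal sketch. It is not a port of Phoebe's code: the function name, the dict layout, and the seeding are all assumptions, and it does not model how ISOnum or AntiISOnum are filled.

```python
import random
from collections import defaultdict

def pick_one_per_event(candidates, seed=0):
    """Keep one candidate per (runNumber, eventNumber), chosen pseudo-randomly."""
    rng = random.Random(seed)  # fixed seed so the choice is reproducible
    by_event = defaultdict(list)
    for cand in candidates:
        by_event[(cand["runNumber"], cand["eventNumber"])].append(cand)

    kept = []
    for key in sorted(by_event):
        group = by_event[key]
        ranking = list(range(len(group)))
        rng.shuffle(ranking)             # pseudo-random ranking of the candidates
        kept.append(group[ranking[0]])   # keep only the first-ranked candidate
    return kept

# Toy input: the first event has two candidates, the second has one.
toy = [
    {"runNumber": 1, "eventNumber": 10, "cand": "a"},
    {"runNumber": 1, "eventNumber": 10, "cand": "b"},
    {"runNumber": 1, "eventNumber": 11, "cand": "c"},
]
kept = pick_one_per_event(toy)
print(len(kept), "candidates kept out of", len(toy))
```

After a selection like this, the number of kept candidates equals the number of distinct runNumber-eventNumber pairs, so a duplicate check on the output should report zero duplicates.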

yipengsun commented 3 years ago

I removed some K, Pi momentum cuts because Phoebe doesn't have them anymore, added the DiscardMu_CHI2 cut back (not mentioned in the note), and tweaked the fit variable cut ranges.

I also disabled my single-candidate selection and now fully use Phoebe's.

I still can't figure out the discrepancy. I'd like to ask Phoebe the following questions:

For single candidate selection, there are ISOnum == 0 for the ISO skim and AntiISOnum == 0 for all other skims. Does this mean that for the DD, 1OS, 2OS samples, a single candidate may enter multiple templates? Or does it still functionally require a single candidate globally? (Here's Phoebe's usage of ISOnum and AntiISOnum)

Edit: Since DD, 1OS, 2OS are mutually exclusive, Phoebe's implementation should be equivalent to globally keeping only 1 candidate, and the duplication of eventNumber+runNumber should be very small, but that is not the case in her latest ntuples.

A note: I checked Phoebe's merged ntuples, and apparently the duplication rate is non-negligible:
```
> uiddump -n Dst--21_10_21--mix--all--2011-2012--md-mu--phoebe.root -t ntp1                                                                                           
Num of events: 6452737, Num of IDs: 5826243, Num of UIDs: 5352881
Num of duplicated IDs: 473362, Num of duplicated events: 626494, duplicate rate: 9.71%
```
(uiddump checks for duplicated runNumber-eventNumber combinations.)
For the ISO, DD, 1OS, 2OS samples in real data, do they differ only by some isolation-related cuts? Can you check our skim cuts and see if they are consistent w/ yours?

Our skim cuts are defined here; only the FLAG_* functions are relevant.

Note that add_flags is defined as mu_ubdt > 0.25 && ISOnum==0 for the ISO skim, and mu_ubdt > 0.25 && AntiISOnum==0 for the remaining skims.

If you really want to see where these are applied, you can take a look at our selection YAML, but this shouldn't be needed.
We are trying to implement global cuts + skim cuts, where the only differences between ISO, DD, 1OS, 2OS are the skim cuts. Do you have similar global cuts in the latest ntuple you shared? Can you point out the lines that define such global cuts in your code (for the real data, not MC) so we can double-check ours?

I found a selection flag in Phoebe's code here but this is way too loose:
```
(selcounter & (4096 * 64 - 1)) == (4096 * 64 - 1)
```
If I just apply the cut above AND ISO cuts, I get ~700k candidates in the ISO sample alone.
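
To spell out what that expression tests: 4096 * 64 is 2^18, so the mask is 2^18 - 1 and the condition requires the lowest 18 bits of selcounter to all be set. A minimal sketch follows; the function name is illustrative, and what each individual bit encodes is not stated in this thread.

```python
N_BITS = 18
MASK = 4096 * 64 - 1   # 262143 == 2**18 - 1, i.e. the lowest 18 bits set

def passes_all_bits(selcounter: int) -> bool:
    """True if every one of the lowest 18 bits of selcounter is set."""
    return (selcounter & MASK) == MASK

# The mask written in the cut is the same as (1 << 18) - 1
mask_matches = MASK == (1 << N_BITS) - 1
print(mask_matches, passes_all_bits(MASK), passes_all_bits(MASK - 1))
```

Higher bits are ignored by the mask, so a candidate with extra bits set above bit 17 still passes.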

These are the latest stats for the D* skims after applying all the changes described above:

> print_skim_size.py gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root 
From ntuple: gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
   ISO:      419,630 (421,224)
   1OS:       23,187 (19,692)
   2OS:        8,610 (8,403)
    DD:       30,387 (30,948)

where numbers in () are found in Phoebe's fit template.

yipengsun commented 3 years ago

Recovered the TRIGGER_HLT1 && PT > 1700 cut from https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1535-1570

Also some of the known global cuts: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/AddB.C#L2953-2974

Phoebe also mentioned that there's a D* side-band cut: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L2736-2741

yipengsun commented 3 years ago

Checking on Phoebe's latest templates from her gitlab repo:

The proctuples folder is pinned to a specific commit. The last update was about 1 week ago.

ISO: proctuples/BCandHistos_Dst.root: 420,646 [414,565]
1OS: proctuples/1OS/BCandHistos_Dst.root: 19,666 [22,926]
2OS: proctuples/2OS/BCandHistos_Dst.root: 8,389 [8,464]
DD: proctuples/DD/BCandHistos_Dst.root: 30,918 [29,796]

where numbers in [] are OUR numbers
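
A minimal sketch for turning the two sets of numbers above into differences. The dict names are illustrative, and quoting the percentage relative to our own count is an assumption about the convention; it is not taken from print_skim_size.py.

```python
# Numbers copied from the list above
ours = {"ISO": 414_565, "1OS": 22_926, "2OS": 8_464, "DD": 29_796}
phoebe = {"ISO": 420_646, "1OS": 19_666, "2OS": 8_389, "DD": 30_918}

def compare(ours, ref):
    """Return {skim: (ours - ref, difference in percent of our count)}."""
    rows = {}
    for skim, n in ours.items():
        diff = n - ref[skim]
        rows[skim] = (diff, 100.0 * diff / n)
    return rows

rows = compare(ours, phoebe)
for skim, (diff, pct) in rows.items():
    print(f"{skim:>4}: {ours[skim]:>9,} ({diff:+,}, {pct:+.1f}%)")
```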

yipengsun commented 3 years ago

To better understand the efficiency of each cut, I created a specialized cutflow script that applies cuts step by step, and at each step I apply the ISO and DD skim cuts so that we can see the impact of the cuts defined in that step.

The script is located here. To use it:

# First get Phoebe's latest D* ntuple
git annex get ntuples/ref-rdx-run1/Dst-mix/Dst--21_10_21--mix--all--2011-2012--md-mu--phoebe.root

# Now go to the folder of the script
cd studies/cutflow-sync_with_phoebe
./cutflow-sync_with_phoebe.py

The output is the following:

Before applying any cut: 6,452,737
After applying isData && DstIDprod > 0 && IDprod > 0 && -2.0 <= m_nu1 && m_nu1 <= 10.9 && 0.0 <= GEV(El) && GEV(El) <= 2.65 && -0.4 <= GEV2(q2) && GEV2(q2) <= 12.6: 1,369,870
    After applying ISO skim cut: 699,952
    After applying DD skim cut: 119,502
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2: 1,315,034
    After applying ISO skim cut: 674,745
    After applying DD skim cut: 113,762
After applying (Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0): 1,291,396
    After applying ISO skim cut: 664,936
    After applying DD skim cut: 111,162
After applying !muVeto && muPID > 0 && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && GhostProb < 0.5: 674,897
    After applying ISO skim cut: 471,671
    After applying DD skim cut: 37,727
After applying dxy < 7.0 && Y_DISCARDMu_CHI2 < 6.0 && Y_ENDVERTEX_CHI2 < 24.0 && Y_DIRA_OWNPV > 0.9995 && pislow_GhostProb < 0.25: 672,423
    After applying ISO skim cut: 469,697
    After applying DD skim cut: 37,616
After applying Y_M < 5280.0: 672,360
    After applying ISO skim cut: 469,648
    After applying DD skim cut: 37,614
After applying ABS(Dst_M-D0_M-145.454) < 2.0: 570,724
    After applying ISO skim cut: 414,565
    After applying DD skim cut: 29,496
After applying K_PT > 500.0 && pi_PT > 500.0 && K_PT+pi_PT > 1400.0 && D0_PT > 2000.0: 570,724
    After applying ISO skim cut: 414,565
    After applying DD skim cut: 29,496

Turns out the cut ABS(Dst_M-D0_M-145.454) < 2.0 is very harsh. Maybe this is related to Phoebe's handling of D* side-band?
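
The structure of that output can be sketched as follows: global cuts are applied cumulatively, while the skim cuts are only counted on top of the surviving candidates at each step and do not feed into the next step. This is a toy version in plain Python, not the actual cutflow-sync_with_phoebe.py; the candidate values are made up, the `dm` key stands in for Dst_M-D0_M, and the ISO skim is simplified to the iso_BDT requirement.

```python
def cutflow(candidates, steps, skims):
    """Apply `steps` cumulatively; after each one, count candidates passing each skim."""
    report = [("Before applying any cut", len(candidates), {})]
    surviving = list(candidates)
    for label, cut in steps:
        surviving = [c for c in surviving if cut(c)]
        # Skim cuts are counted here but do not affect `surviving`
        skim_counts = {name: sum(1 for c in surviving if skim(c))
                       for name, skim in skims.items()}
        report.append((f"After applying {label}", len(surviving), skim_counts))
    return report

# Toy candidates; the values are made up.
toy = [
    {"Y_M": 5000.0, "dm": 145.0, "iso_BDT": 0.10},
    {"Y_M": 5300.0, "dm": 145.5, "iso_BDT": 0.05},
    {"Y_M": 5100.0, "dm": 150.0, "iso_BDT": 0.12},
    {"Y_M": 5200.0, "dm": 146.0, "iso_BDT": 0.30},
]
steps = [
    ("Y_M < 5280.0", lambda c: c["Y_M"] < 5280.0),
    ("ABS(dm-145.454) < 2.0", lambda c: abs(c["dm"] - 145.454) < 2.0),
]
skims = {"ISO": lambda c: c["iso_BDT"] < 0.15}

report = cutflow(toy, steps, skims)
for label, n, skim_counts in report:
    print(f"{label}: {n:,}")
    for name, count in skim_counts.items():
        print(f"    After applying {name} skim cut: {count:,}")
```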

yipengsun commented 3 years ago

Spotted a bug: GeV units were not handled properly when postprocessing Phoebe's step-1.5 ntuple. After fixing this bug, we have:

From ntuple: gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_22--mix--all--2011-2012--md-mu--phoebe.root
   ISO:      414,565
   1OS:       19,186
   2OS:        7,909
    DD:       29,496

The ntuple used can be generated w/ make ref-rdx-ntuple-run1-data-Dst.

The statistics are generated with a simple script (in the scripts folder):

print_skim_size.py gen/ref-rdx-ntuple-run1-data-Dst/ntuple/Dst_data--21_10_21--mix--all--2011-2012--md-mu--phoebe.root
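
For context on the unit issue, here is a minimal sketch of what the GEV(...) and GEV2(...) helpers in the cut strings imply. It assumes El is stored in MeV and q2 in MeV^2, which matches the ROOT-format cut `El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6` quoted later in this thread. The function bodies are illustrative and are not the actual postprocessing code, and the exact nature of the bug is not described here.

```python
def GEV(x_mev):
    """Convert an energy or momentum from MeV to GeV (assumed storage unit: MeV)."""
    return x_mev / 1.0e3

def GEV2(x_mev2):
    """Convert a squared quantity from MeV^2 to GeV^2."""
    return x_mev2 / 1.0e6

def in_fit_range(El, q2):
    """Fit-variable range cuts on El and q2, with bounds expressed in GeV and GeV^2."""
    return 0.1 <= GEV(El) <= 2.65 and -0.4 <= GEV2(q2) <= 12.6

print(in_fit_range(1500.0, 5.0e6))  # values given in MeV and MeV^2
print(in_fit_range(1.5, 5.0))       # the same values mistakenly given in GeV and GeV^2
```

Mixing the two conventions shifts every candidate by a factor of 10^3 (or 10^6 for q2) relative to the cut bounds, so the range cut silently selects the wrong candidates.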

yipengsun commented 3 years ago

Keeping the D* side-band, we overshoot by ~6000 candidates:

After applying Y_M < 5280.0: 672,360
    After applying ISO skim cut: 469,648
    After applying DD skim cut: 37,614
After applying MIN(ABS(Dst_M-D0_M-145.454-9), ABS(Dst_M-D0_M-145.454)) < 2.0 : 594,641
    After applying ISO skim cut: 426,851
    After applying DD skim cut: 31,488
After applying K_PT > 500.0 && pi_PT > 500.0 && K_PT+pi_PT > 1400.0 && D0_PT > 2000.0: 594,641
    After applying ISO skim cut: 426,851
    After applying DD skim cut: 31,488
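
The MIN(...) form of the mass-difference cut used above accepts a candidate if it lies within 2.0 of either the peak at 145.454 or the window shifted up by 9. A minimal sketch, with illustrative function and constant names and assuming the masses are in MeV:

```python
DM_PEAK = 145.454    # central value used in the cut above
SB_OFFSET = 9.0      # offset of the side-band window used in the cut above
HALF_WIDTH = 2.0     # half-width of each window

def in_signal_or_sideband(Dst_M, D0_M):
    """MIN(...) form: distance to the nearer of the two window centers."""
    dm = Dst_M - D0_M
    return min(abs(dm - DM_PEAK - SB_OFFSET), abs(dm - DM_PEAK)) < HALF_WIDTH

def in_signal_or_sideband_or_form(Dst_M, D0_M):
    """Equivalent OR form: inside the side-band window or inside the peak window."""
    dm = Dst_M - D0_M
    return abs(dm - DM_PEAK - SB_OFFSET) < HALF_WIDTH or abs(dm - DM_PEAK) < HALF_WIDTH

# Toy mass pairs: in the peak window, in the shifted window, and between the two
for Dst_M, D0_M in [(2010.0, 1864.5), (2019.0, 1864.5), (2014.5, 1864.5)]:
    print(Dst_M - D0_M, in_signal_or_sideband(Dst_M, D0_M))
```

The two functions always agree, because the minimum of two distances is below the threshold exactly when at least one of them is.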

yipengsun commented 3 years ago

Note: Debugging in progress; this is just a reminder to myself and should not be read by anyone else.

Currently I have identified these cuts and here's the output:

Before applying any cut: 6,452,737
After applying isData && DstIDprod > 0 && IDprod > 0 && IN_RANGE(m_nu1, -2.0, 10.9, true) && IN_RANGE(GEV(El), 0.1, 2.65, true) && IN_RANGE(GEV2(q2), -0.4, 12.6, true): 1,369,870
    After applying ISO skim cut: 699,952
    After applying 1OS skim cut: 35,250
    After applying 2OS skim cut: 43,255
    After applying DD skim cut: 119,502
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 1,291,396
    After applying ISO skim cut: 664,936
    After applying 1OS skim cut: 33,458
    After applying 2OS skim cut: 40,458
    After applying DD skim cut: 111,162
After applying !muVeto && muPID > 0 && DLLe < 1.0 && BDTmu > 0.25 && IN_RANGE(mu_P, 3.0e3, 100.0e3) && IN_RANGE(mu_ETA, 1.7, 5.0): 679,610
    After applying ISO skim cut: 474,561
    After applying 1OS skim cut: 21,300
    After applying 2OS skim cut: 10,209
    After applying DD skim cut: 38,148
After applying GhostProb < 0.5 && muIPCHI2 > 45.0: 674,897
    After applying ISO skim cut: 471,671
    After applying 1OS skim cut: 21,159
    After applying 2OS skim cut: 10,091
    After applying DD skim cut: 37,727
After applying dxy < 7.0 && Y_M < 5280.0: 672,360
    After applying ISO skim cut: 469,648
    After applying 1OS skim cut: 21,080
    After applying 2OS skim cut: 10,058
    After applying DD skim cut: 37,614
After applying ABS(Dst_M-D0_M-145.454-9) < 2.0 || ABS(Dst_M-D0_M-145.454) < 2.0: 594,641
    After applying ISO skim cut: 426,851
    After applying 1OS skim cut: 19,550
    After applying 2OS skim cut: 8,424
    After applying DD skim cut: 31,488

yipengsun commented 2 years ago

I noticed that Phoebe's step-1.5 ntuples are the output of AddB.C, so there's no need to look into AddB.C for additional cuts. Instead, all cuts for the templates must be inside redoHistos_Dst.C. (I'm using Phoebe's CERN gitlab repo, and links are already included.)

The input ntuple is from Phoebe's EOS:

/eos/user/b/bhamilto/Proctuples/BCandsMerge_Dst.root

which was updated on Thu, 21 Oct 2021.

The data templates are obtained from this commit:

https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/tree/6ad83965963fab928e0e72fa9290f7a9af885d56/proctuples

I found the following GLOBAL cuts for the data fit templates:

Phoebe's global cuts: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1535-1570
The additional cuts for data template: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L3651
The binning range cut: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1445-1452

I applied these cuts in a dedicated cutflow study script: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/6f5cb344ee47f4e5f6d5937805fc3853e122eccf/studies/cutflow-sync_with_phoebe/cutflow-sync_with_phoebe.py#L28-L49

And the output is the following:

The reference templates have the following entries:
    ISO: 420,646
    1OS: 19,666
    2OS: 8,389
     DD: 30,918
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && IN_RANGE(m_nu1, -2.0, 10.9, true) && IN_RANGE(GEV(El), 0.1, 2.65, true) && IN_RANGE(GEV2(q2), -0.4, 12.6, true): 817,151
    After applying ISO skim cut: 548,018 (+127,372, +23.2%)
    After applying 1OS skim cut: 25,138 (+5,472, +21.8%)
    After applying 2OS skim cut: 14,809 (+6,420, +43.4%)
    After applying  DD skim cut: 52,379 (+21,461, +41.0%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 773,959
    After applying ISO skim cut: 521,370 (+100,724, +19.3%)
    After applying 1OS skim cut: 23,891 (+4,225, +17.7%)
    After applying 2OS skim cut: 13,891 (+5,502, +39.6%)
    After applying  DD skim cut: 48,803 (+17,885, +36.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && IN_RANGE(mu_P, 3.0e3, 100.0e3) && IN_RANGE(mu_ETA, 1.7, 5.0): 679,610
    After applying ISO skim cut: 474,561 (+53,915, +11.4%)
    After applying 1OS skim cut: 21,300 (+1,634, +7.7%)
    After applying 2OS skim cut: 10,209 (+1,820, +17.8%)
    After applying  DD skim cut: 38,148 (+7,230, +19.0%)
After applying dxy < 7.0 && Y_M < 5280.0: 677,041
    After applying ISO skim cut: 472,518 (+51,872, +11.0%)
    After applying 1OS skim cut: 21,218 (+1,552, +7.3%)
    After applying 2OS skim cut: 10,175 (+1,786, +17.6%)
    After applying  DD skim cut: 38,032 (+7,114, +18.7%)
After applying ABS(Dst_M-D0_M-145.454) < 2.0: 574,645
    After applying ISO skim cut: 417,075 (-3,571, -0.9%)
    After applying 1OS skim cut: 19,310 (-356, -1.8%)
    After applying 2OS skim cut: 8,006 (-383, -4.8%)
    After applying  DD skim cut: 29,836 (-1,082, -3.6%)

Included a screenshot in case it's more readable: sync_w_phoebe_dst

If we just focus on ISO skim, which has the cut ISOnum == 0 && iso_BDT < 0.15, we are already 0.9% less than Phoebe's number.

You can go to studies/cutflow-sync_with_phoebe and run the script inside that folder.

So I'm only applying the cuts that I already found, yet we are already 1% to 5% below Phoebe's reported numbers. Maybe our reference templates are still inconsistent? (The reference template was updated ~1w ago, whereas the step-1.5 ntuple we are working on was obtained around Th).

@manuelfs @Svende @afernez FYI.

manuelfs commented 2 years ago

Phoebe figured out that her redoHistos was not applying the single candidate selection due to a bug, and we were missing this cut.

The idea here is that if the data/MC weights tend to zero out certain regions of the MC, then it is philosophically most self-consistent to also remove those kinematic regions from data. So the data are run through the same machinery to remove these. I only have it for the MCMC pt2 weights and I think that's the only step Greg recommended it for.

Taking that into account, we match the ISO entries for D*, so we can now proceed to check the other skim cuts

rut /eos/user/b/bhamilto/Proctuples/BCandsMerge_Dst.root
root [10] ntp1->GetEntries("isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1>-2 && m_nu1<10.9 && El>100 && El<2650 && q2>-400000 && q2 <12600000 &&L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P>3000 && mu_P<100000 && mu_ETA > 1.7 && mu_ETA<5 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2 && iso_BDT < 0.15 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)")
(long long) 420646

manuelfs commented 2 years ago

@yipengsun For future debugging, could you print the cuts with the ROOT interactive format? That will allow everyone to do quick independent checks

yipengsun commented 2 years ago

Now it should only use standard functions for the global cut. I also added the missing weight cut and printed out the global cuts to apply for easier copy-pasting.
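
As an illustration of the kind of rewriting involved, here is a minimal sketch that expands the IN_RANGE(...) shorthand into plain comparisons that can be pasted into ROOT. The mapping is inferred from the outputs in this thread: IN_RANGE(m_nu1, -2.0, 10.9, true) corresponds to inclusive bounds, and IN_RANGE(mu_P, 3.0e3, 100.0e3) to exclusive ones. The helper is hypothetical, is not the code in the cutflow script, and only handles a plain variable name as the first argument, so it would not expand IN_RANGE(GEV(El), ...).

```python
import re

# IN_RANGE(var, lo, hi[, true|false]) with a plain identifier as `var`
_IN_RANGE = re.compile(
    r"IN_RANGE\(\s*(\w+)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*(?:,\s*(true|false)\s*)?\)"
)

def expand_in_range(cut: str) -> str:
    """Replace each IN_RANGE(...) call with explicit lower and upper comparisons."""
    def repl(match):
        var, lo, hi, inclusive = match.groups()
        ge, le = (">=", "<=") if inclusive == "true" else (">", "<")
        return f"{var} {ge} {lo} && {var} {le} {hi}"
    return _IN_RANGE.sub(repl, cut)

print(expand_in_range("IN_RANGE(m_nu1, -2.0, 10.9, true)"))
print(expand_in_range("BDTmu > 0.25 && IN_RANGE(mu_P, 3.0e3, 100.0e3)"))
```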

yipengsun commented 2 years ago

Keeping the single candidate selection cut:

Cuts we are about to apply:
    isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6 && L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)) && !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0 && dxy < 7.0 && Y_M < 5280.0 && abs(Dst_M-D0_M-145.454) < 2.0 && !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01)
The reference templates have the following entries:
    ISO: 420,646
    1OS: 19,666
    2OS: 8,389
     DD: 30,918
Before applying any cut: 6,452,737
After applying isData > 0 && DstIDprod > 0 && IDprod > 0 && muPID > 0 && m_nu1 >= -2.0 && m_nu1 <= 10.9 && El >= 0.1e3 && El <= 2.65e3 && q2 >= -0.4e6 && q2 <= 12.6e6: 817,151
    After applying ISO skim cut: 548,018 (+127,372, +23.2%)
    After applying 1OS skim cut: 25,138 (+5,472, +21.8%)
    After applying 2OS skim cut: 14,809 (+6,420, +43.4%)
    After applying  DD skim cut: 52,379 (+21,461, +41.0%)
After applying L0 && (YTIS || YTOS) && Hlt1 && Hlt2 && ((Hlt1TAL0K && K_PT > 1700.0) || (Hlt1TAL0pi && pi_PT > 1700.0)): 773,959
    After applying ISO skim cut: 521,370 (+100,724, +19.3%)
    After applying 1OS skim cut: 23,891 (+4,225, +17.7%)
    After applying 2OS skim cut: 13,891 (+5,502, +39.6%)
    After applying  DD skim cut: 48,803 (+17,885, +36.6%)
After applying !muVeto && DLLe < 1.0 && BDTmu > 0.25 && mu_P > 3.0e3 && mu_P < 100.0e3 && mu_ETA > 1.7 && mu_ETA < 5.0: 679,610
    After applying ISO skim cut: 474,561 (+53,915, +11.4%)
    After applying 1OS skim cut: 21,300 (+1,634, +7.7%)
    After applying 2OS skim cut: 10,209 (+1,820, +17.8%)
    After applying  DD skim cut: 38,148 (+7,230, +19.0%)
After applying dxy < 7.0 && Y_M < 5280.0: 677,041
    After applying ISO skim cut: 472,518 (+51,872, +11.0%)
    After applying 1OS skim cut: 21,218 (+1,552, +7.3%)
    After applying 2OS skim cut: 10,175 (+1,786, +17.6%)
    After applying  DD skim cut: 38,032 (+7,114, +18.7%)
After applying abs(Dst_M-D0_M-145.454) < 2.0: 574,645
    After applying ISO skim cut: 417,075 (-3,571, -0.9%)
    After applying 1OS skim cut: 19,310 (-356, -1.8%)
    After applying 2OS skim cut: 8,006 (-383, -4.8%)
    After applying  DD skim cut: 29,836 (-1,082, -3.6%)
After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 573,853
    After applying ISO skim cut: 416,502 (-4,144, -1.0%)
    After applying 1OS skim cut: 19,284 (-382, -2.0%)
    After applying 2OS skim cut: 7,993 (-396, -5.0%)
    After applying  DD skim cut: 29,808 (-1,110, -3.7%)
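
For reading the numbers in parentheses: a small sketch of the comparison, assuming the percentage is the difference divided by our yield (not by the reference yield). That assumption reproduces the values printed above; `REFERENCE` and `compare` are illustrative names, not the script's own:

```python
# Reference template entries quoted in this comment.
REFERENCE = {"ISO": 420_646, "1OS": 19_666, "2OS": 8_389, "DD": 30_918}


def compare(ours, ref):
    """Format our yield with its absolute and relative difference to the reference."""
    diff = ours - ref
    pct = 100.0 * diff / ours  # assumption: normalized to our yield
    return f"{ours:,} ({diff:+,}, {pct:+.1f}%)"


# ISO skim after the first block of global cuts, then after the D* mass window.
print(compare(548_018, REFERENCE["ISO"]))
print(compare(417_075, REFERENCE["ISO"]))
```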

yipengsun commented 2 years ago

If I disable single candidate selections and apply the missing cut, indeed I fully recover the numbers reported by Phoebe's templates:

After applying !(reweighting_69_gen3_pt2 < 0.01 || reweighting_89_gen3_pt2 < 0.01): 573,853
    After applying ISO skim cut: 420,646 (+0, +0.0%)
    After applying 1OS skim cut: 19,666 (+0, +0.0%)
    After applying 2OS skim cut: 8,389 (+0, +0.0%)
    After applying  DD skim cut: 30,918 (+0, +0.0%)

I think this shows that our skim cuts are definitely consistent w/ Phoebe's.

manuelfs commented 2 years ago

That's great. So our current working hypothesis is that the PID is different between Run 1 and Run 2. Perhaps the factor of 2.6 is due to the cut on uBDT, and the 1.6 on the DD sample is due to applying the iso_nnk on top of that (the 1OS skim not being affected because it is a veto and makes less of a difference).

We can check this hypothesis with a cutflow leaving the PID cuts for last.

yipengsun commented 2 years ago

I made a cutflow that applies the PID cuts last, which can be generated with workflows/rdx_cutflows.py rdx-cutflow-data-pid-last:

Cut	Run 1	Run 2	Run 1 $\epsilon$ (%)	Run 2 $\epsilon$ (%)	$\epsilon$ ratio
Total events	216987	5349722	-	-	-
Offline $D^0$ cuts	102287	1056832	47.1	19.8	0.42
Offline $\mu$ cuts	96703	874676	94.5	82.8	0.88
Offline $D^* \mu$ combo cuts	77498	658823	80.1	75.3	0.94
$K \pi$ PID	75002	630664	96.8	95.7	0.99
$\mu$ PID	74245	507542	99.0	80.5	0.81
$BDT_{iso} < 0.15$	48004	323746	64.7	63.8	0.99
Total eff.	-	-	22.1	6.1	0.27
Yield ratio x 0.35	48004	323746	-	-	2.39

A couple of observations:

- The final number (2.39) is consistent (we are missing the UBDT cut, and the cutflow treats the one-candidate selection differently, so it is more forgiving)
- The main efficiency loss actually comes from the offline D0 cuts (w/o PID)
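
For reference, a sketch of how the efficiency columns follow from the yields, assuming each efficiency is the yield after a cut divided by the yield before it, in percent. The yields are copied from the table above; `CUTFLOW` and `step_efficiencies` are illustrative names, not the cutflow script's:

```python
CUTFLOW = [
    # (cut, Run 1 yield, Run 2 yield)
    ("Total events", 216987, 5349722),
    ("Offline D0 cuts", 102287, 1056832),
    ("Offline mu cuts", 96703, 874676),
    ("Offline D* mu combo cuts", 77498, 658823),
    ("K pi PID", 75002, 630664),
    ("mu PID", 74245, 507542),
    ("BDT_iso < 0.15", 48004, 323746),
]


def step_efficiencies(rows):
    """Per-cut efficiency in percent, plus the Run 2 / Run 1 efficiency ratio."""
    out = []
    for (_, r1_prev, r2_prev), (name, r1, r2) in zip(rows, rows[1:]):
        eff1 = 100 * r1 / r1_prev
        eff2 = 100 * r2 / r2_prev
        out.append((name, eff1, eff2, eff2 / eff1))
    return out


for name, eff1, eff2, ratio in step_efficiencies(CUTFLOW):
    print(f"{name:26s} {eff1:5.1f} {eff2:5.1f} {ratio:5.2f}")

# Total efficiency: final yield over the initial number of events.
total1 = 100 * CUTFLOW[-1][1] / CUTFLOW[0][1]
total2 = 100 * CUTFLOW[-1][2] / CUTFLOW[0][2]
raw_yield_ratio = CUTFLOW[-1][2] / CUTFLOW[-1][1]
print(f"total eff: {total1:.1f} {total2:.1f} {total2 / total1:.2f}")
print(f"raw Run 2 / Run 1 yield ratio: {raw_yield_ratio:.2f}")
```

Multiplying the raw yield ratio by exactly 0.35 gives about 2.36 rather than the tabulated 2.39, so the table presumably used a less rounded scale factor. That is an inference from the numbers, not something stated in this thread.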

manuelfs commented 2 years ago

What ntuples were used for this cutflow? What is the expected ratio of yields?

yipengsun commented 2 years ago

Ntuples used:

https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/1e6e4ceea20f2a2c23aab217583e14cfa9560e3d/workflows/rdx_cutflows.py#L109-L117

The expected ratio should be ~2.6 (the number we got from the run1-2 fit template comparisons)

yipengsun commented 2 years ago

I forgot to apply trigger cuts to the cutflow above. Here's the fixed cutflow (still w/ the same inputs):

Cut	Run 1	Run 2	Run 1 $\epsilon$ (%)	Run 2 $\epsilon$ (%)	$\epsilon$ ratio
Total events	216987	5349722	-	-	-
Trigger	203010	3104680	93.6	58.0	0.62
Offline $D^0$ cuts	99358	688032	48.9	22.2	0.45
Offline $\mu$ cuts	93899	572628	94.5	83.2	0.88
Offline $D^* \mu$ combo cuts	75281	429461	80.2	75.0	0.94
$K \pi$ PID	73571	414057	97.7	96.4	0.99
$\mu$ PID	72832	308792	99.0	74.6	0.75
$BDT_{iso} < 0.15$	47060	190133	64.6	61.6	0.95
Total eff.	-	-	21.7	3.6	0.16
Yield ratio x 0.35	47060	190133	-	-	1.43

Now, this number is consistent w/ the 1.4x increase (DON'T confuse this w/ the 2.6, like I did). So what's going on here? Here are a few theories:

The 2011 MD ntuple is not generated correctly

I doubt it, as I compared the number of events between ours and Phoebe's:

> uidcommon -n 0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -N ref-rdx-run1/Dst-std/Dst--20_09_16--std--data--2011--md--phoebe.root -t TupleB0/DecayTree -T YCands/DecayTree
Total common IDs: 194325

> uiddump -n 0.9.5-bugfix/Dst_D0-std/Dst_D0--21_10_07--std--LHCb_Collision11_Beam3500GeV-VeloClosed-MagDown_Real_Data_Reco14_Stripping21r1_90000000_SEMILEPTONIC.DST.root -t TupleB0/DecayTree
Num of events: 229552, Num of IDs: 216987, Num of UIDs: 205508
Num of duplicated IDs: 11479, Num of duplicated events: 12565, duplicate rate: 5.47%

> uiddump -n ref-rdx-run1/Dst-std/Dst--20_09_16--std--data--2011--md--phoebe.root -t YCands/DecayTree
Num of events: 217936, Num of IDs: 208846, Num of UIDs: 200406
Num of duplicated IDs: 8440, Num of duplicated events: 9090, duplicate rate: 4.17%
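
A sketch of how these counters relate to each other, assuming "events" means tree entries (candidates), "IDs" means distinct event IDs, and "UIDs" means IDs that occur exactly once. This reading is inferred from the arithmetic of the printed numbers, not from the uiddump source; `uid_summary` is a hypothetical helper:

```python
from collections import Counter


def uid_summary(event_ids):
    """Summarize duplication in a list of per-candidate event IDs."""
    counts = Counter(event_ids)
    n_events = len(event_ids)  # every candidate (tree entry)
    n_ids = len(counts)  # distinct event IDs
    n_uids = sum(1 for c in counts.values() if c == 1)  # IDs seen exactly once
    return {
        "events": n_events,
        "ids": n_ids,
        "uids": n_uids,
        "dup_ids": n_ids - n_uids,
        "dup_events": n_events - n_ids,
        "dup_rate": 100.0 * (n_events - n_ids) / n_events,
    }


# Toy input: (run, event) pairs, one per candidate.
toy = [(1, 10), (1, 10), (1, 11), (2, 5), (2, 5), (2, 5)]
print(uid_summary(toy))
```

With this reading, the first output above is self-consistent: 229552 - 216987 = 12565 duplicated events, 216987 - 205508 = 11479 duplicated IDs, and 12565 / 229552 = 5.47%.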

The one candidate only implementation is very different between cutflow script and babymaker postprocessing
- We check it the following way:
- For run 2 oldcut, we first TURN OFF one-cand only. In the output step-2 ntuple, applying is_iso cut:
  - In D* tree: 310869 candidates, 0.47% duplication rate
  - In D0 tree: 1211847 candidates, 0.06% duplication rate
- Now we TURN ON one-cand selection
  - In D* tree: 310352 candidates, 0% dupl
  - In D0 tree: 1210668 candidates, 0% dupl

Note that 310352 doesn't agree w/ 190133 at all! So the second theory seems right. Now I need to think about what's going on here. The number is also very different from the previously reported 170118 from the real data.

yipengsun commented 2 years ago

I think I found the problem: previously the D0 end-vertex chi2/ndof ratio was passed with the numerator and denominator swapped in both the cutflow and the babymaker (probably it was copy-pasted), so that:

//d0_endvtx_chi2 / d0_endvtx_ndof // correct
d0_endvtx_ndof / d0_endvtx_chi2 // previously what was actually applied
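
To illustrate why the swap matters, a toy sketch assuming an upper cut on the ratio. The threshold of 4.0 and the function name are hypothetical and are not the values used in the babymaker YAML:

```python
def passes_vertex_cut(chi2, ndof, max_chi2_per_ndof=4.0, inverted=False):
    """Apply an upper cut on chi2/ndof; inverted=True mimics the swapped ratio."""
    ratio = ndof / chi2 if inverted else chi2 / ndof
    return ratio < max_chi2_per_ndof


# A well-fitted vertex (small chi2) passes the correct cut but fails the swapped one.
print(passes_vertex_cut(0.2, 1), passes_vertex_cut(0.2, 1, inverted=True))
# A poorly fitted vertex (large chi2) does the opposite.
print(passes_vertex_cut(10.0, 1), passes_vertex_cut(10.0, 1, inverted=True))
```

Under this assumption the swapped ratio rejects exactly the well-fitted vertices with small chi2/ndof, which changes the selected sample rather than just tightening the cut.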

Old cutflow table:

Cut	Run 1	Run 2	Run 1 $\epsilon$ (%)	Run 2 $\epsilon$ (%)	$\epsilon$ ratio
Total events	216987	5349722	-	-	-
Trigger	203010	3104680	93.6	58.0	0.62
Offline $D^0$ cuts	99358	688032	48.9	22.2	0.45
Offline $\mu$ cuts	93899	572628	94.5	83.2	0.88
Offline $D^* \mu$ combo cuts	75281	429461	80.2	75.0	0.94
$K \pi$ PID	73571	414057	97.7	96.4	0.99
$\mu$ PID	72832	308792	99.0	74.6	0.75
$BDT_{iso} < 0.15$	47060	190133	64.6	61.6	0.95
Total eff.	-	-	21.7	3.6	0.16
Yield ratio x 0.35	47060	190133	-	-	1.43

So we kept far fewer events. I fixed that first in the babymaker YAML, without paying too much attention, and that's why I observed the differences above.

I also fixed that in the cutflow script, and made a minor fix to the muon selection. Now the cutflow number and the babymaker number (w/o single-candidate selection, in cutflow mode) fully agree:

Cutflow table:

Cut	Run 1	Run 2	Run 1 $\epsilon$ (%)	Run 2 $\epsilon$ (%)	$\epsilon$ ratio
Total events	216987	5349722	-	-	-
Trigger	203010	3104680	93.6	58.0	0.62
Offline $D^0$ cuts	157096	1137211	77.4	36.6	0.47
Offline $\mu$ cuts	148253	945709	94.4	83.2	0.88
Offline $D^* \mu$ combo cuts	119157	716047	80.4	75.7	0.94
$K \pi$ PID	116497	690521	97.8	96.4	0.99
$\mu$ PID	115321	516350	99.0	74.8	0.76
$BDT_{iso} < 0.15$	74535	318207	64.6	61.6	0.95
Total eff.	-	-	34.3	5.9	0.17
Yield ratio x 0.35	74535	318207	-	-	1.51

babymaker:

> make rdx-ntuple-run2-data-oldcut-debug

> uiddump -n gen/rdx-ntuple-run2-data-oldcut-debug/ntuple/Dst--21_11_04--cutflow_data--data--2016--md.root -t tree -c 'l0 & hlt1 & hlt2 & d0_ok & mu_ok & dstmu_ok & d0_pid_ok & mu_pid_ok & iso_bdt1 < 0.15' 
Num of events: 319750, Num of IDs: 318207, Num of UIDs: 316701
Num of duplicated IDs: 1506, Num of duplicated events: 1543, duplicate rate: 0.48%

Note the number 318207.

yipengsun commented 2 years ago

I applied our step-2 offline cuts and skim cuts to Phoebe's 2011 MD ntuple, and compared the output to Phoebe's step-1.5 ntuples w/ skim and year/polarity cuts applied, and they mostly agree (up to +/- 1 candidate per skim). The general workflow and conclusion are documented at: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_validation.md.

Consider D* cut validation done for normal templates.

yipengsun commented 2 years ago

I searched the following keywords in Phoebe's 20210105 version of the ANA:

wrong sign
wrong-sign
same sign
same-sign

I don't see any mention of an additional cut other than requiring the Mu/Pi to have the opposite sign.

yipengsun commented 2 years ago

I believe that in Phoebe's step-1.5 ntuples, the wrong-sign samples can be distinguished from the normal sample w/ DstIDprod and IDprod: IDprod < 0 && DstIDprod > 0 means wrong-sign Mu; IDprod > 0 && DstIDprod < 0 means wrong-sign Pi.

yipengsun commented 2 years ago

There are some plots w/o the D0/D* mass window cut in the ANA note to show that the cuts select mostly real D0/D*. I tried to reproduce these plots:

[Figures: ref_d0, D0_KPi_mass_no_mass_window_cut]

[Figures: ref_dst, Dst_KPi_mass_no_mass_window_cut, Dst_KPiPislow_mass_no_mass_window_cut]

And noticed that the D* plots have an additional cut. It is a DaVinci-level D* mass window cut: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/4307634453eab6031afaf7b6e779d2ea5ba260e5/run1-rdx/reco_Dst_D0.py#L477

Anyway, this doesn't affect our final result in any way, so I consider this checked.

yipengsun commented 2 years ago

Note the definitions of wrong-sign-related variables in Phoebe's AddB.C:

    IDprod = (double)muplus_ID*D0_ID;
    DstIDprod = (double)D0_ID*piminus_ID;

yipengsun commented 2 years ago

Consider validation of the right-sign sample done.

umd-lhcb / lhcb-ntuples-gen

Validate global and skim cuts for D0, D* regular data templates #88