Differences between data and MC cut

yipengsun commented 3 years ago

Here I'll document the differences between data and MC cuts and my implementations. Some of the cuts require #83 to be finished first.

The general idea is: we DON'T require truth-matching, except for the Mu (because we have dedicated Mu mis-ID samples to study the effects of mis-ID).

| cut name | data | MC |
|---|---|---|
| `K` PID | `PIDK > 4 & !isMuon` | apply PID weights |
| `Pi` PID | `PIDK < 2 & !isMuon` | apply PID weights |
| `Mu` PID | `isMuon & PIDmu > 2 & PIDe < 1 & BDTmu > 0.25` | has a true `Mu`, and apply official PID weights and uBDT weights (the true `Mu` requirement is already enforced at truth-matching) |
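
A minimal Python sketch of how the data-side PID cuts in the table could be evaluated per candidate, and how the same flag is simply `True` on MC (where PID weights are applied instead). The branch names come from the table; the `Track` container, the `is_data` switch, and the helper functions are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical container for the per-track PID branches used in the table."""
    PIDK: float
    PIDmu: float
    PIDe: float
    BDTmu: float
    isMuon: bool


def k_pid_ok(k: Track) -> bool:
    # Data-only kaon PID cut: PIDK > 4 & !isMuon
    return k.PIDK > 4 and not k.isMuon


def pi_pid_ok(pi: Track) -> bool:
    # Data-only pion PID cut: PIDK < 2 & !isMuon
    return pi.PIDK < 2 and not pi.isMuon


def mu_pid_ok(mu: Track) -> bool:
    # Data-only muon PID cut, including the UBDT requirement (BDTmu > 0.25)
    return mu.isMuon and mu.PIDmu > 2 and mu.PIDe < 1 and mu.BDTmu > 0.25


def d0_pid_ok(k: Track, pi: Track, is_data: bool) -> bool:
    # On MC no PID cut is applied (the flag is just True);
    # PID weights are applied downstream instead.
    if not is_data:
        return True
    return k_pid_ok(k) and pi_pid_ok(pi)
```

Keeping the data cuts as plain boolean functions makes it easy to swap them for `True` on MC without touching the rest of the selection.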

yipengsun commented 3 years ago

I separated the UBDT cuts from the regular PID cuts, because candidates that fail these cuts are still used in some control samples.
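
A minimal Python sketch of keeping the UBDT cut as its own flag, so that candidates failing only the UBDT are retained instead of being dropped by the muon PID cut. The `Muon` container, the helper names, and the split into a "passing" and a "failing" list are hypothetical; which candidates actually enter a given control sample is an analysis choice not spelled out here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Muon:
    """Hypothetical container for the muon PID branches."""
    isMuon: bool
    PIDmu: float
    PIDe: float
    BDTmu: float


def mu_pid_ok(mu: Muon) -> bool:
    # Regular muon PID cut, with the UBDT requirement taken out.
    return mu.isMuon and mu.PIDmu > 2 and mu.PIDe < 1


def mu_ubdt_ok(mu: Muon) -> bool:
    # UBDT cut, kept as a separate flag instead of folded into mu_pid_ok.
    return mu.BDTmu > 0.25


def split_by_ubdt(muons: List[Muon]) -> Tuple[List[Muon], List[Muon]]:
    """Illustrative split: candidates passing the regular PID cut are routed
    by the UBDT flag rather than discarded when they fail it."""
    passing, failing = [], []
    for mu in muons:
        if not mu_pid_ok(mu):
            continue
        (passing if mu_ubdt_ok(mu) else failing).append(mu)
    return passing, failing
```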

yipengsun commented 3 years ago

For Run 2, 95% of the candidates that pass the regular selection cuts also pass the UBDT cut. This is true for both the D0 and D* trees.

yipengsun commented 3 years ago

Phoebe has:

- The DD skim cut regarding `iso_NNkw`: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1584
- Where it comes into play for 1OS/D**: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1642
- For 2OS: https://gitlab.cern.ch/bhamilto/rdvsrdst-histfactory/-/blob/master/proc/redoHistos_Dst.C#L1618

I don't see truth-matching there.

yipengsun commented 3 years ago

We confirmed with Phoebe that for K and Pi, we don't require the particle to be truth-matched. This is because we don't have a separate mis-ID sample for them.

yipengsun commented 2 years ago

Here I document the main differences between the data and MC cuts, as shown in our Run 2 post-processing YAML:

- MC has NO PID cut (note how `d0_pid_ok` and `mu_pid_ok` are both set to true)
  - `mu_ubdt_ok` is unused for MC; we'll apply the UBDT as a weight instead (for now `wpid_ubdt` is set to 1.0)
- We require MC events to pass Phoebe/Alex's truth-matching
- We require MC to have a valid HAMMER weight (except for the DDX and D**s samples)
  - Note that before feeding input to HAMMER, I apply a naive truth-matching, which should be looser than Phoebe/Alex's
  - We truth-match the muon ONLY
  - Then we require that HAMMER processed the event successfully
- The L0 triggers are applied as weights, so the `l0` boolean is set to true for MC
  - The emulated L0 triggers are applied as a weight, `wtrg`. Note that we require either D0 L0Hadron TOS or B L0Global TIS, so the L0 weight is defined as `d0_l0_hadron_tos_emu + wtrg_l0_tis - d0_l0_hadron_tos_emu * wtrg_l0_tis`
- The Hlt1 cuts are applied normally, since they are emulated as booleans
- There's no Hlt2 cut since it's already applied in DaVinci, so `hlt2` is set to true for MC as well
- The skim cuts are applied as weights, and the scheme is consistent with what we discussed here (private)
- This commit has a global weight cut (because in one of the samples there's a single candidate with a weight of ~120, while the rest are below 10). Still, I think it's a bad idea to cut on weight this early, so I'll remove it before committing the step-2 MC ntuples
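
The MC-side selection described in the list can be sketched in Python roughly as follows: the cuts that remain booleans on MC are AND-ed together, while L0 and the other corrections enter as weights, with the L0 weight built from the TOS-or-TIS expression. This is a hedged sketch, not the actual post-processing code: the `McCandidate` container, the `wpid` and `wskim` fields, the `require_hammer` switch, and the multiplicative combination of weights are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class McCandidate:
    """Hypothetical per-candidate MC flags and weights.

    d0_l0_hadron_tos_emu, wtrg_l0_tis and wpid_ubdt follow the names in the
    YAML description; the other fields are placeholders.
    """
    truth_match_ok: bool         # Phoebe/Alex's truth-matching
    hammer_ok: bool              # HAMMER processed the event successfully
    hlt1_ok: bool                # emulated Hlt1 decision
    d0_l0_hadron_tos_emu: float  # emulated D0 L0Hadron TOS, in [0, 1]
    wtrg_l0_tis: float           # B L0Global TIS weight, in [0, 1]
    wpid: float = 1.0            # placeholder PID weight
    wpid_ubdt: float = 1.0       # UBDT weight (currently fixed to 1.0)
    wskim: float = 1.0           # placeholder skim weight


def l0_weight(tos: float, tis: float) -> float:
    # Weight for "TOS or TIS" via inclusion-exclusion:
    # P(A) + P(B) - P(A) * P(B), exact when A and B are independent.
    return tos + tis - tos * tis


def mc_selected(c: McCandidate, require_hammer: bool = True) -> bool:
    # Cuts that stay boolean on MC. The PID, L0 and Hlt2 flags are all True
    # on MC, so they don't appear here. Pass require_hammer=False for
    # samples without a HAMMER weight (DDX, D**s).
    hammer = c.hammer_ok if require_hammer else True
    return c.truth_match_ok and hammer and c.hlt1_ok


def mc_weight(c: McCandidate) -> float:
    # Assumes the individual weights combine multiplicatively.
    return c.wpid * c.wpid_ubdt * c.wskim * l0_weight(
        c.d0_l0_hadron_tos_emu, c.wtrg_l0_tis
    )
```

When `d0_l0_hadron_tos_emu` is a 0/1 emulated decision, `l0_weight` reduces to 1 for TOS candidates and to `wtrg_l0_tis` otherwise.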

FYI @manuelfs @Svende @afernez

manuelfs commented 2 years ago

This is very helpful, thank you very much Yipeng 🙏

umd-lhcb / lhcb-ntuples-gen

Differences between data and MC cut #86