Suspicious yields for 2OS and DD skims due to ProbNNk PID weights

yipengsun commented 2 years ago

So far, in our $D^0$ templates, the 2OS and DD skims have suspicious yields compared to Run 1.

Comparing skim weights

We apply the global cuts, including the $D^0$ mass window cut and the in-fit-range cut.

| ntuple | ISO | 1OS | 2OS | DD |
|---|---|---|---|---|
| 2016 data, MD, `D0` | 0.515 | 0.049 | 0.014 | 0.092 |
| 2016 MC, MD, `D0Mu` | 0.167 | 0.0035 | 0.00033 | 0.0047 |
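To put a number on how far off the 2OS and DD skims are, one can take the per-skim MC/data ratio of the fractions in the table and normalize it to ISO. The Python below is only arithmetic on the quoted numbers; the function names are illustrative, and whether the ratio should be flat across skims depends on the sample composition, which this sketch does not model.

```python
# Skim fractions copied from the table above (2016 MD, global cuts applied).
DATA = {"ISO": 0.515, "1OS": 0.049, "2OS": 0.014, "DD": 0.092}
MC = {"ISO": 0.167, "1OS": 0.0035, "2OS": 0.00033, "DD": 0.0047}


def mc_over_data(mc, data):
    """Per-skim MC/data ratio of the skim fractions."""
    return {skim: mc[skim] / data[skim] for skim in data}


def normalize_to(ratios, ref="ISO"):
    """Divide every ratio by the ratio of the reference skim."""
    return {skim: r / ratios[ref] for skim, r in ratios.items()}


ratios = mc_over_data(MC, DATA)
relative = normalize_to(ratios)
for skim in DATA:
    print(f"{skim:>3}: MC/data = {ratios[skim]:.3f}, relative to ISO = {relative[skim]:.2f}")
```

With these inputs the ratio is far from flat: it is lowest for 2OS, more than an order of magnitude below ISO.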

ISO ProbNNk weights

We require `iso_nnk1 < 0.2 && iso_nnk2 < 0.2`. In MC, these cuts are applied as weights, named `wpid_iso_nnkN_low_prob`. We compare the means and standard deviations between data and MC:

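| variable | mean in data | mean in MC (old weights) | stdev in data | stdev in MC (old weights) | mean in MC (updated weights) | stdev in MC (updated weights) |
|---|---|---|---|---|---|---|
| ISO ProbNNk1 | 0.906 | 0.139 | 0.291 | 0.186 | 0.929 | 0.155 |
| ISO ProbNNk2 | 0.938 | 0.162 | 0.242 | 0.213 | 0.931 | 0.160 |
| NNk1 * NNk2 | 0.860 | 0.030 | 0.346 | 0.076 | 0.867 | 0.209 |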

By data, I mean `(double)(iso_nnkN < 0.2)`, i.e. a 0/1 pass flag per event; by MC, I mean the weight `wpid_iso_nnkN_low_prob`.
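For data, the averaged quantity is a 0/1 flag (this also holds for the product of the two flags), so its standard deviation should be $\sqrt{p(1-p)}$ for mean $p$; the MC weights are continuous, so no such relation holds for them. Below is a minimal Python sketch of both summaries, with made-up input values and illustrative function names. It uses the population standard deviation, which is an assumption: the table does not say which estimator was used, though for large samples the difference is negligible.

```python
import math
from statistics import fmean, pstdev


def data_pass_flags(iso_nnk, cut=0.2):
    """Data quantity: 1.0 if iso_nnk < cut, else 0.0 (like (double)(iso_nnkN < 0.2))."""
    return [1.0 if x < cut else 0.0 for x in iso_nnk]


def mean_and_std(values):
    """Mean and population standard deviation of per-event values."""
    return fmean(values), pstdev(values)


def bernoulli_std(p):
    """Standard deviation of a 0/1 variable whose mean is p."""
    return math.sqrt(p * (1.0 - p))


# Hypothetical inputs: iso_nnk values for data, wpid weights for MC.
data_flags = data_pass_flags([0.05, 0.10, 0.60, 0.01])
mc_weights = [0.90, 0.95, 0.85, 0.92]
print("data:", mean_and_std(data_flags))
print("MC  :", mean_and_std(mc_weights))

# Cross-check the data columns of the table; the table values are rounded,
# so agreement is only expected within rounding.
for name, mean, std in [("ISO ProbNNk1", 0.906, 0.291),
                        ("ISO ProbNNk2", 0.938, 0.242),
                        ("NNk1 * NNk2", 0.860, 0.346)]:
    print(f"{name}: sqrt(p(1-p)) = {bernoulli_std(mean):.3f}, table = {std}")
```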

The study uses 2016 MD data and 2016 MD `D0Mu` MC. **No skim cuts are applied.**
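The `NNk1 * NNk2` row is not simply the product of the two single-track rows. The gap $E[w_1 w_2] - E[w_1]E[w_2]$ is the covariance of the two per-event quantities, where $w_1$, $w_2$ stand for the data pass flags or the MC weights (notation introduced here only for illustration). A quick Python check on the rounded table values:

```python
# (mean of w1, mean of w2, mean of w1*w2), copied from the table above.
ROWS = {
    "data": (0.906, 0.938, 0.860),
    "MC, old weights": (0.139, 0.162, 0.030),
    "MC, updated weights": (0.929, 0.931, 0.867),
}


def covariance_from_means(m1, m2, m12):
    """Cov(w1, w2) = E[w1 * w2] - E[w1] * E[w2]."""
    return m12 - m1 * m2


for label, (m1, m2, m12) in ROWS.items():
    cov = covariance_from_means(m1, m2, m12)
    print(f"{label}: E[w1]E[w2] = {m1 * m2:.3f}, E[w1 w2] = {m12:.3f}, cov = {cov:+.4f}")
```

Because the inputs are rounded to three decimals, differences of order 0.001 are not meaningful; the clearly positive value for data suggests the two isolation-track requirements are mildly correlated.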

yipengsun commented 2 years ago

My current way of assigning weights to emulate iso_nnk cuts is complete nonsense.

We have tracked down the proper way to do it (Phoebe's) and will implement it based on that.

yipengsun commented 2 years ago

To be more specific: my current way of applying weights was to apply a single efficiency, taken from the PIDCalib true-$K$ samples for the `iso_nnk < 0.2` cut, to every track. It is nonsense because it ignores the true ID of the track! Blindly applying a weight obtained from PIDCalib true $K$ to tracks that need not be kaons certainly doesn't make sense.

What Phoebe does instead is apply the PIDCalib weight that corresponds to the true ID of each particle: https://github.com/umd-lhcb/RDRDstRun1AnalysisPreservation/blob/7fc28713eb8ff359bd6a4a6f0f7d188118f78afd/proc/AddB.C#L5199-L5255
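The difference between the two weighting schemes can be sketched in Python. Everything specific here is hypothetical: the efficiency numbers, the momentum-only binning, the PDG-ID keys, the fallback weight for unmatched tracks, and taking the event weight as the product over the two isolation tracks. The actual procedure is the one in the linked `AddB.C`, and the real PIDCalib binning differs; this only illustrates why ignoring the true ID biases the weight.

```python
from bisect import bisect_right

# Hypothetical momentum bin edges in GeV (placeholder, not the PIDCalib binning).
P_BIN_EDGES = [0.0, 10.0, 30.0, 100.0, float("inf")]

# Hypothetical efficiencies of the cut ProbNNk < 0.2, keyed by |true PDG ID|.
# Placeholder numbers for illustration only, not PIDCalib measurements.
EFF_NNK_LOW = {
    211: [0.97, 0.95, 0.93, 0.90],   # true pion
    321: [0.10, 0.12, 0.15, 0.20],   # true kaon
    2212: [0.90, 0.88, 0.85, 0.80],  # true proton
    13: [0.98, 0.97, 0.96, 0.95],    # true muon
    11: [0.98, 0.97, 0.96, 0.95],    # true electron
}
FALLBACK_EFF = 1.0  # hypothetical choice for ghosts / unmatched tracks


def lookup(table, p):
    """Efficiency for momentum p from a table binned in P_BIN_EDGES."""
    i = bisect_right(P_BIN_EDGES, p) - 1
    i = min(max(i, 0), len(table) - 1)
    return table[i]


def naive_track_weight(true_id, p):
    """Old scheme: always use the true-K efficiency, ignoring true_id."""
    return lookup(EFF_NNK_LOW[321], p)


def true_id_track_weight(true_id, p):
    """True-ID scheme: use the efficiency table of the track's true species."""
    table = EFF_NNK_LOW.get(abs(true_id))
    if table is None:
        return FALLBACK_EFF
    return lookup(table, p)


def event_weight(tracks, track_weight):
    """Product of per-track weights over (true_id, p) pairs."""
    w = 1.0
    for true_id, p in tracks:
        w *= track_weight(true_id, p)
    return w


# Hypothetical event: both isolation tracks are true pions.
tracks = [(211, 5.0), (-211, 40.0)]
print("naive  :", event_weight(tracks, naive_track_weight))
print("true ID:", event_weight(tracks, true_id_track_weight))
```

For this example event, the naive scheme multiplies two true-$K$ efficiencies and gives a tiny weight, while the true-ID scheme gives a weight close to 1.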

yipengsun commented 2 years ago

Attached is the latest comparison, with updated `iso_nnk` weights. I'd say it looks better now, although the efficiency gains across skims are still not constant, which needs more thought.

22_06_26-2016_D0_sig_norm.pdf

yipengsun commented 2 years ago

All skim cuts have been validated on Phoebe's Run 1 ntuples; the only difference between MC and data is that in MC the PID cuts are applied as weights. The MC PID weights now agree reasonably with data (2016 MD data vs. 2016 MD normalization MC). We consider this naively validated.

umd-lhcb / lhcb-ntuples-gen #118