Port Greg/RD+'s J/psiK reweighting code for RDX run 2

yipengsun commented 2 years ago

Greg has shared his J/psi K reweighting code at:

/afs/cern.ch/user/g/gciezare/public/forRDRun2/JpsiK

We need to port this to RDX run 2.

Current status

[x] Produce J/psi K data sample ntuples locally
[x] Produce J/psi K MC sample ntuples locally
[x] Update Greg's selection
[x] Run Greg's fit
[x] Submit J/psi K data production to the GRID
[x] Submit J/psi K MC (12143001) production to the GRID
[x] Add standard weights to MC
[x] Compute efficiency ratios in the following sequence:
- b_OWNPV_NDOF and b_nTracks
- b_P and b_ETA
[x] Integrate this weight to our workflow

Possible improvements

Update Greg's selection
- Currently the selections are mostly aligned with the Run 1 analysis. We may need to update them once we finish our selection optimization
Apply a DiMuon weight. See here for more details
Apply an efficiency correction based on the DiMuon / All Trigger efficiency, binned in P and PT of the B meson
Use an Ipatia function to describe signal
Apply an nSPDhits < 450 cut to data and a weight to MC. The weight can be derived such that nSPDhits = nSPDhits(nTracks), by looking at the L0DiMuon trigger line
- RD+ did this
- Not sure if we need it, because our L0HadronTOS effectively does this

Validations

Stripping: See line here. There's actually a PIDK > 0 requirement on the K. This is not ideal, but we have a tighter PIDK > 4 in the offline cut, and hopefully the generated weights will render this baked-in cut ineffective
Fit quality: See this comment
sWeight test: See this post
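For the sWeight test, the per-event sWeights of the sPlot method can be computed directly from the fitted yields and the per-event PDF values of each species. A minimal NumPy sketch, assuming the shapes are fixed and normalized over the fit range; `fit_yields` is a hypothetical EM-style stand-in for the actual likelihood fit, included only so the example runs on its own, and the toy mass model (Gaussian signal on a flat background) is purely illustrative. At the fitted yields, the signal sWeights should sum to the signal yield and each event's sWeights should sum to 1 across species, which makes a quick sanity check.

```python
import numpy as np


def fit_yields(pdf_vals, n_iter=2000):
    """Extended-ML yields for fixed, normalized shapes via EM iterations.

    pdf_vals: (n_events, n_species) array of f_j(m_e). Only the yields float.
    """
    n_events, n_species = pdf_vals.shape
    yields = np.full(n_species, n_events / n_species)
    for _ in range(n_iter):
        denom = pdf_vals @ yields  # sum_k N_k f_k(m_e)
        yields = yields * (pdf_vals / denom[:, None]).sum(axis=0)
    return yields


def sweights(pdf_vals, yields):
    """sPlot weights: w_n(e) = sum_j V_nj f_j(m_e) / sum_k N_k f_k(m_e)."""
    ratio = pdf_vals / (pdf_vals @ yields)[:, None]  # f_j / D per event
    v_inv = ratio.T @ ratio  # (V^-1)_nj = sum_e f_n f_j / D^2
    cov = np.linalg.inv(v_inv)
    return ratio @ cov  # (n_events, n_species); cov is symmetric


def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)


# Toy sample: 1000 signal events on top of 4000 flat background events
rng = np.random.default_rng(42)
lo, hi = 5200.0, 5400.0
mass = np.concatenate([rng.normal(5280.0, 15.0, 1000), rng.uniform(lo, hi, 4000)])
mass = mass[(mass > lo) & (mass < hi)]
pdf_vals = np.column_stack([gauss(mass, 5280.0, 15.0),
                            np.full(mass.size, 1.0 / (hi - lo))])
yields = fit_yields(pdf_vals)
sw = sweights(pdf_vals, yields)  # column 0: signal sWeights
```

Both checks only hold at the fitted yields: with yields taken from elsewhere, the per-event sums drift away from 1.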

References

Slides on sWeights: https://indico.cern.ch/event/940874/contributions/3953530/attachments/2127199/3581560/sweights_ms.pdf
sPlot paper: https://arxiv.org/abs/physics/0402083

yipengsun commented 2 years ago

Latest P-ETA weights are tabulated below (again, DISREGARD the uncertainties). I do see major improvements: previously, column 12 contained some nonsensical weights (some of them ~600). Now they look more reasonable.

η \ p	1 (0.0,1250.0)	2 (2500.0)	3 (3750.0)	4 (5000.0)	5 (6250.0)	6 (7500.0)	7 (8750.0)	8 (10000.0)	9 (11250.0)	10 (12500.0)	11 (13750.0)	12 (15000.0)	13 (16250.0)	14 (17500.0)	15 (18750.0)	16 (20000.0)	17 (21250.0)	18 (22500.0)	19 (23750.0)	20 (25000.0)
1 (2.0,2.3)	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.60 ± 0.78	0.76 ± 0.87	0.78 ± 0.88	0.70 ± 0.84	0.63 ± 0.79	0.73 ± 0.85	0.71 ± 0.84	0.70 ± 0.84	0.65 ± 0.80	0.72 ± 0.85	0.68 ± 0.82	0.64 ± 0.80	0.71 ± 0.85	0.74 ± 0.86	0.69 ± 0.83	0.69 ± 0.83	0.74 ± 0.86
2 (2.7)	0.00 ± 0.00	0.00 ± 0.00	0.94 ± 0.97	1.00 ± 1.00	0.94 ± 0.97	0.85 ± 0.92	0.81 ± 0.90	0.76 ± 0.87	0.74 ± 0.86	0.74 ± 0.86	0.73 ± 0.85	0.74 ± 0.86	0.69 ± 0.83	0.67 ± 0.82	0.68 ± 0.82	0.67 ± 0.82	0.65 ± 0.81	0.66 ± 0.81	0.71 ± 0.84	0.71 ± 0.84
3 (3.0)	0.00 ± 0.00	1.06 ± 1.03	1.11 ± 1.06	1.09 ± 1.05	0.97 ± 0.98	0.88 ± 0.94	0.81 ± 0.90	0.78 ± 0.88	0.76 ± 0.87	0.77 ± 0.88	0.76 ± 0.87	0.72 ± 0.85	0.69 ± 0.83	0.75 ± 0.86	0.78 ± 0.89	0.77 ± 0.88	0.79 ± 0.89	0.84 ± 0.91	0.74 ± 0.86	0.64 ± 0.80
4 (3.3)	0.00 ± 0.00	1.15 ± 1.07	1.20 ± 1.09	1.14 ± 1.07	0.99 ± 0.99	0.89 ± 0.95	0.82 ± 0.91	0.79 ± 0.89	0.77 ± 0.88	0.73 ± 0.85	0.74 ± 0.86	0.75 ± 0.86	0.78 ± 0.88	0.76 ± 0.87	0.75 ± 0.87	0.66 ± 0.82	0.73 ± 0.86	0.74 ± 0.86	0.67 ± 0.82	0.61 ± 0.78
5 (3.7)	0.90 ± 0.95	1.25 ± 1.12	1.24 ± 1.11	1.13 ± 1.06	1.02 ± 1.01	0.88 ± 0.94	0.84 ± 0.92	0.78 ± 0.88	0.78 ± 0.89	0.74 ± 0.86	0.78 ± 0.88	0.70 ± 0.84	0.66 ± 0.81	0.69 ± 0.83	0.66 ± 0.81	0.65 ± 0.81	0.70 ± 0.84	0.62 ± 0.79	0.82 ± 0.90	0.80 ± 0.90
6 (4.0)	1.11 ± 1.05	1.28 ± 1.13	1.30 ± 1.14	1.19 ± 1.09	1.06 ± 1.03	0.94 ± 0.97	0.87 ± 0.93	0.83 ± 0.91	0.74 ± 0.86	0.72 ± 0.85	0.75 ± 0.87	0.71 ± 0.84	0.68 ± 0.82	0.73 ± 0.86	0.71 ± 0.84	0.80 ± 0.89	0.72 ± 0.85	0.85 ± 0.92	0.92 ± 0.96	1.07 ± 1.04
7 (4.3)	1.20 ± 1.09	1.32 ± 1.15	1.32 ± 1.15	1.29 ± 1.13	1.21 ± 1.10	1.10 ± 1.05	0.96 ± 0.98	0.99 ± 1.00	0.88 ± 0.94	0.85 ± 0.92	0.93 ± 0.96	0.87 ± 0.93	1.01 ± 1.00	0.64 ± 0.80	0.94 ± 0.97	0.79 ± 0.89	0.92 ± 0.96	0.69 ± 0.83	1.40 ± 1.18	2.60 ± 1.61
8 (4.7)	1.22 ± 1.11	1.40 ± 1.18	1.46 ± 1.21	1.50 ± 1.23	1.47 ± 1.21	1.47 ± 1.21	1.53 ± 1.24	1.50 ± 1.22	1.51 ± 1.23	1.24 ± 1.11	1.28 ± 1.13	1.24 ± 1.11	2.20 ± 1.48	1.82 ± 1.35	1.02 ± 1.01	0.83 ± 0.91	3.48 ± 1.87	0.60 ± 0.78	0.00 ± 0.00	0.00 ± 0.00
9 (5.0)	1.31 ± 1.14	1.48 ± 1.22	1.82 ± 1.35	1.92 ± 1.39	2.22 ± 1.49	2.04 ± 1.43	1.32 ± 1.15	3.08 ± 1.76	88.28 ± 9.40	26.42 ± 5.14	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00	0.00 ± 0.00
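On the uncertainties: the quoted ± values appear to be simply √w (e.g. 88.28 ± 9.40, and √88.28 ≈ 9.40), which is why they should be disregarded. If per-bin uncertainties are wanted later, one option (a sketch, not what produced the table) is to propagate the per-bin sum of squared weights of both histograms into the ratio of normalized histograms, assuming data and MC are independent and neglecting the uncertainty on the normalizations:

```python
import numpy as np


def ratio_with_error(sumw_data, sumw2_data, sumw_mc, sumw2_mc):
    """Ratio of normalized weighted histograms with a propagated per-bin error.

    sumw_*: per-bin sum of weights; sumw2_*: per-bin sum of squared weights.
    Assumes independent data and MC, and ignores the error on the totals.
    """
    norm_data = sumw_data / sumw_data.sum()
    norm_mc = sumw_mc / sumw_mc.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = norm_data / norm_mc
        rel_err = np.sqrt(sumw2_data / sumw_data**2 + sumw2_mc / sumw_mc**2)
    return ratio, np.abs(ratio) * rel_err  # abs: sWeighted bins can be negative


# Toy: unit weights, so sum(w^2) equals the counts
ratio, err = ratio_with_error(
    np.array([100.0, 100.0]), np.array([100.0, 100.0]),
    np.array([50.0, 150.0]), np.array([50.0, 150.0]),
)
```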

yipengsun commented 2 years ago

In the end, I enlarged the binning ranges of PT and PV NDOF (yes, PV NDOF was capped at 200, which left ~1.6-1.8% of events outside the binning range; I increased the cap to 250), and I will use the nearest bin for ALL variables (incl. nTracks) when applying the weights to RDX.
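Before choosing a cap, it helps to check what fraction of events falls outside the binning range, since fixed-range filling such as `np.histogram` silently drops them. A minimal sketch; `ndof` below is toy data, not the real PV NDOF distribution:

```python
import numpy as np


def out_of_range_fraction(values, lo, hi):
    """Fraction of entries that a histogram over [lo, hi] would not count."""
    values = np.asarray(values)
    return np.mean((values < lo) | (values > hi))


# Toy PV NDOF values: two are above 200, one is above 250
ndof = np.array([10, 150, 199, 200, 230, 260])
frac_200 = out_of_range_fraction(ndof, 0, 200)
frac_250 = out_of_range_fraction(ndof, 0, 250)
# np.histogram includes the upper edge in the last bin, so 200 is still counted
n_counted = np.histogram(ndof, bins=np.linspace(0, 200, 11))[0].sum()
```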

I'll update the doc shortly.

yipengsun commented 2 years ago

I'm checking the raw data / MC ratios (without any nan / inf substitution).

There are a couple of nan and inf entries, suggesting that MC doesn't cover some of the bins that data does, so the reweighting is not going to be perfect.

Previously I was replacing nan and inf with 0, but I think it actually makes more sense to replace them with 1.

This is a minor thing, but I think I should change that to 1, because 1 means "we don't know the ratio in this bin, but this bin is perhaps filled in data, so let's just keep MC as-is".

What do you think @manuelfs?

yipengsun commented 2 years ago

Actually, 0 / 0 -> nan, x / 0 -> inf, so the most consistent treatment would be:

nan -> 0
inf -> 1

manuelfs commented 2 years ago

Good argument, I fully agree.

yipengsun commented 2 years ago

After the two-stage reweighting (first NDOF-nTracks, then PT-ETA), the data-MC comparisons are:

The data-MC agreement is not perfect because:

For data, some bins have a negative sum of weights (due to the sWeights). For these bins, the data counts are manually set to 1
For MC, the sample doesn't fully cover the data range. That is, there are bins where the data counts are non-zero but the MC counts are 0
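Both effects can be inspected on the 2D count arrays before taking the ratio. A minimal sketch, assuming `h_data` holds per-bin sums of sWeights and `h_mc` the per-bin MC counts; the function names are illustrative:

```python
import numpy as np


def fix_negative_data_bins(h_data):
    """Set bins whose summed sWeights are negative to 1, as described above."""
    fixed = np.array(h_data, dtype=float)
    fixed[fixed < 0] = 1.0
    return fixed


def uncovered_bins(h_data, h_mc):
    """Indices of bins populated in data but empty in MC."""
    return np.argwhere((np.asarray(h_data) > 0) & (np.asarray(h_mc) == 0))


h_data = fix_negative_data_bins([[3.0, -0.5], [0.0, 2.0]])
h_mc = np.array([[1.0, 0.0], [1.0, 0.0]])
missing = uncovered_bins(h_data, h_mc)
```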

[Figure: data-MC comparison in b_ownpv_ndof and ntracks]