umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

Port Greg/RD+'s J/psiK reweighting code for RDX run 2 #97

Closed yipengsun closed 2 years ago

yipengsun commented 2 years ago

Greg has shared his J/psi K reweighting code at:

/afs/cern.ch/user/g/gciezare/public/forRDRun2/JpsiK

We need to port this to RDX run 2.

Current status

Possible improvements

Validations

References

yipengsun commented 2 years ago

Latest P-ETA weights are tabulated below (again, DISREGARD uncertainties). I did see major improvements. Previously, column 12 contained some nonsensical weights (some of the weights were ~600). Now they look more reasonable.

| η \ p | 1 (0.0,1250.0) | 2 (2500.0) | 3 (3750.0) | 4 (5000.0) | 5 (6250.0) | 6 (7500.0) | 7 (8750.0) | 8 (10000.0) | 9 (11250.0) | 10 (12500.0) | 11 (13750.0) | 12 (15000.0) | 13 (16250.0) | 14 (17500.0) | 15 (18750.0) | 16 (20000.0) | 17 (21250.0) | 18 (22500.0) | 19 (23750.0) | 20 (25000.0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 (2.0,2.3) | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.60 ± 0.78 | 0.76 ± 0.87 | 0.78 ± 0.88 | 0.70 ± 0.84 | 0.63 ± 0.79 | 0.73 ± 0.85 | 0.71 ± 0.84 | 0.70 ± 0.84 | 0.65 ± 0.80 | 0.72 ± 0.85 | 0.68 ± 0.82 | 0.64 ± 0.80 | 0.71 ± 0.85 | 0.74 ± 0.86 | 0.69 ± 0.83 | 0.69 ± 0.83 | 0.74 ± 0.86 |
| 2 (2.7) | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.94 ± 0.97 | 1.00 ± 1.00 | 0.94 ± 0.97 | 0.85 ± 0.92 | 0.81 ± 0.90 | 0.76 ± 0.87 | 0.74 ± 0.86 | 0.74 ± 0.86 | 0.73 ± 0.85 | 0.74 ± 0.86 | 0.69 ± 0.83 | 0.67 ± 0.82 | 0.68 ± 0.82 | 0.67 ± 0.82 | 0.65 ± 0.81 | 0.66 ± 0.81 | 0.71 ± 0.84 | 0.71 ± 0.84 |
| 3 (3.0) | 0.00 ± 0.00 | 1.06 ± 1.03 | 1.11 ± 1.06 | 1.09 ± 1.05 | 0.97 ± 0.98 | 0.88 ± 0.94 | 0.81 ± 0.90 | 0.78 ± 0.88 | 0.76 ± 0.87 | 0.77 ± 0.88 | 0.76 ± 0.87 | 0.72 ± 0.85 | 0.69 ± 0.83 | 0.75 ± 0.86 | 0.78 ± 0.89 | 0.77 ± 0.88 | 0.79 ± 0.89 | 0.84 ± 0.91 | 0.74 ± 0.86 | 0.64 ± 0.80 |
| 4 (3.3) | 0.00 ± 0.00 | 1.15 ± 1.07 | 1.20 ± 1.09 | 1.14 ± 1.07 | 0.99 ± 0.99 | 0.89 ± 0.95 | 0.82 ± 0.91 | 0.79 ± 0.89 | 0.77 ± 0.88 | 0.73 ± 0.85 | 0.74 ± 0.86 | 0.75 ± 0.86 | 0.78 ± 0.88 | 0.76 ± 0.87 | 0.75 ± 0.87 | 0.66 ± 0.82 | 0.73 ± 0.86 | 0.74 ± 0.86 | 0.67 ± 0.82 | 0.61 ± 0.78 |
| 5 (3.7) | 0.90 ± 0.95 | 1.25 ± 1.12 | 1.24 ± 1.11 | 1.13 ± 1.06 | 1.02 ± 1.01 | 0.88 ± 0.94 | 0.84 ± 0.92 | 0.78 ± 0.88 | 0.78 ± 0.89 | 0.74 ± 0.86 | 0.78 ± 0.88 | 0.70 ± 0.84 | 0.66 ± 0.81 | 0.69 ± 0.83 | 0.66 ± 0.81 | 0.65 ± 0.81 | 0.70 ± 0.84 | 0.62 ± 0.79 | 0.82 ± 0.90 | 0.80 ± 0.90 |
| 6 (4.0) | 1.11 ± 1.05 | 1.28 ± 1.13 | 1.30 ± 1.14 | 1.19 ± 1.09 | 1.06 ± 1.03 | 0.94 ± 0.97 | 0.87 ± 0.93 | 0.83 ± 0.91 | 0.74 ± 0.86 | 0.72 ± 0.85 | 0.75 ± 0.87 | 0.71 ± 0.84 | 0.68 ± 0.82 | 0.73 ± 0.86 | 0.71 ± 0.84 | 0.80 ± 0.89 | 0.72 ± 0.85 | 0.85 ± 0.92 | 0.92 ± 0.96 | 1.07 ± 1.04 |
| 7 (4.3) | 1.20 ± 1.09 | 1.32 ± 1.15 | 1.32 ± 1.15 | 1.29 ± 1.13 | 1.21 ± 1.10 | 1.10 ± 1.05 | 0.96 ± 0.98 | 0.99 ± 1.00 | 0.88 ± 0.94 | 0.85 ± 0.92 | 0.93 ± 0.96 | 0.87 ± 0.93 | 1.01 ± 1.00 | 0.64 ± 0.80 | 0.94 ± 0.97 | 0.79 ± 0.89 | 0.92 ± 0.96 | 0.69 ± 0.83 | 1.40 ± 1.18 | 2.60 ± 1.61 |
| 8 (4.7) | 1.22 ± 1.11 | 1.40 ± 1.18 | 1.46 ± 1.21 | 1.50 ± 1.23 | 1.47 ± 1.21 | 1.47 ± 1.21 | 1.53 ± 1.24 | 1.50 ± 1.22 | 1.51 ± 1.23 | 1.24 ± 1.11 | 1.28 ± 1.13 | 1.24 ± 1.11 | 2.20 ± 1.48 | 1.82 ± 1.35 | 1.02 ± 1.01 | 0.83 ± 0.91 | 3.48 ± 1.87 | 0.60 ± 0.78 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| 9 (5.0) | 1.31 ± 1.14 | 1.48 ± 1.22 | 1.82 ± 1.35 | 1.92 ± 1.39 | 2.22 ± 1.49 | 2.04 ± 1.43 | 1.32 ± 1.15 | 3.08 ± 1.76 | 88.28 ± 9.40 | 26.42 ± 5.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

yipengsun commented 2 years ago

In the end, I enlarged the binning ranges of PT and PV NDOF (yes, PV NDOF was capped at 200, which left ~1.6-1.8% of events outside the binning range; I increased the cap to 250), and will use the nearest bin for ALL variables (incl. nTracks) when applying weights to RDX.
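
A minimal sketch of what "nearest bin" lookup could look like, assuming a 2D weight table indexed by bin number. The bin edges, the weight values, and the function names `nearest_bin_index` and `lookup_weight` are all hypothetical and are not taken from the actual RDX code; the idea is only that an out-of-range value is clamped to the first or last bin instead of being dropped.

```python
from bisect import bisect_right


def nearest_bin_index(value, edges):
    """Index of the bin containing `value`, clamped to the first/last bin.

    `edges` is a sorted list of bin edges; bins are [low, high).
    """
    n_bins = len(edges) - 1
    idx = bisect_right(edges, value) - 1
    return min(max(idx, 0), n_bins - 1)


def lookup_weight(x, y, x_edges, y_edges, weights):
    """Return weights[i][j] for the (clamped) x-bin i and y-bin j."""
    i = nearest_bin_index(x, x_edges)
    j = nearest_bin_index(y, y_edges)
    return weights[i][j]


# Hypothetical binning: PV NDOF up to 250, nTracks up to 300.
ndof_edges = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0]
ntracks_edges = [0.0, 100.0, 200.0, 300.0]
# Placeholder weights, one row per NDOF bin, one column per nTracks bin.
weights = [[1.0 + 0.1 * i + 0.01 * j for j in range(3)] for i in range(5)]

# An event with NDOF = 260 and nTracks = 350 falls outside both ranges,
# so it picks up the weight of the last bin in each variable.
w = lookup_weight(260.0, 350.0, ndof_edges, ntracks_edges, weights)
```

With this convention no event ends up without a weight, at the cost of treating overflow events like those in the edge bins.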

I'll update the doc shortly.

yipengsun commented 2 years ago

I'm checking the raw data / MC ratios (without any nan / inf substitution).

There are a couple of nan and inf entries, suggesting that MC doesn't cover some of the bins whereas data does, so the reweighting is not going to be perfect.

Previously I was replacing nan and inf with 0, but I think it actually makes more sense to replace them with 1.

This is a minor thing, but I think I should change that to 1, because 1 means "we don't know the ratio in this bin, but for data this bin is perhaps filled, so let's just keep MC as-is".
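
As a rough illustration of the substitution described above, here is a sketch that computes per-bin data / MC ratios and falls back to 1 wherever the MC bin is empty. The function name `ratio_with_fallback` and the flat-list histogram layout are assumptions made for this example only. Plain Python raises `ZeroDivisionError` instead of producing nan or inf, so the empty-MC case is checked explicitly.

```python
def ratio_with_fallback(data_counts, mc_counts, fallback=1.0):
    """Per-bin data / MC ratio, using `fallback` where MC is empty.

    An empty MC bin would give nan (0 / 0) or inf (x / 0) in
    floating-point array arithmetic; here both cases map to `fallback`.
    """
    ratios = []
    for d, m in zip(data_counts, mc_counts):
        if m == 0:
            ratios.append(fallback)  # ratio unknown: keep MC as-is
        else:
            ratios.append(d / m)
    return ratios


# Bins: normal, both empty (0 / 0), MC empty (x / 0), data empty (0 / x).
ratios = ratio_with_fallback([10, 0, 5, 0], [20, 0, 0, 4])
```

Note that a bin where data is empty but MC is not still gets a genuine ratio of 0; only bins with no MC are treated as unknown.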

What do you think @manuelfs?

yipengsun commented 2 years ago

Actually, 0 / 0 -> nan, x / 0 -> inf, so the most consistent treatment would be:

manuelfs commented 2 years ago

Good argument, I fully agree.

yipengsun commented 2 years ago

After the 2-stage reweighting (first NDOF-nTracks, then PT-ETA), the data-MC comparisons:

The data-MC agreement is not perfect because:

b_ownpv_ndof ntracks

b_pt b_eta
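
The two-stage procedure mentioned above could be sketched roughly as follows. This is an illustration under several assumptions, not the actual workflow: events are plain dicts, the keys `ndof`, `ntracks`, `pt`, `eta` and all function names are hypothetical, both histograms are normalized to unit area before taking the ratio, and bins with no MC get a weight of 1. The point it shows is that the second-stage ratio is computed against MC that already carries the first-stage weights, and the final weight is the product of the two.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_index(value, edges):
    """Bin index for `value`, clamped to the first/last bin."""
    idx = bisect_right(edges, value) - 1
    return min(max(idx, 0), len(edges) - 2)


def fill_2d(events, x_key, y_key, x_edges, y_edges, weights=None):
    """Weighted 2D histogram stored as {(i, j): sum of weights}."""
    hist = defaultdict(float)
    for n, evt in enumerate(events):
        w = 1.0 if weights is None else weights[n]
        key = (bin_index(evt[x_key], x_edges), bin_index(evt[y_key], y_edges))
        hist[key] += w
    return hist


def ratio_weights(data, mc, x_key, y_key, x_edges, y_edges, mc_weights=None):
    """Per-MC-event weight = normalized data / normalized (weighted) MC."""
    h_data = fill_2d(data, x_key, y_key, x_edges, y_edges)
    h_mc = fill_2d(mc, x_key, y_key, x_edges, y_edges, weights=mc_weights)
    n_data = sum(h_data.values())
    n_mc = sum(h_mc.values())
    out = []
    for evt in mc:
        key = (bin_index(evt[x_key], x_edges), bin_index(evt[y_key], y_edges))
        d = h_data.get(key, 0.0) / n_data
        m = h_mc.get(key, 0.0) / n_mc
        out.append(d / m if m > 0 else 1.0)  # unknown ratio: keep MC as-is
    return out


def two_stage_weights(data, mc, edges):
    """Stage 1 in (ndof, ntracks), stage 2 in (pt, eta) on reweighted MC."""
    w1 = ratio_weights(data, mc, "ndof", "ntracks",
                       edges["ndof"], edges["ntracks"])
    w2 = ratio_weights(data, mc, "pt", "eta",
                       edges["pt"], edges["eta"], mc_weights=w1)
    return [a * b for a, b in zip(w1, w2)]
```

Because the second stage only corrects the PT-ETA projection, it can slightly disturb the NDOF-nTracks agreement achieved in the first stage when the variables are correlated.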

yipengsun commented 2 years ago

Doc is updated at: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/reweight/JpsiK_reweight.md

@manuelfs I think this is now done. I'm going to add this weight to our RDX MC and run it on a single file to test.

yipengsun commented 2 years ago

Added a J/psi K workflow for RDX. Tested on a single file; the mean of these weights is around 1.1. Consider it done.