Closed manuelfs closed 2 years ago
Checked that Yipeng found the right number of events for
0.9.3-production_for_validation/Dst_D0-mc/Dst_D0--21_01_30--mc--MC_2012_Beam4000GeV-2012-MagDown-Nu2.5-Pythia8_Sim08e_Digi13_Trig0x409f0045_Reco14a_Stripping20Filtered_11574020_DSTTAUNU.SAFESTRIPTRIG.DST.root
|04:49:40|lxplus768:~/code/lhcb-ntuples-gen/tools$ ./size_mc_samples.py -i 11574020 -m year --debug -b MagUp Sim09 Sim08a Pythia6
Before proceed, don't forget to run lhcb-proxy-init!!
Skip LFN: 142437 evts in /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia6/Sim08a/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 204982 evts in /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia6/Sim08a/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 187391 evts in /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08a/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 171534 evts in /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia8/Sim08a/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 583473 evts in /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia6/Sim08e/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 579446 evts in /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia6/Sim08e/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Use LFN: 614577 evts in /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08e/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 612740 evts in /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia8/Sim08e/Digi13/Trig0x409f0045/Reco14a/Stripping20Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 1551331 evts in /MC/2012/Beam4000GeV-2012-MagUp-NoRICHesSim-Nu2.5-Pythia8/Sim09a/Trig0x409f0045-NoRichPIDLines/Reco14c/Stripping21Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
Skip LFN: 1588036 evts in /MC/2012/Beam4000GeV-2012-MagDown-NoRICHesSim-Nu2.5-Pythia8/Sim09a/Trig0x409f0045-NoRichPIDLines/Reco14c/Stripping21Filtered/11574020/DSTTAUNU.SAFESTRIPTRIG.DST
For MC ID 11574020
2012: 614577
0.9.4-trigger_emulation/Dst_D0-mc/Dst_D0--21_04_21--mc--MC_2016_Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8_Sim09j_Trig0x6139160F_Reco16_Turbo03a_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST.root
|04:48:29|lxplus768:~/code/lhcb-ntuples-gen/tools$ ./size_mc_samples.py -i 11574021 -m year -b MagUp Sim08 Pythia6 TrackerOnly --debug
Before proceed, don't forget to run lhcb-proxy-init!!
Skip LFN: 3698975 evts in /MC/2016/Beam6500GeV-2016-MagUp-TrackerOnly-Nu1.6-25ns-Pythia8/Sim09j/Reco16/Filtered/11574021/D0TAUNU.SAFESTRIPTRIG.DST
Skip LFN: 3735122 evts in /MC/2016/Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8/Sim09j/Reco16/Filtered/11574021/D0TAUNU.SAFESTRIPTRIG.DST
Skip LFN: 1511634 evts in /MC/2016/Beam6500GeV-2016-MagUp-Nu1.6-25ns-Pythia8/Sim09j/Trig0x6139160F/Reco16/Turbo03a/Filtered/11574021/D0TAUNU.SAFESTRIPTRIG.DST
Use LFN: 1500395 evts in /MC/2016/Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8/Sim09j/Trig0x6139160F/Reco16/Turbo03a/Filtered/11574021/D0TAUNU.SAFESTRIPTRIG.DST
For MC ID 11574021
2016: 1500395
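The `Skip LFN` / `Use LFN` lines above are consistent with `-b` acting as a blacklist of substrings: an LFN is counted only if its path contains none of the listed words. Below is a minimal sketch of that selection, assuming plain substring matching (inferred from the log, not read from `size_mc_samples.py`) and using a hypothetical helper `select_lfns`:

```python
def select_lfns(lfns, blacklist):
    """Keep the (path, n_events) pairs whose path contains no blacklisted word."""
    return [(path, n_events) for path, n_events in lfns
            if not any(word in path for word in blacklist)]


# Paths and event counts copied from the 2016 log above (MC ID 11574021).
SUFFIX = "/11574021/D0TAUNU.SAFESTRIPTRIG.DST"
LFNS_2016 = [
    ("/MC/2016/Beam6500GeV-2016-MagUp-TrackerOnly-Nu1.6-25ns-Pythia8"
     "/Sim09j/Reco16/Filtered" + SUFFIX, 3698975),
    ("/MC/2016/Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8"
     "/Sim09j/Reco16/Filtered" + SUFFIX, 3735122),
    ("/MC/2016/Beam6500GeV-2016-MagUp-Nu1.6-25ns-Pythia8"
     "/Sim09j/Trig0x6139160F/Reco16/Turbo03a/Filtered" + SUFFIX, 1511634),
    ("/MC/2016/Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8"
     "/Sim09j/Trig0x6139160F/Reco16/Turbo03a/Filtered" + SUFFIX, 1500395),
]

# Same blacklist as in the command line above.
used = select_lfns(LFNS_2016, ["MagUp", "Sim08", "Pythia6", "TrackerOnly"])
total_2016 = sum(n_events for _, n_events in used)
print(total_2016)  # 1500395
```

Only the MagDown FullSim path survives the blacklist, which reproduces the `2016: 1500395` summary line.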
I want to do some code cleanups to make sure the styling is consistent in this repo. However, I see no point doing it in the middle of development. Once you finalize your changes, can you ping me here so I can do some quick cleanups on these scripts?
BTW, by "consistent coding style", I mean that the code checkers `flake8` and `pylint` won't show any errors or warnings. This project has `.flake8` and `.pylintrc` included to suppress some default warnings.
Since you are migrating to VS Code, you might be able to enable these warnings in VS Code, if you like.
I activated `pylint` in VS Code and saw that it doesn't like to `import` several packages in one line, or a missing newline at the end of the file, so I quickly fixed that.
While I think it is helpful to aim at having consistent styles and I'll try to follow the `pylint` specifications, I'm wary about enforcing it too strongly, for various reasons.
So let us all try to be reasonable.
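For reference, the two complaints mentioned above correspond to the checks `multiple-imports` (pylint `C0410`, flake8 `E401`) and `missing-final-newline` (pylint `C0304`, flake8 `W292`). A small illustration of the accepted form; the function `script_name` is purely hypothetical and not part of this repo:

```python
# Flagged by pylint (C0410, multiple-imports) and flake8 (E401):
#     import os, sys
# Accepted form: one import per line.
import os
import sys


def script_name():
    """Return the file name of the running script, without its directory."""
    return os.path.basename(sys.argv[0])


print(script_name())
# The file should also end with a single newline character; otherwise
# pylint reports C0304 (missing-final-newline) and flake8 reports W292.
```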
Ran the 3 types of cutflow again using the same `scripts/run_cutflows.py` script, and obtained Run 2/Run 1 efficiency ratios consistent with the above.
`D*+ tau`, `bare` MC 0.9.4 ntuples: ratio corrected by (522494+502736)/(520046+515913.)
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 118213 | 126958 | - | - | - |
Signal truth-matching | 4388 | 4638 | 3.7 | 3.7 | 0.98 |
Trig. + Strip. | 151 | 397 | 3.4 | 8.6 | 2.49 |
Offline $D^0$ cuts | 71 | 106 | 47.0 | 26.7 | 0.57 |
Offline $\mu$ cuts | 66 | 77 | 93.0 | 72.6 | 0.78 |
Offline $D^* \mu$ combo cuts | 50 | 60 | 75.8 | 77.9 | 1.03 |
$BDT_{iso} < 0.15$ | 43 | 39 | 86.0 | 65.0 | 0.76 |
Total eff. | - | - | 0.04 | 0.03 | 0.84 |
Yield ratio x 0.99 | - | - | 43 | 39 | 0.90 |
`D*+ mu`, `bare` MC 0.9.4 ntuples: ratio corrected by (522494+502736)/(520046+515913.)
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 118213 | 126958 | - | - | - |
Normalization truth-matching | 76567 | 82950 | 64.8 | 65.3 | 1.01 |
Trig. + Strip. | 2898 | 8702 | 3.8 | 10.5 | 2.77 |
Offline $D^0$ cuts | 1546 | 2171 | 53.3 | 24.9 | 0.47 |
Offline $\mu$ cuts | 1348 | 1731 | 87.2 | 79.7 | 0.91 |
Offline $D^* \mu$ combo cuts | 1039 | 1267 | 77.1 | 73.2 | 0.95 |
$BDT_{iso} < 0.15$ | 880 | 1012 | 84.7 | 79.9 | 0.94 |
Total eff. | - | - | 0.74 | 0.80 | 1.07 |
Yield ratio x 0.99 | - | - | 880 | 1012 | 1.14 |
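To make the table columns explicit: each $\epsilon$ is the yield after a cut divided by the yield before it (in percent), and the $\epsilon$ ratio is Run 2 over Run 1. A minimal sketch that recomputes these columns from the `D*+ mu` `bare` yields above; the function name `cutflow_efficiencies` is illustrative and not taken from `scripts/run_cutflows.py`:

```python
def cutflow_efficiencies(yields):
    """Percentage of events surviving each cut, relative to the previous step."""
    return [100.0 * after / before for before, after in zip(yields, yields[1:])]


# Yields copied from the D*+ mu bare table, starting from "Total events".
RUN1 = [118213, 76567, 2898, 1546, 1348, 1039, 880]
RUN2 = [126958, 82950, 8702, 2171, 1731, 1267, 1012]

eff1 = cutflow_efficiencies(RUN1)
eff2 = cutflow_efficiencies(RUN2)
ratios = [e2 / e1 for e1, e2 in zip(eff1, eff2)]

# Total efficiency: final yield over total events, in percent.
total1 = 100.0 * RUN1[-1] / RUN1[0]
total2 = 100.0 * RUN2[-1] / RUN2[0]

for e1, e2, ratio in zip(eff1, eff2, ratios):
    print(f"{e1:5.1f} {e2:5.1f} {ratio:5.2f}")
print(f"Total eff.: {total1:.2f} {total2:.2f} {total2 / total1:.2f}")
```

The last line gives 0.74, 0.80 and 1.07, matching the "Total eff." row.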
`D** mu`, `bare` MC 0.9.4 ntuples: ratio corrected by (522494+502736)/(520046+515913.)
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 118213 | 126958 | - | - | - |
$D^{**}$ truth-matching | 35827 | 37755 | 30.3 | 29.7 | 0.98 |
Trig. + Strip. | 1225 | 3818 | 3.4 | 10.1 | 2.96 |
Offline $D^0$ cuts | 687 | 942 | 56.1 | 24.7 | 0.44 |
Offline $\mu$ cuts | 617 | 755 | 89.8 | 80.1 | 0.89 |
Offline $D^* \mu$ combo cuts | 185 | 244 | 30.0 | 32.3 | 1.08 |
$BDT_{iso} < 0.15$ | 60 | 76 | 32.4 | 31.1 | 0.96 |
Total eff. | - | - | 0.05 | 0.06 | 1.18 |
Yield ratio x 0.99 | - | - | 60 | 76 | 1.25 |
`D*+ mu`, large MC ntuples: ratio corrected by 614577*0.23*0.080/(1500395*0.07*0.059)
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 116295 | 834002 | - | - | - |
Trig. + Strip. | 98982 | 264911 | 85.1 | 31.8 | 0.37 |
Offline $D^0$ cuts | 52192 | 67807 | 52.7 | 25.6 | 0.49 |
Offline $\mu$ cuts | 46244 | 54339 | 88.6 | 80.1 | 0.90 |
Offline $D^* \mu$ combo cuts | 43996 | 50911 | 95.1 | 93.7 | 0.98 |
$BDT_{iso} < 0.15$ | 36563 | 40741 | 83.1 | 80.0 | 0.96 |
Total eff. | - | - | 31.4 | 4.9 | 0.16 |
Yield ratio x 1.82 | - | - | 36563 | 40741 | 2.03 |
`D*+ mu`: ratio corrected by 1/1.41/2
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 342641 | 3066797 | - | - | - |
Trig. + Strip. | 202992 | 3006318 | 59.2 | 98.0 | 1.65 |
Offline $D^0$ cuts | 96838 | 568383 | 47.7 | 18.9 | 0.40 |
Offline $\mu$ cuts | 90589 | 348105 | 93.5 | 61.2 | 0.65 |
Offline $D^* \mu$ combo cuts | 73018 | 276307 | 80.6 | 79.4 | 0.98 |
$BDT_{iso} < 0.15$ | 47178 | 172140 | 64.6 | 62.3 | 0.96 |
Total eff. | - | - | 13.8 | 5.6 | 0.41 |
Yield ratio x 0.35 | - | - | 47178 | 172140 | 1.29 |
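The "Yield ratio" rows multiply the raw Run 2/Run 1 ratio of final yields by the correction factor quoted above each table. The sketch below only reproduces that arithmetic for the three `D*+ mu` tables; the dictionary labels are mine, and the factors are used exactly as quoted without interpreting them:

```python
# (label, Run 1 final yield, Run 2 final yield, quoted correction factor)
SAMPLES = [
    ("bare MC", 880, 1012, (522494 + 502736) / (520046 + 515913)),
    ("large MC", 36563, 40741,
     614577 * 0.23 * 0.080 / (1500395 * 0.07 * 0.059)),
    ("1/1.41/2 table", 47178, 172140, 1 / 1.41 / 2),
]


def corrected_yield_ratio(run1_yield, run2_yield, correction):
    """Run 2 / Run 1 yield ratio after applying the sample-size correction."""
    return run2_yield / run1_yield * correction


results = {label: corrected_yield_ratio(n1, n2, corr)
           for label, n1, n2, corr in SAMPLES}

for label, _, _, corr in SAMPLES:
    print(f"{label}: x {corr:.2f} -> {results[label]:.2f}")
```

This gives 1.14, 2.03 and 1.29, the values in the last row of the three tables.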
The absolute efficiencies for `B -> D*+ mu nu` in the `bare` and large MC samples, calculated as `N_aftercut*eff_gen*eff_filter/(N_BKK*eff_BF)`, are

- Run 1 `bare`: `880*0.3331/((522494+502736)*0.6463) = 0.0442%`
- Run 2 `bare`: `1012*0.3331/((520046+515913)*0.6463) = 0.0503%`

For the `bare` (MC ID `11874091`), the `D*+ mu nu` BF is 64.63%, the 33.3% generator-level efficiency comes from here, and the filter efficiency should be 1.
With respect to the `bare` efficiencies, Run 1 large is 56% and Run 2 large is 99%, so it looks like the former may be wrong.
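The formula can be checked numerically. The sketch below reproduces the two `bare` numbers directly. For the large samples the inputs are not spelled out in this comment, so they are an assumption on my part: I take 0.059 and 0.080 as generator efficiencies and 0.07 and 0.23 as filter efficiencies (the factors in the large MC correction `614577*0.23*0.080/(1500395*0.07*0.059)`), with `eff_BF = 1`. With those inputs the quoted 56% and 99% come out.

```python
def absolute_efficiency(n_aftercut, eff_gen, eff_filter, n_bkk, eff_bf):
    """N_aftercut * eff_gen * eff_filter / (N_BKK * eff_BF), as a fraction."""
    return n_aftercut * eff_gen * eff_filter / (n_bkk * eff_bf)


# bare samples: inputs as quoted in the text (filter efficiency taken as 1).
bare_run1 = absolute_efficiency(880, 0.3331, 1.0, 522494 + 502736, 0.6463)
bare_run2 = absolute_efficiency(1012, 0.3331, 1.0, 520046 + 515913, 0.6463)

# large samples: ASSUMED inputs (see lead-in), with eff_BF = 1.
large_run1 = absolute_efficiency(36563, 0.059, 0.07, 614577, 1.0)
large_run2 = absolute_efficiency(40741, 0.080, 0.23, 1500395, 1.0)

print(f"bare:  {100 * bare_run1:.4f}%  {100 * bare_run2:.4f}%")
print(f"large/bare: {100 * large_run1 / bare_run1:.0f}%  "
      f"{100 * large_run2 / bare_run2:.0f}%")
```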
A key difference between these samples is the FFs used for the MC generation:

- `HQET 1.20 1.426 0.818 0.908` for the Run 1 large (MC ID `11574020`), which was old and buggy
- `HQET2 1.122 0.908 1.270 0.852 1.15` for the Run 2 large (MC ID `11574021`)
- `HQET2 1.207 0.908 1.406 0.853` for the `bare` (MC ID `11874091`)

If the difference comes from the different FFs, recalculating the efficiencies after FF reweighting should improve the agreement, but perhaps not fully given that the generator efficiency for the Run 1 large sample is calculated based on the buggy FFs.
For the Run 1 exclusive production (Sim09 version) I did the exercise of getting the filter efficiency as the number of events coming out of the filter-stage jobs divided by the number of events coming out of the generator-stage jobs and got 10.5% (up from the 7% expectation, but normalization-like modes seem to do better in the filter, for whatever reason, than e.g. DD). This still only gets you up to 0.0369%, which is better but does not match Run 2 yet.
Phoebe detailed the process to find the total number of generated events before filtering, which together with the BKK number can help determine the filter efficiency:

- Applications -> Data -> Transformation Monitor
- Applications -> Data -> Production Request

For instance, for the Run 2 FullSim MD production, the ProdID is 121220, the Request ID is 74234, the subrequest is 74252, the number of generated events is 6087052 and the number of events that pass the filter is 1502907, resulting in a filter efficiency of 24.7%.
Phoebe did the same exercise for the Run 1 request and found a filter efficiency of 10.5% (for the Sim09 production), which makes the absolute efficiency 0.0369%, just 16% below the cocktail's 0.0442%.
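Both numbers can be reproduced with a few lines. The filter efficiency is simply the number of events passing the filter over the number generated. For the updated Run 1 absolute efficiency I assume, as a guess consistent with the quoted 0.0369%, a generator efficiency of 0.059 (the factor appearing in the large MC correction), `eff_BF = 1`, and the Run 1 large yield and BKK count from the cutflow table and log above:

```python
def filter_efficiency(n_passed, n_generated):
    """Fraction of generated events that survive the filtering stage."""
    return n_passed / n_generated


def absolute_efficiency(n_aftercut, eff_gen, eff_filter, n_bkk, eff_bf):
    """N_aftercut * eff_gen * eff_filter / (N_BKK * eff_BF), as a fraction."""
    return n_aftercut * eff_gen * eff_filter / (n_bkk * eff_bf)


# Run 2 FullSim MD production (ProdID 121220), numbers quoted in the text.
run2_filter_eff = filter_efficiency(1502907, 6087052)

# Run 1 large sample with the measured 10.5% filter efficiency.
# ASSUMED inputs: eff_gen = 0.059 and eff_BF = 1 (see lead-in).
run1_large = absolute_efficiency(36563, 0.059, 0.105, 614577, 1.0)

print(f"Run 2 filter efficiency: {100 * run2_filter_eff:.1f}%")
print(f"Run 1 large absolute efficiency: {100 * run1_large:.4f}%")
```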
The 16% difference may now be coming from the FFs. We can check if this is plausible with two tests:

- `D*+ tau nu` (MC ID `11574010`). In this case, both the Run 1 large and cocktail samples are generated with `ISGW2`, while the Run 2 large uses `HQET2`.

I think after fixing the `chi2/ndof < 4` selection bug, the efficiencies between bare SIGNAL COMPONENT and FullSim SIGNAL are very similar:
Note: This is w/o applying any FF weights.
Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 118213 | 126958 | - | - | - |
Signal truth-matching | 4388 | 4638 | 3.7 | 3.7 | 0.98 |
Trig. + Strip. | 151 | 397 | 3.4 | 8.6 | 2.49 |
Offline $D^0$ cuts | 120 | 207 | 79.5 | 52.1 | 0.66 |
Offline $\mu$ cuts | 110 | 162 | 91.7 | 78.3 | 0.85 |
Offline $D^* \mu$ combo cuts | 70 | 115 | 63.6 | 71.0 | 1.12 |
$BDT_{iso} < 0.15$ | 61 | 91 | 87.1 | 79.1 | 0.91 |
Total eff. | - | - | 0.05 | 0.07 | 1.39 |
Yield ratio x 0.99 | 61 | 91 | - | - | 1.48 |

FullSim signal:

Cut | Run 1 | Run 2 | Run 1 $\epsilon$ | Run 2 $\epsilon$ | $\epsilon$ ratio |
---|---|---|---|---|---|
Total events | 34794 | 150544 | - | - | - |
Trig. + Strip. | 21534 | 45081 | 61.9 | 29.9 | 0.48 |
Offline $D^0$ cuts | 17315 | 23191 | 80.4 | 51.4 | 0.64 |
Offline $\mu$ cuts | 15935 | 18880 | 92.0 | 81.4 | 0.88 |
Offline $D^* \mu$ combo cuts | 14992 | 17675 | 94.1 | 93.6 | 1.00 |
$BDT_{iso} < 0.15$ | 12481 | 13961 | 83.3 | 79.0 | 0.95 |
Total eff. | - | - | 35.9 | 9.3 | 0.26 |
Yield ratio x 1.31 | 12481 | 13961 | - | - | 1.46 |
We should try to understand the 1.25 in normalization, but everything else looks fine for now.
- As presented to the semileptonic group, we had seen a 37% higher efficiency in Run 2 `B->D*+ mu nu` data than in Run 1 (with Run 1 cuts and accounting for a luminosity ratio of 1.41 and cross section ratio of 2)
- We saw similar (though lower) factors with the cocktail MC
- However, when we tried to do this comparison with the large `B->D*+ mu nu` MC samples, we found a much higher efficiency in Run 2

We need to understand the efficiencies of the large MC samples, which will be the ones we use in the analysis. First steps will be:
`lhcb-ntuples-gen`) and latest ntuples