Closed manuelfs closed 2 years ago
I think these suggestions make sense. I'll see if the run and event variables can be defined inside rename instead of calculation (because for different ntuples they may have different types). Everything else seems fine by me.
Also, for the ntp_disk_usage.C, we can make it directly compilable, and put it in a src folder (to store all compilable C++ code). We can also write a rule to compile them and store the compiled executable in the scripts folder, with a .exe suffix, so that all directly usable scripts are still in the bin folder.
I plan to add the naming conventions to the Nomenclature section of the wiki, @manuelfs what do you think?
Also, it's probably a good idea to keep this open until the very end to discuss all naming conventions? If so, I'll pin this issue.
@manuelfs FYI, now a branch can be kept and renamed at the same time. I've already updated the run 2 w/ run 1 cut YAML to reflect this change.
I've updated cut flags and cut documentation @manuelfs :
Flags: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/postprocess/rdx-run2/rdx-run2_oldcut.yml
Doc: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/cut_flag_review.md
Old cuts: is_normal & d_mass_window_ok & is_<skim>
From ntuple: /home/syp/downloads/sample_ntuples/D0--22_02_24--std--data--2016--md--000--old.root
ISO: 5,822
1OS: 537
2OS: 146
DD: 935
New cuts: mu_ubdt_ok & is_<skim>
From ntuple: /home/syp/downloads/sample_ntuples/D0--22_02_24--std--data--2016--md--000--new.root
ISO: 5,822
1OS: 537
2OS: 146
DD: 935
New cuts, without UBDT: is_<skim>
print_skim_size.py ~/downloads/sample_ntuples/D0--22_02_24--std--data--2016--md--000--new.root
From ntuple: /home/syp/downloads/sample_ntuples/D0--22_02_24--std--data--2016--md--000--new.root
ISO: 5,982
1OS: 561
2OS: 170
DD: 1,049
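To make the comparison explicit, here is a small sketch in plain Python (no ROOT needed) that takes the skim sizes quoted above, checks that the old and new cuts give identical yields, and computes the fraction of events in each skim that survive mu_ubdt_ok. The dictionary and function names are made up for illustration; only the counts come from the printouts above.

```python
# Skim sizes copied from the print_skim_size.py output quoted above.
old_cuts = {"ISO": 5822, "1OS": 537, "2OS": 146, "DD": 935}
new_cuts = {"ISO": 5822, "1OS": 537, "2OS": 146, "DD": 935}
new_cuts_no_ubdt = {"ISO": 5982, "1OS": 561, "2OS": 170, "DD": 1049}


def passing_fraction(with_cut, without_cut):
    """Per-skim fraction of events that survive the extra cut."""
    return {skim: with_cut[skim] / without_cut[skim] for skim in with_cut}


# The old and new cut definitions should select exactly the same events.
same_yields = old_cuts == new_cuts

# Fraction of events in each skim that pass mu_ubdt_ok.
ubdt_fraction = passing_fraction(new_cuts, new_cuts_no_ubdt)

print("old and new cuts agree:", same_yields)
for skim, frac in ubdt_fraction.items():
    print(f"{skim}: {frac:.1%} pass mu_ubdt_ok")
```

With these numbers the UBDT cut keeps roughly 97% of ISO, 96% of 1OS, 86% of 2OS and 89% of DD.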
I've uploaded the sample ntuples to glacier at: /home/syp/public/sample_ntuples.
Thank you, at first sight it looks like a great improvement. I checked that indeed is_iso is the same as is_iso_loose && b_m_ok && dx_m_ok && in_fit_range.
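A check of that kind could look roughly like the sketch below. It is plain Python over a list of dicts standing in for ntuple entries; the toy events are invented for illustration, and in practice the flags would be read from the tree.

```python
# Flags whose logical AND is expected to reproduce is_iso.
PARTS = ("is_iso_loose", "b_m_ok", "dx_m_ok", "in_fit_range")


def is_iso_from_parts(event):
    """Recompute is_iso as the AND of its component flags."""
    return all(event[flag] for flag in PARTS)


def find_mismatches(events):
    """Indices of events where the stored is_iso disagrees with the AND."""
    return [
        i for i, ev in enumerate(events)
        if bool(ev["is_iso"]) != is_iso_from_parts(ev)
    ]


# Toy events (invented): one passing everything, one failing the fit range.
toy_events = [
    {"is_iso": True, "is_iso_loose": True, "b_m_ok": True,
     "dx_m_ok": True, "in_fit_range": True},
    {"is_iso": False, "is_iso_loose": True, "b_m_ok": True,
     "dx_m_ok": True, "in_fit_range": False},
]

print("mismatching entries:", find_mismatches(toy_events))
```

An empty list means the two definitions agree on every entry.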
The one thing I'm not sure of is how to select the fit templates for the MC. We could define the is_<skim> and is_<skim>_loose flags for MC to include all the cuts that should be applied to MC, that is, same cuts as data minus PID/some trigger. That way, plotting data or MC in a skim would both be tree->Draw("mm2", "is_iso * weight"), with the weight of data being 1, and the weight of MC including the PID/some trigger cuts.
For the MC skims, it's just wiso, which is defined to be w*wskim_iso*skim_global_ok, where w is a global weight, wskim_iso is like is_iso_loose, and skim_global_ok is a boolean that only contains kinematic cuts.
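Spelled out as a tiny sketch (plain Python, with invented numbers; the function is only an illustration of the product above, not the actual postprocessing code):

```python
def wiso(w, wskim_iso, skim_global_ok):
    """MC weight for the ISO skim: w * wskim_iso * skim_global_ok.

    skim_global_ok is a boolean, so an event failing the kinematic cuts
    gets weight 0 and drops out of any weighted histogram.
    """
    return w * wskim_iso * float(skim_global_ok)


# Invented example values, for illustration only.
print(wiso(0.8, 0.5, True))   # event kept with a reduced weight
print(wiso(0.8, 0.5, False))  # event fails the kinematic cuts
```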
The nice thing about MC is that we don't care about the SB region.
And for MC you should forget about the is_iso boolean because we need a weight for MC, not a boolean.
Ah, I had forgotten that we needed different weights for the different skims.
Still, I wonder whether it would be worth homogenizing data and MC. Given that wiso encapsulates both weights and cuts for MC, perhaps we can put the is_iso cuts for data in wiso instead? That would allow us to plot data and MC as tree->Draw("mm2", "wiso")
That's a good point, but is_iso is of type bool and wiso double. Maybe we can do the following:
- is_iso branch as-is (boolean for data, doesn't exist in MC)
- wiso branch for data to be is_iso cast to double
- wiso as-is for MC
@manuelfs We discussed that we don't need to differentiate between B0/B+, D0/D+ after all. So let's just refer to both as b/d instead of b0,b/d0,d.
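For the bool-to-double idea discussed above (wiso for data being is_iso cast to double), a minimal sketch in plain Python follows. The helper names and the toy events are hypothetical; the point is only that the same weighted sum then works for data and MC.

```python
def add_wiso_for_data(event):
    """Return a copy of a data event with wiso = is_iso cast to double."""
    out = dict(event)
    out["wiso"] = float(event["is_iso"])
    return out


def weighted_yield(events, weight="wiso"):
    """Sum of weights: the same code path for data and MC."""
    return sum(ev[weight] for ev in events)


# Toy data: booleans only. Toy MC: wiso already stored as a double.
data = [add_wiso_for_data({"is_iso": flag}) for flag in (True, False, True)]
mc = [{"wiso": 0.4}, {"wiso": 0.0}, {"wiso": 1.2}]

print("data yield:", weighted_yield(data))
print("MC yield:", weighted_yield(mc))
```

For data the weighted yield is just the number of events with is_iso set, so nothing changes there; MC keeps its weights.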
Let's wait a couple of days to see if we need additional changes.
I tried to count the total size of 2016 MagDown 1277341. It is ~7 GB. Not too bad.
I feel changing the names for D0 is too much, as there are lots of flags named d0_XXX (and the documentation needs to be changed too for consistency). I plan to change b/b0 to b only, while leaving d0 as-is.
This hybrid approach is not too bad, as you don't need to remember which B meson you are working on, and the dst/d0 is already differentiated anyway.
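As a rough sketch of what that hybrid rename amounts to (plain Python; the function name is hypothetical and the real renaming would live in the YAML config), only the b0 prefix is touched and everything else passes through:

```python
def unify_b_prefix(branch):
    """Map a leading 'b0_' to 'b_'; leave every other branch name alone."""
    prefix, sep, rest = branch.partition("_")
    if prefix == "b0" and sep:
        return "b_" + rest
    return branch


# b0_m_ok and dst_m are invented names; b_m_ok and d0_ok appear in this thread.
for name in ("b0_m_ok", "b_m_ok", "d0_ok", "dst_m"):
    print(name, "->", unify_b_prefix(name))
```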
What do you think about this @manuelfs ?
The compromise sounds good
Closed for now.
I think we may have started an issue to discuss branch naming convention, but I can't find it, nor is there anything in the wiki, so let us discuss here.
The more I work with the step 2 ntuples, the more I love them. All of them having a TTree named tree is incredibly convenient, and most of the names that @yipengsun chose are so easy to type and remember. I just committed a few changes based on the rules below to make them a wee bit shorter and clearer
- … other_trks → trks_other)
- … flag_ a bit confusing (often I think of flagged events as bad ones) and long, so I propose _ok at the end (eg, flag_d0 → d0_ok)
- … is_2os)
- … flag_l0 → l0)
- … event and run which are easier to type. I left eventNumber and runNumber because the code needed them

If these conventions are fine, we should add them to the wiki.
By the way, starting with the name of the particle has helped me in the past to find which group of branches contributes most to the size of the tree, by grouping all branches that share the same prefix before the first _ (eg, d0_). For instance, the output of the scripts/ntp_disk_usage.C script I just committed, run on the step 2 ntuples, indicates that 14% of the ntuple is taken by the d0_ variables.
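The grouping idea can be sketched in a few lines of plain Python. This is not the ntp_disk_usage.C code, only an illustration of the technique: the branch names and sizes below are invented, and in practice the sizes would come from the tree itself.

```python
from collections import defaultdict


def size_fraction_by_prefix(branch_sizes):
    """Group branches by the text before the first '_' and return each
    group's share of the total size, largest group first."""
    totals = defaultdict(int)
    for name, size in branch_sizes.items():
        prefix = name.split("_", 1)[0]
        totals[prefix] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {prefix: size / grand_total for prefix, size in ranked}


# Invented branch sizes in bytes, for illustration only.
toy_sizes = {"d0_m": 120, "d0_pt": 80, "b_m": 300, "mu_pt": 500}
fractions = size_fraction_by_prefix(toy_sizes)

for prefix, frac in fractions.items():
    print(f"{prefix}_: {frac:.0%}")
```

Branches without an underscore (for example run) end up in a group of their own, named after the whole branch.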