umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

lhcb-ntuples-gen

Ntuples generation with DaVinci and in-house offline components. Please refer to the project wiki for more details about the installation, usage, and data sources of this project.

Quick set up

Run the following in a terminal:

git clone git@github.com:umd-lhcb/lhcb-ntuples-gen
cd lhcb-ntuples-gen
git remote add julian git@lhcb.physics.umd.edu:lhcb-ntuples-gen
git remote add glacier git@10.229.60.85:lhcb-ntuples-gen
git annex init --version=7
git submodule update --init  # Do this before git annex sync to avoid potential mess-up of submodule pointers!
git annex sync

nix develop  ## Can take an hour
make install-dep
make install-dep-pip ## To install packages needed for JpsiK reweighting, including zfit
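
The setup above assumes that git, git-annex, nix, and make are already installed. A minimal preflight sketch could confirm this before cloning. The check_tools helper is hypothetical and not part of this repository, and the tool list is inferred from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints "missing: NAME" for every command that is not on PATH
# and returns 1 if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool list is an assumption based on the setup commands):
# check_tools git git-annex nix make || echo "install the missing tools first"
```

Running the check first avoids discovering a missing tool halfway through the setup, for example after the long nix develop step.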

Generation of step-1 ntuples (DaVinci)

The DaVinci scripts can be developed locally on your laptop by running our docker image of DaVinci. Install docker as described in the wiki and pull the image with

docker pull umdlhcb/lhcb-stack-cc7:DaVinci-v45r6-SL

For instance, to test the standard data script, first pull the example .dst files, then enter the docker container and run the script:

git annex get run2-rdx/data/data-2016-md/00102837*
make docker-dv
cd run2-rdx
./run.sh conds/cond-std-2016.py

After your script does what you want, you are ready to send ganga jobs to the LHCb grid as detailed in the wiki.
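
Before entering the container, it can help to confirm that the example .dst files were actually fetched. The sketch below assumes the annex stores locked files as symlinks, so content that has not been downloaded shows up as a dangling symlink. Unlocked files would not be detected this way. The annex_missing helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Lists files under a directory whose annexed content is not present locally,
# assuming un-fetched annexed files are dangling symlinks.
annex_missing() {
    # With -L, find follows symlinks, so only broken links keep type "l".
    find -L "$1" -type l
}

# Example: an empty listing suggests all example files are available.
# annex_missing run2-rdx/data/data-2016-md
```

If the listing is not empty, rerunning the git annex get command for those files is the likely fix.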

Generation of step-2 ntuples (babies)

The step-1 ntuples coming out of DaVinci are processed with the babymaker, a neat script that allows for easy branch renaming and deleting, as well as cut selection and calculation of new branches. This is configured in YAML files.

For instance, the tracker-only MC ntuples used to produce the fit templates use postprocess/rdx-run2/rdx-run2_oldcut.yml. These ntuples are currently produced by first downloading the step-1 ntuples from the annex. Since these are over 1 TB, this is typically done on glacier inside a tmux session:

tmux
git annex get ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only
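
Since the download is over 1 TB, a free-space check before starting may save a failed transfer. This is a minimal sketch using only POSIX df and awk. The has_free_kb helper and the 1 TiB threshold in the example are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Succeeds only if the filesystem holding DIR has at least NEED_KB kilobytes free.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -Pk prints a header plus one line for the filesystem;
    # the fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: 1 TiB is 1073741824 KiB.
# has_free_kb . 1073741824 || echo "less than 1 TiB free in this filesystem"
```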

Generating the step-2 babies can be quite slow, currently taking about two days to run, mainly because of the normalization (and likely because the HAMMER FF weights are recalculated each time; TODO: to avoid this, they ought to be cached by saving them to the subfolders in ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only). The ntupling is run with the following commands (specific options can be found inside workflows/rdx.py):

tmux
cd workflows
## Takes 37 hours, output is 422GB
./rdx.py rdx-ntuple-run2-mc-to-sig-norm    | tee step2-ntuple_mc-to-sig-norm.log
## Takes 75 min, output is 58GB
./rdx.py rdx-ntuple-run2-mc-to-ddx         | tee step2-ntuple_mc-to-ddx.log
## Takes 11 hours, output is 81GB
./rdx.py rdx-ntuple-run2-mc-to-dstst       | tee step2-ntuple_mc-to-dstst.log
## Takes 45 min, output is 2.7GB
./rdx.py rdx-ntuple-run2-mc-to-d_s         | tee step2-ntuple_mc-to-d_s.log
## Takes 45 min, output is 23GB
./rdx.py rdx-ntuple-run2-mc-to-dstst-heavy | tee step2-ntuple_mc-to-dstst-heavy.log
## Takes ??, output is 10GB
./rdx.py rdx-ntuple-run2-data              | tee step2-ntuple_data.log
## Takes 65 min, output is 22GB
./rdx.py rdx-ntuple-run2-mu_misid          | tee step2-ntuple_mu_misid.log
## Takes 65 min, output is 22GB
./rdx.py rdx-ntuple-run2-mu_misid-vmu      | tee step2-ntuple_mu_misid-vmu.log
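
Because each command above pipes into tee, a shell in its default configuration reports the exit status of tee rather than that of rdx.py, so a failed workflow can go unnoticed in a long unattended run. A minimal sketch of a wrapper that keeps the log and still returns the command's own status is shown below. The run_logged helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Runs a command, copies its standard output to the terminal and to a log file,
# and returns the command's own exit status instead of tee's.
run_logged() {
    log=$1
    shift
    rc_file=$(mktemp)
    # Record the command's status inside the pipeline, before tee hides it.
    { rc=0; "$@" || rc=$?; echo "$rc" > "$rc_file"; } | tee "$log"
    rc=$(cat "$rc_file")
    rm -f "$rc_file"
    return "$rc"
}

# Example: stop at the first failing workflow.
# run_logged step2-ntuple_mc-to-ddx.log ./rdx.py rdx-ntuple-run2-mc-to-ddx &&
# run_logged step2-ntuple_mc-to-d_s.log ./rdx.py rdx-ntuple-run2-mc-to-d_s
```

Chaining the calls with && means a later workflow only starts if the previous one succeeded, which matters when a single step can take more than a day.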

This generation relies on various auxiliary ntuples and weights. Some aux ntuples need to be generated prior to running the above commands. Namely:

The other auxiliary ntuples are calculated on the fly if not cached:

The step-2 ntuples (written to the ntuple_merged folders) can then be copied to rdx-run2-analysis/ntuples and annexed. They are used in that repository to produce the fit templates and other studies.

Updating PID weights in Monte Carlo

MC weights are saved in histograms that we store in run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted. These histograms are calculated with the pidcalib2 package. We have three sets of scripts

If you want to add new weights, you should calculate the histogram, copy it to that folder, and include a branch by modifying run2-rdx/reweight/pid/run2-rdx_oldcut.yml.
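
After copying a new histogram, a quick consistency check can catch a file that was never wired into the YAML. The sketch below assumes that the histograms are .root files and that the YAML refers to them by file name. Neither assumption is confirmed here, and the unreferenced_hists helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints the .root files in HIST_DIR whose file name does not appear in YAML_FILE.
# Assumes the YAML mentions each histogram by its file name.
unreferenced_hists() {
    hist_dir=$1
    yaml_file=$2
    for hist in "$hist_dir"/*.root; do
        [ -e "$hist" ] || continue   # the glob matched nothing
        name=$(basename "$hist")
        grep -qF -- "$name" "$yaml_file" || echo "$name"
    done
}

# Example (paths taken from the text above):
# unreferenced_hists run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted \
#     run2-rdx/reweight/pid/run2-rdx_oldcut.yml
```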