Closed yipengsun closed 2 years ago
I've asked Vitalli about this. It looks like even if we can't merge the mu_UBDT branch directly, we can produce efficiency histograms relatively easily offline because the sWeight is already available to us.
It looks like the 2016 production has finished: https://its.cern.ch/jira/projects/WGP/issues/WGP-274?filter=allissues
From Vitalli's reply, I think we should do this:
Mu_nopt
ones)pidcalib2
to find efficiencies
pidcalib2
can specify the input ntuples to use. @emilyj816 Let's focus on the downloading step for now.
First, write a Python script called ntuple_grabber.py under scripts, and make sure that all variables are named like iAmAVariable.
This script should read the spec YAML in spec/pidcalib.yml, then download each file to the correct location. You need to compute the checksum of each file to ensure file integrity: run sha256sum <path_to_file> and make sure the downloaded ntuple is unbroken. You also probably want to use ssh-wrapped rsync to download, so that you'll only do an incremental download.
Also, to parse the YAML, just use pyyaml. You can add that as a dependency in requirements.txt.
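The download and integrity-check steps above could be sketched roughly as follows. This is only a minimal sketch, not the actual ntuple_grabber.py: the function names, the rsync flags chosen (-a, --partial), and the idea that the expected checksum is handed in by the caller are all assumptions. It uses Python's hashlib in place of the sha256sum command, which should give the same hex digest.

```python
import hashlib
import subprocess


def sha256OfFile(filePath, chunkSize=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    with open(filePath, "rb") as inputFile:
        # Read chunkSize bytes at a time so a large ntuple never sits fully in memory.
        for chunk in iter(lambda: inputFile.read(chunkSize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def buildRsyncCommand(remoteHost, remotePath, localPath):
    """Build an ssh-wrapped rsync command (hypothetical flag choice).

    --partial keeps partially transferred files so a rerun can pick them up.
    """
    return ["rsync", "-a", "--partial", "-e", "ssh",
            f"{remoteHost}:{remotePath}", localPath]


def downloadAndVerify(remoteHost, remotePath, localPath, expectedSha256):
    """Download one file, then report whether its checksum matches."""
    retCode = subprocess.run(
        buildRsyncCommand(remoteHost, remotePath, localPath)
    ).returncode
    if retCode != 0:
        return False
    return sha256OfFile(localPath) == expectedSha256
```

The variable names follow the iAmAVariable convention requested above, even though that is not the usual Python style.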
Sorry, after some thought, let's use ntuple_grabber.py as the script name, while maintaining the iAmAVariable naming convention inside the code.
I see that you've created a branch for this. I made additional changes to the project, and have merged these into your branch. Don't forget to do a git pull first before committing! @emilyj816
So the total size for 2016 official PIDCalib ntuples is just 2.7 TB. This is much smaller than what I expected.
Hi Yipeng, I'm trying to figure out the python script to call test-nix-pkg, and I'm having a hard time looking online for certain syntax things. My plan is to pass make test-nix-pkg to a python script which parses the yaml file and is able to pass the filenames to the Makefile. Here is my current attempt from the command line:
make test-nix-pkg gen/pidcalib_w_nix_pkg.root=gen/real_pidcalib_w_nix_pkg.root samples/Jpsi--21_11_30--pidcalib--data_turbo--2016--mu--Mu_nopt-subset.root=/home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagDown/00152085_00000001_1.pidcalib.root
This results in make: *** No rule to make target 'samples/Jpsi--21_11_30--pidcalib--data_turbo--2016--mu--Mu_nopt-subset.root', needed by 'test-nix-pkg'. Stop.
I guess it's because the input file is a prerequisite and so there must be a different way to specify the name of the file. I've done a lot of searching and haven't been able to figure it out yet, so I was wondering if you knew off the top of your head an easy way to do this. Thanks!
I think you don't need to use make at all. What you need is to call the AddUBDTBranchPidCalib executable directly in python, say, with os.system; its usage is listed in the Makefile.
To make your life easier, I'll list the usage of the executable explicitly here:
AddUBDTBranchPidCalib -i <path_to_a_pidcalib_ntuple> -o <path_to_output_ntuple> -p probe -b UBDT -t <tree1>,<tree2>
An example:
AddUBDTBranchPidCalib -i /some/folder/input.root -o /some/other/folder/output.root -p probe -b UBDT -t "tree1","tree2"
Also, you should check the exit code for each run to ensure the command was executed properly. If you decide to use os.system, you can do ret_code = os.system("<your command>") and just check that ret_code == 0.
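For reference, the call-and-check pattern could look something like the sketch below. It assembles the command from the usage line quoted above and checks the exit code. Note that it swaps os.system for subprocess.run with an argument list, which avoids shell quoting problems with paths; the function names and the failure message are hypothetical.

```python
import subprocess


def buildUbdtCommand(inputNtuple, outputNtuple, trees):
    """Assemble the AddUBDTBranchPidCalib call following the usage quoted above."""
    return [
        "AddUBDTBranchPidCalib",
        "-i", inputNtuple,
        "-o", outputNtuple,
        "-p", "probe",
        "-b", "UBDT",
        "-t", ",".join(trees),  # trees are passed as one comma-separated argument
    ]


def runAndCheck(command):
    """Run a command and return True only if it exits with code 0."""
    retCode = subprocess.run(command).returncode
    if retCode != 0:
        print(f"Command failed with exit code {retCode}: {' '.join(command)}")
    return retCode == 0
```

A loop over all ntuples can then collect the inputs for which runAndCheck returns False and rerun only those.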
Edit: Just to be clear: The make rule was meant to tell you how to use AddUBDTBranchPidCalib. I never pointed that out clearly. Sorry!
I understand now, thank you!
I'm in the process of running my script to call AddUBDTBranchPidCalib for all the ntuples; it should be ready sometime in the future. I've pushed the code to my branch in scripts/nix_tester.py, in case you wanted to take a look. The cleaned up version of ntuple_grabber.py is also on git, and the cleaned up sha_checker.py will follow soon.
BTW, I think a better name for nix_tester would be just apply_ubdt, as that's what's actually happening, right :-)
Final step for Emily:
Try to generate a JSON file of the following form: https://gitlab.cern.ch/lhcb-rta/pidcalib2/-/blob/master/src/pidcalib2/data/samples.json
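A rough sketch of such a writer is below. The layout it produces, with each sample name mapping to an object holding a "files" list, is an assumption about the linked file and should be checked against it before use. The function names are hypothetical; the sample name and path in the usage note are taken from the make command earlier in this thread.

```python
import json


def buildSamplesDict(filesBySample):
    """Map each sample name to a {"files": [...]} entry (assumed layout)."""
    return {
        sampleName: {"files": sorted(filePaths)}
        for sampleName, filePaths in filesBySample.items()
    }


def writeSamplesJson(filesBySample, outputPath):
    """Write the samples dictionary to disk as indented, key-sorted JSON."""
    with open(outputPath, "w") as outputFile:
        json.dump(buildSamplesDict(filesBySample), outputFile, indent=2, sort_keys=True)
```

For example, calling writeSamplesJson with {"Mu_nopt-2016-MagDown": ["/home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagDown/00152085_00000001_1.pidcalib.root"]} would write a single entry for that directory.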
Hey Yipeng,
I've pushed 1) my script for writing the JSON file and 2) the JSON file to git, under the names json_writer.py and samples.json, respectively. Let me know if there's anything that looks wrong or needs to be renamed/cleaned up. Also, sha_checker.py has been cleaned up and no longer has a lot of hard-coded components.
Thanks! I'll try to clean up the code over the weekend and merge the branch.
I was able to use uproot to combine the friend UBDT ntuple with the raw PIDCalib ntuple in a chunk-by-chunk manner, so the whole file does NOT need to be loaded into memory. It is not trivial.
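To illustrate the chunk-by-chunk idea only, here is a library-free sketch. It is not the actual merging code: plain Python lists stand in for the branches read from the two ntuples, the function names are hypothetical, and the branch values in the usage note are made up. The point is that both inputs are stepped through with the same step size, each pair of chunks is checked for alignment, and only one chunk per input is held at a time.

```python
from itertools import zip_longest


def iterChunks(columns, stepSize):
    """Yield dicts of column slices, stepSize entries at a time."""
    numEntries = len(next(iter(columns.values())))
    for start in range(0, numEntries, stepSize):
        yield {name: values[start:start + stepSize] for name, values in columns.items()}


def mergeFriendChunks(mainChunks, friendChunks):
    """Pair up chunks from two entry-aligned sources and merge their columns."""
    for mainChunk, friendChunk in zip_longest(mainChunks, friendChunks):
        if mainChunk is None or friendChunk is None:
            raise ValueError("the two inputs have different numbers of chunks")
        mainLen = len(next(iter(mainChunk.values())))
        friendLen = len(next(iter(friendChunk.values())))
        if mainLen != friendLen:
            raise ValueError("chunk lengths differ, entries are misaligned")
        # Friend columns are added next to the main columns for the same entries.
        yield {**mainChunk, **friendChunk}
```

For example, iterating {"Jpsi_M": [...]} and {"probe_UBDT": [...]} with the same stepSize and passing both iterators to mergeFriendChunks yields chunks containing both branches, ready to be appended to an output tree one at a time.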
As a validation, I checked the Jpsi_M and probe_UBDT branches between the raw-merged and friend-merged ntuples. They agree perfectly. This suggests that the merging was successful.
Merging of PIDCalib ntuples w/ their corresponding UBDT friend ntuples has started.
The merging is still bugged. I've opened an issue upstream.
OK, I've decided to do the irresponsible thing and use a very large step size. This means that we are reading a huge chunk of data into memory directly (~3 GB I guess).
This is not too bad for the server and I'm going to proceed for now.
This is done.
We should discuss w/ PIDCalib experts to see if we can integrate the UBDT branch directly in the official PIDCalib sample.
The procedure would be the following:
pidcalib2
pidcalib2
source code to add UBDT branches and use our local PIDCalib samples w/ the UBDT branches.