umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

Approachable scripts to run tasks #82

Closed manuelfs closed 3 years ago

manuelfs commented 3 years ago

We have been having issues running several tasks, such as cutflows or ntuple-making, when the commands are encoded in a Makefile: they are difficult to parse and understand, and it is hard to run only a subset of them.

We discussed using simpler scripts, and here's a possible template: scripts/run_cutflows.py. It produces the bare, Dstmu, and data cutflows, with the following partial printout:

(.virtualenv) |09:14:21|~/code/lhcb-ntuples-gen$ ./scripts/run_cutflows.py 

======= Running cutflow and saving output to gen/cutflow_bare-sig
  ./scripts/cutflow_output_yml_gen.py ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2011_Beam3500GeV-2011-MagUp-Nu2-Pythia8_Sim08h_Digi13_Trig0x40760037_Reco14c_Stripping20r1NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2011_Beam3500GeV-2011-MagDown-Nu2-Pythia8_Sim08h_Digi13_Trig0x40760037_Reco14c_Stripping20r1NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root -s -o gen/cutflow_bare-sig/run1_yields.yml -m run1-std-sig
  ./scripts/cutflow_output_yml_gen.py ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2016_Beam6500GeV-2016-MagUp-Nu1.6-25ns-Pythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping26NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2016_Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping26NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root -s -o gen/cutflow_bare-sig/run2_yields.yml -m run2-std-sig
  ./scripts/cutflow_gen.py -o gen/cutflow_bare-sig/run1_yields.yml -t gen/cutflow_bare-sig/run2_yields.yml -n > gen/cutflow_bare-sig/cutflow.csv -r 0.9896434125288742
  cat gen/cutflow_bare-sig/cutflow.csv | tabgen.py -f latex_booktabs_raw > gen/cutflow_bare-sig/cutflow.tex
  cat gen/cutflow_bare-sig/cutflow.csv | tabgen.py -f github > gen/cutflow_bare-sig/cutflow.md

  cat gen/cutflow_bare-sig/cutflow.md

======= Running cutflow and saving output to gen/cutflow_bare-nor
  ./scripts/cutflow_output_yml_gen.py ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2011_Beam3500GeV-2011-MagUp-Nu2-Pythia8_Sim08h_Digi13_Trig0x40760037_Reco14c_Stripping20r1NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2011_Beam3500GeV-2011-MagDown-Nu2-Pythia8_Sim08h_Digi13_Trig0x40760037_Reco14c_Stripping20r1NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root -s -o gen/cutflow_bare-nor/run1_yields.yml -m run1-std-nor
  ./scripts/cutflow_output_yml_gen.py ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2016_Beam6500GeV-2016-MagUp-Nu1.6-25ns-Pythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping26NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root ntuples/0.9.4-trigger_emulation/Dst_D0-cutflow_mc/Dst_D0--21_05_29--cutflow_mc--bare--MC_2016_Beam6500GeV-2016-MagDown-Nu1.6-25ns-Pythia8_Sim09b_Trig0x6138160F_Reco16_Turbo03_Stripping26NoPrescalingFlagged_11874091_ALLSTREAMS.DST.root -s -o gen/cutflow_bare-nor/run2_yields.yml -m run2-std-nor
  ./scripts/cutflow_gen.py -o gen/cutflow_bare-nor/run1_yields.yml -t gen/cutflow_bare-nor/run2_yields.yml -n > gen/cutflow_bare-nor/cutflow.csv -r 0.9896434125288742
  cat gen/cutflow_bare-nor/cutflow.csv | tabgen.py -f latex_booktabs_raw > gen/cutflow_bare-nor/cutflow.tex
  cat gen/cutflow_bare-nor/cutflow.csv | tabgen.py -f github > gen/cutflow_bare-nor/cutflow.md

  cat gen/cutflow_bare-nor/cutflow.md
...
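Every cutflow in the printout follows the same three steps: compute the Run 1 and Run 2 yields into YAML files, combine them into a CSV, then render the CSV as LaTeX and Markdown. Below is a minimal sketch of how a script could assemble those command strings. The helper `cutflow_commands` and its parameters are hypothetical, and the real run_cutflows.py may be organized differently; the flags are copied verbatim from the printout without assuming what they mean.

```python
from pathlib import PurePosixPath


def cutflow_commands(name, run1_ntuples, run2_ntuples, mode_suffix, ratio):
    """Build the shell commands for one cutflow (hypothetical helper).

    name:         cutflow label, e.g. "bare-sig"; output goes to gen/cutflow_<name>
    run1_ntuples: Run 1 ntuple paths (e.g. the MagUp and MagDown files)
    run2_ntuples: Run 2 ntuple paths
    mode_suffix:  suffix for the -m option, e.g. "std-sig" -> "run1-std-sig"
    ratio:        value passed through to cutflow_gen.py via -r
    """
    out = PurePosixPath("gen") / f"cutflow_{name}"
    cmds = []
    # Step 1 (slow): compute the yields for each run into a YAML file
    for run, ntuples in (("run1", run1_ntuples), ("run2", run2_ntuples)):
        cmds.append(
            "./scripts/cutflow_output_yml_gen.py "
            + " ".join(ntuples)
            + f" -s -o {out}/{run}_yields.yml -m {run}-{mode_suffix}"
        )
    # Step 2 (fast): combine both YAML files into a single CSV cutflow
    cmds.append(
        f"./scripts/cutflow_gen.py -o {out}/run1_yields.yml "
        f"-t {out}/run2_yields.yml -n > {out}/cutflow.csv -r {ratio}"
    )
    # Step 3 (fast): render the CSV as LaTeX and GitHub Markdown tables
    cmds.append(f"cat {out}/cutflow.csv | tabgen.py -f latex_booktabs_raw > {out}/cutflow.tex")
    cmds.append(f"cat {out}/cutflow.csv | tabgen.py -f github > {out}/cutflow.md")
    return cmds
```

Keeping the commands as plain strings makes it trivial to print them exactly as they will be executed.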

Some features that I think help developers understand the scripts and modify them as needed

For instance, the command printout was useful several times when I just wanted to change the format of the tables: I could simply copy and paste the ./scripts/cutflow_gen.py command without re-running the time-consuming ./scripts/cutflow_output_yml_gen.py steps, where the yields are calculated.
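Echoing each command before running it, plus a way to skip the slow yield step, could be implemented along these lines. This is a sketch only: `run_commands`, `dry_run`, and `skip` are hypothetical names, not the actual interface of run_cutflows.py.

```python
import subprocess


def run_commands(cmds, dry_run=False, skip=()):
    """Print each shell command, then run it (hypothetical runner).

    dry_run: only print the commands, so they can be copy/pasted by hand
    skip:    substrings; a command containing any of them is printed but not run
    Returns the list of commands that were actually executed.
    """
    executed = []
    for cmd in cmds:
        skipped = dry_run or any(s in cmd for s in skip)
        print(("# skipped: " if skipped else "  ") + cmd)
        if skipped:
            continue
        # shell=True because the commands rely on pipes and output redirection;
        # check=True stops at the first failing step
        subprocess.run(cmd, shell=True, check=True)
        executed.append(cmd)
    return executed
```

For example, passing `skip=("cutflow_output_yml_gen.py",)` would re-render the tables from existing YAML yields. Printing before running also means that, if a step fails, its exact command is the last line on screen, ready to rerun manually.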

A couple of things I found cumbersome

yipengsun commented 3 years ago

tabgen.py uses an external library, tabulate, to generate LaTeX tables. So far my approach has been: if you need minor tweaks to the .tex table, paste it into an actual .tex file and make the changes there.

It does feel like we should probably move tabgen.py into this repository, under lhcb-ntuples-gen/tools.

I suspect that if we want the generated .tex file to really be the final version, we should probably roll our own LaTeX table generation.
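Rolling our own generator need not be large. Below is a minimal stdlib-only sketch that turns a CSV (first row as header) into a booktabs tabular. `csv_to_booktabs` and `latex_escape` are hypothetical names, the default alignment (text in the first column, numbers right-aligned) is an assumption, and the `escape` switch exists in case cell text already contains LaTeX markup that should pass through unchanged.

```python
import csv
import io


def latex_escape(text):
    """Escape characters that are special in LaTeX text mode."""
    replacements = {
        "\\": r"\textbackslash{}",
        "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
        "_": r"\_", "{": r"\{", "}": r"\}",
        "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
    }
    # Per-character mapping, so braces inside replacements are never re-escaped
    return "".join(replacements.get(ch, ch) for ch in text)


def csv_to_booktabs(csv_text, align=None, escape=True):
    """Render CSV text (first row = header) as a booktabs tabular environment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Assumed default: left-aligned first column, right-aligned numeric columns
    align = align or "l" + "r" * (len(header) - 1)
    fmt = latex_escape if escape else (lambda s: s)
    lines = [rf"\begin{{tabular}}{{{align}}}", r"\toprule"]
    lines.append(" & ".join(fmt(c) for c in header) + r" \\")
    lines.append(r"\midrule")
    for row in body:
        lines.append(" & ".join(fmt(c) for c in row) + r" \\")
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)
```

The output still needs `\usepackage{booktabs}` in the document preamble.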

yipengsun commented 3 years ago

BTW, the scripts directory is already exported to PATH, so you probably can replace ./scripts/cutflow_output_yml_gen.py with just cutflow_output_yml_gen.py.

yipengsun commented 3 years ago

Also, we should probably update the Makefile so that the cutflow generation just calls your run_cutflows.py. In that case, I'm in favor of removing the actual cutflow generation procedure from the Makefile itself, and using the Makefile just as a registry/pointer.

yipengsun commented 3 years ago

To summarize, I'd like to do the following:

  1. Move tabgen.py from pyTuplingUtils to the tools folder of this project
  2. Move run_cutflows.py to workflows/rdx-cutflows.py
  3. Update Makefile so that make rdx-cutflow is just running your cutflow script
  4. Other related small cleanups.
yipengsun commented 3 years ago

I have some other ideas to simplify the scripts, tools and workflows directories. Let's discuss this on Tue.

manuelfs commented 3 years ago

Great, those all sound good, but let's discuss on Tuesday indeed.

> BTW, the scripts directory is already exported to PATH, so you probably can replace ./scripts/cutflow_output_yml_gen.py with just cutflow_output_yml_gen.py.

In this case it doesn't make much of a difference because it is the primary script, but in general I find that not writing out the full path obscures the location of the code for casual developers, so I'd prefer to use full paths as much as possible.

manuelfs commented 3 years ago

I finalized my commits to the cutflows, and moved run_cutflows.py to workflows/.

Yipeng and I also discussed today

yipengsun commented 3 years ago

I've updated the project as we discussed, including the documentation. @manuelfs feel free to close the issue after taking a look.

manuelfs commented 3 years ago

Did you accidentally delete workflows/rdx-cutflows.py?

yipengsun commented 3 years ago

Yes, I did. I forgot to add it back after renaming.

yipengsun commented 3 years ago

Can we close this issue now? @manuelfs

manuelfs commented 3 years ago

I want to be able to run this package natively on my Mac. I finally figured out how to bind python to python3 on macOS without Anaconda, which was causing problems because compiling ROOT picked up the native Python version, 3.9.1, while conda had 3.8.8. I simply added soft links. I also installed the latest Python, 3.9.6, with brew as suggested here, but I'm not sure that was needed.
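A quick way to confirm that the soft-link approach worked is to check that python and python3 on PATH end up at the same file, and that two tools agree on the Python version (3.9.1 and 3.8.8 differ in the minor version, which is what matters for compiled extensions). A minimal stdlib sketch; all helper names are hypothetical:

```python
import os
import shutil


def resolve_executable(name, path=None):
    """Return the real file a bare command name resolves to on PATH, or None."""
    found = shutil.which(name, path=path)
    # realpath follows soft links such as python -> python3
    return os.path.realpath(found) if found else None


def same_interpreter(a="python", b="python3", path=None):
    """True if both names resolve, through any symlinks, to the same file."""
    ra, rb = resolve_executable(a, path), resolve_executable(b, path)
    return ra is not None and ra == rb


def same_minor_version(v1, v2):
    """Compare the major.minor parts of two version strings."""
    return v1.split(".")[:2] == v2.split(".")[:2]
```

For example, `same_minor_version("3.9.1", "3.8.8")` is `False`, while `same_minor_version("3.9.1", "3.9.6")` is `True`.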

Also, on macOS the Makefile couldn't find the scripts even though their folders were added to the PATH. In any case, I think it is better to be explicit, so that we can quickly tell where each script lives (workflows? scripts? test?). So I added the paths, and now it all runs natively on my Mac 🙂
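When converting bare script names into explicit paths, a small helper can report which project folder actually holds each script. A sketch assuming scripts sit directly inside workflows/, scripts/, or test/ at the repository root; `explicit_path` and `SCRIPT_DIRS` are hypothetical names:

```python
from pathlib import Path

# Folders mentioned in the discussion as places where scripts may live
SCRIPT_DIRS = ("workflows", "scripts", "test")


def explicit_path(name, repo_root=".", dirs=SCRIPT_DIRS):
    """Return './<dir>/<name>' for the first folder that contains the script.

    Raises FileNotFoundError if no folder has it, rather than falling back
    to a PATH lookup that may behave differently across systems.
    """
    root = Path(repo_root)
    for d in dirs:
        if (root / d / name).is_file():
            return f"./{d}/{name}"
    raise FileNotFoundError(f"{name} not found in any of: {', '.join(dirs)}")
```

Failing loudly instead of silently consulting PATH keeps the location of every script visible, which is the point of preferring explicit paths.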

If @yipengsun is happy with this change, you can close the issue.

yipengsun commented 3 years ago

The only remaining problem is that you still cannot run MuonPID natively, but once we have caching set up, that should normally not be an issue.