umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

Prune but don't merge the GRID ntuples #92

Closed yipengsun closed 2 years ago

yipengsun commented 2 years ago

Previously we both pruned and merged the GRID ntuples, resulting in HUGE step-1 ntuples. Now for TO (tracker-only) we'd like to prune each small ntuple without merging them, and add them as:

<long_name>/<long_name>-XYZ.root

where XYZ is the job index of the original ntuple.
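
A minimal sketch of this naming scheme, assuming XYZ is a three-digit, zero-padded job index. The helper `pruned_ntuple_path` and the sample name are hypothetical and are not taken from the repository:

```python
from pathlib import PurePosixPath


def pruned_ntuple_path(long_name: str, job_idx: int) -> PurePosixPath:
    """Build <long_name>/<long_name>-XYZ.root for one GRID job.

    Assumption: XYZ is the job index, zero-padded to three digits.
    """
    return PurePosixPath(long_name) / f"{long_name}-{job_idx:03d}.root"


# Placeholder name, for illustration only.
print(pruned_ntuple_path("sample_ntuple", 7))
```

Each GRID job then maps to exactly one pruned file, so nothing has to be merged.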

One possible problem is the total path length, which for macOS can be as short as 255 chars in total. I'll investigate more on this and track the progress here.

FYI @manuelfs

yipengsun commented 2 years ago

Indeed, the APFS file path limit is just 255 UTF-8 characters: https://superuser.com/questions/1561484/what-is-the-maximum-length-of-a-filename-apfs.

yipengsun commented 2 years ago

Here I compute the minimal file path length:

  1. The <long_name>: Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574011_D0TAUNU.SAFESTRIPTRIG.DST.root
  2. The base folder path: lhcb-ntuples-gen/ntuples/0.9.5-bugfix/Dst_D0-mc-tracker_only

The <long_name> is already 162 characters long and the base path is 43 characters long (not counting the leading lhcb-ntuples-gen/), so this is already 205 characters. With a 255-character limit on the total path, there's no way for <long_name>/<long_name> to work.
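
The arithmetic above can be reproduced with a short script. This is only a sketch: the strings are copied from the list above, the base path is taken relative to the repository root (which is what gives 43 characters), and the `-000` job suffix and the 255-character budget are used purely for illustration.

```python
import posixpath

# Strings copied from the list above.
LONG_NAME = (
    "Dst_D0--21_10_16--mc--tracker_only--"
    "MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-"
    "Pythia8_Sim09k_Reco16_Filtered_11574011_D0TAUNU."
    "SAFESTRIPTRIG.DST.root"
)
# Base folder, relative to the repository root (assumption).
BASE = "ntuples/0.9.5-bugfix/Dst_D0-mc-tracker_only"
LIMIT = 255  # assumed budget for the whole path

# Proposed layout: <base>/<long_name>/<long_name>-XYZ.root
stem = LONG_NAME[: -len(".root")]
nested = posixpath.join(BASE, stem, f"{stem}-000.root")

print(len(LONG_NAME), len(BASE), len(LONG_NAME) + len(BASE))
print(len(nested), len(nested) > LIMIT)
```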

yipengsun commented 2 years ago

Nah, on a closer look, the APFS limit is on the filename, not the file path. The file path limit is undisclosed.
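
If the limit applies per filename, each path component can be checked on its own. A rough sketch, assuming the limit is 255 and is counted in UTF-8 bytes (both are assumptions here); `overlong_components` is a hypothetical helper:

```python
from __future__ import annotations

NAME_LIMIT = 255  # assumed per-component limit, counted in UTF-8 bytes


def overlong_components(path: str, limit: int = NAME_LIMIT) -> list[str]:
    """Return the components of `path` whose UTF-8 encoding exceeds `limit` bytes."""
    return [
        part
        for part in path.split("/")
        if len(part.encode("utf-8")) > limit
    ]


# Two long-but-legal components: nothing is flagged.
ok_path = "a" * 162 + "/" + "a" * 166
print(overlong_components(ok_path))

# A single 300-byte component is flagged.
bad_path = "dir/" + "b" * 300
print(len(overlong_components(bad_path)))
```

Under this assumption a nested <long_name>/<long_name> layout passes, because no single component comes close to the limit.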

yipengsun commented 2 years ago

Note that we also should NOT merge the generated step-2 ntuples.

yipengsun commented 2 years ago

@manuelfs This is now fully implemented. The output looks like this (see below).

I've made extensive changes to the workflow code, with the following goals in mind:

I think I've achieved all 4 goals, as the total number of lines of code is pretty small (basically no redundancy) and the logic is generally pretty clear. The only downside is that the code is not very beginner-friendly.

I think a good compromise would be: for RDX, just use the functions I defined (they make DEFINING a new workflow very clear; it's just not very obvious how the functions are composed). For future analyses, you can partially reuse the code in utils.py and write a less elegant but more readable implementation for them.

rdx-ntuple-run2-mc-to-demo
├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--000-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--000.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--000.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--000-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--001-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--001.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--001.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--001-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--002-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--002.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--002.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--002-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--003-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--003.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--003.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--003-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--004-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--004.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--004.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--004-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--005-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--005.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--005.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--005-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--006-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--006.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--006.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--006-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--007-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--007.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--007.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--007-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--008-dv
│   │   ├── baby.cpp
│   │   ├── baby.exe
│   │   ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--008.root
│   │   ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--008.root
│   │   ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--008-dv--aux_trk.root 
│   │   ├── hammer.root 
│   │   ├── pid.root 
│   │   ├── trg_emu.root 
│   │   └── trk.root
│   └── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--009-dv
│       ├── baby.cpp
│       ├── baby.exe
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--009.root
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--009.root
│       ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--009-dv--aux_trk.root 
│       ├── hammer.root 
│       ├── pid.root 
│       ├── trg_emu.root 
│       └── trk.root
├── ntuple
│   └── 21_12_31--mc--11574021--2016--md--tracker_only
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--000.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--001.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--002.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--003.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--004.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--005.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--006.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--007.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--008.root 
│       ├── D0--21_12_31--mc--11574021--2016--md--tracker_only--009.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--000.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--001.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--002.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--003.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--004.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--005.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--006.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--007.root 
│       ├── Dst--21_12_31--mc--11574021--2016--md--tracker_only--008.root 
│       └── Dst--21_12_31--mc--11574021--2016--md--tracker_only--009.root 
└── ntuple_aux
    └── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--000-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--001-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--002-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--003-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--004-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--005-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--006-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--007-dv--aux_trk.root 
        ├── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--008-dv--aux_trk.root 
        └── Dst_D0--21_10_16--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11574021_D0TAUNU.SAFESTRIPTRIG.DST--009-dv--aux_trk.root 

15 directories, 120 files
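
Since the files in ntuple/ are no longer merged, downstream code has to pick them up per job. A small sketch of how such a filename could be split into its `--`-separated parts; the field names are guesses based on the listing above, and `parse_ntuple_name` is a hypothetical helper, not something in utils.py (needs Python 3.9+ for `str.removesuffix`):

```python
from typing import NamedTuple


class NtupleName(NamedTuple):
    # Field names are assumptions inferred from the listing above.
    tree: str      # "D0" or "Dst"
    date: str
    kind: str
    decay_id: str
    year: str
    polarity: str
    extra: str
    job_idx: int


def parse_ntuple_name(filename: str) -> NtupleName:
    """Split a name like 'D0--...--tracker_only--000.root' on '--'."""
    fields = filename.removesuffix(".root").split("--")
    if len(fields) != 8:
        raise ValueError(f"unexpected ntuple name: {filename}")
    *head, idx = fields
    return NtupleName(*head, int(idx))


parsed = parse_ntuple_name(
    "D0--21_12_31--mc--11574021--2016--md--tracker_only--003.root"
)
print(parsed.tree, parsed.polarity, parsed.job_idx)
```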

manuelfs commented 2 years ago

Thank you for implementing this Yipeng!

> I think I've achieved all 4 goals, as the total line of code is pretty small (basically no redundancy) and the logic are generally pretty clear. The only downside is the code is not very beginner-friendly.

Sigh. I'd say that elegance is not a primary goal in itself; the primary goal would be "well-written code that is maintainable", and elegance would be a subjective goal subservient to the primary goals. Our code does not need to be undergrad-friendly, but it should at least be understandable/editable for developers with some experience who are not breathing the code day in and day out (i.e., Alex/Svende/myself).

Hopefully this implementation meets that goal 🤞