umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License

v0.9.5 GRID ntuple productions #87

Closed yipengsun closed 2 years ago

yipengsun commented 2 years ago

Here we list all GRID ntuple productions for v0.9.5.

cutflow_data, data, 2016 MD

This was originally meant to be a std production but was produced as cutflow_data due to Yipeng's negligence. The production is sub-optimal but should be sufficient for rebuilding our run 2 data template.

The differences between cutflow_data and std are:

Notes

std, data, 2011 MD

mc, Sig and Norm, 2016 MD-MU, FullSim

The MC IDs are listed here and here.

mc, DDX and D**s, 2016 MD-MU, FullSim

MC IDs here and here

mc, D*+TauNu, 2012 MD-MU, FullSim, Sim08a

MC IDs: 11574010

mc, D**, 2016 MD-MU, FullSim

MC IDs here and here.

mc, D* -> Tau/Mu Nu, 2016 MD-MU, tracker-only

MC IDs: 11574011, 11574021

Postprocess of GRID ntuples

I typically remove some branches during this process, using scripts/haddcut.py:

haddcut.py <output_ntp> <input_ntp> -c <YAML_config>

The <YAML_config> specifies, among other things, which branches to keep or drop.
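A purely hypothetical illustration of the workflow; the YAML keys below are made up and are not necessarily the actual schema accepted by scripts/haddcut.py:

# hypothetical example only: the YAML keys are guesses, not haddcut.py's real options
cat > slim.yml <<'EOF'
keep_trees:
  - TupleB0/DecayTree
keep_branches:
  - b0_*
  - d0_*
  - mu_*
EOF
haddcut.py Dst_D0--slimmed.root Dst_D0--merged.root -c slim.yml   # file names are placeholders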

yipengsun commented 2 years ago

I decided to increase the files-per-job option; 4k subjobs is kind of unwieldy.

afernez commented 2 years ago

Thanks for noting this; it's good that the accidental cutflow flag will just mean larger ntuples rather than requiring things to be reproduced. While you are re-running those failed subjobs, could I get started on the 2016 FullSim MC? I could start with MD (so basically just reproducing most of the contents here).

yipengsun commented 2 years ago

Let's divide our burden then:

Can you produce the DDX and D**_s samples listed at: https://umd-lhcb.github.io/lhcb-ntuples-gen/data/data_sources/#run-2-muonic-rd-monte-carlo

BTW, make sure to:

  1. Update your DaVinci build on lxplus
  2. Update .ganga.py as noted in the doc
yipengsun commented 2 years ago

Also, can you add your actual job submission script to run2-rdx/jobs? You can start with one of the existing scripts and just tweak the parameters.

yipengsun commented 2 years ago

Also, after hadding the ntuples on lxplus, I typically drop some branches offline to make the ntuples smaller (because they can be HUGE). I'll document that procedure later; just keep in mind that you need to run another script before doing the git annex add step.
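Roughly, the workflow is (the script in question is scripts/haddcut.py from the top post; file names are placeholders):

haddcut.py Dst--slimmed.root Dst--merged.root -c <YAML_config>   # drop the unneeded branches first
git annex add Dst--slimmed.root                                  # only then annex the slimmed ntuple
git commit -m 'Add slimmed GRID ntuple'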

yipengsun commented 2 years ago

Submitted std, data, 2011 MD.

yipengsun commented 2 years ago

@manuelfs @Svende @afernez There's something VERY fishy about the Stripping v28r2:

| Stripping version | polarity | num of evt in BKK | num of evt from DV log | luminosity in BKK | file size in BKK |
| --- | --- | --- | --- | --- | --- |
| 28r2 | down | 999373140 | 999373140 | 842212309.924 | 69.2 TB |
| 28r1 | down | 490996260 | 490996260 | 841274852.008 | 31.2 TB |
| 28r2 | up | 946833490 | - | 777737446.214 | 64.8 TB |
| 28r1 | up | 476955403 | - | 777747254.257 | 30.1 TB |

How come we magically have twice as many events for both polarities while the luminosity doesn't change at all??

BTW, the latest production looks fine, because num of evt in BKK = num of evt from DV log.

yipengsun commented 2 years ago

I've submitted Sig and Norm for 2016 FullSim MC w/ both polarities.

@afernez I used the job submission script here: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/run2-rdx/jobs/21_10_08-fullsim_sig_nor_reprod.sh You probably can just change the MC IDs and submit the DDX and D**_s samples.
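Something like this should do (the new file name is only an example):

cp run2-rdx/jobs/21_10_08-fullsim_sig_nor_reprod.sh run2-rdx/jobs/21_10_XX-fullsim_ddx_dss.sh
# then edit the MC IDs inside the new script and run it from a ganga-ready lxplus session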

afernez commented 2 years ago

I tried running a job overnight last night to do one of the DDX modes, just to make sure I had things configured correctly. Weirdly, ganga stopped before running the jobs with the error ERROR SandboxError: File '/afs/cern.ch/user/a/afernez/lhcb-ntuples-gen/run2-rdx/weights_soft.xml' does not exist. This file does exist, so I'm not sure what went wrong. I'll look into it now, then try to run the same job again, and then do all of the DDX and D**s.

Also, yeah, I'll run your script above to cut out unneeded branches before git annex adding the merged ntuples.

afernez commented 2 years ago

Well, after staring at things for a bit (and changing nothing), I ran the job again, and it worked. Not sure what happened, but seems fine now.

afernez commented 2 years ago

Ok, submitted the jobs for all 2016 fullsim DDX and D**s md and mu. I'll commit the job script and update the top post.

yipengsun commented 2 years ago

Verification for cutflow_data, data, 2016 MD

Verifying output for Job 130...
subjob 0: ntuple has a size of 5 KiB, which is too small!
subjob 15: ntuple has a size of 5 KiB, which is too small!
subjob 16: ntuple has a size of 5 KiB, which is too small!
subjob 2: ntuple has a size of 5 KiB, which is too small!
subjob 20: ntuple has a size of 5 KiB, which is too small!
subjob 21: ntuple has a size of 5 KiB, which is too small!
subjob 23: ntuple has a size of 5 KiB, which is too small!
subjob 26: ntuple has a size of 5 KiB, which is too small!
subjob 27: ntuple has a size of 5 KiB, which is too small!
subjob 28: ntuple has a size of 5 KiB, which is too small!
subjob 3: ntuple has a size of 5 KiB, which is too small!
subjob 30: ntuple has a size of 5 KiB, which is too small!
subjob 38: ntuple has a size of 5 KiB, which is too small!
subjob 39: ntuple has a size of 5 KiB, which is too small!
subjob 4: ntuple has a size of 5 KiB, which is too small!
subjob 41: ntuple has a size of 5 KiB, which is too small!
subjob 42: ntuple has a size of 5 KiB, which is too small!
subjob 45: ntuple has a size of 5 KiB, which is too small!
subjob 46: ntuple has a size of 5 KiB, which is too small!
subjob 47: ntuple has a size of 5 KiB, which is too small!
subjob 51: ntuple has a size of 5 KiB, which is too small!
subjob 58: ntuple has a size of 5 KiB, which is too small!
subjob 62: ntuple has a size of 5 KiB, which is too small!
subjob 63: ntuple has a size of 5 KiB, which is too small!
subjob 69: ntuple has a size of 5 KiB, which is too small!
subjob 7: ntuple has a size of 5 KiB, which is too small!
Job 130 output verification failed with 26 error(s).

Out of 4264 subjobs, 26 produced ntuples that don't contain any candidates.

In the end I set the minimum file size threshold to 1 KiB.
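(Not the actual verification script, just a sketch of the size check it performs, with placeholder paths:)

min_size=1024                                 # 1 KiB threshold, in bytes
for ntp in downloads/*/output/*.root; do      # placeholder layout of the subjob outputs
    size=$(stat -c %s "$ntp")                 # file size in bytes (GNU stat, as on lxplus)
    if [ "$size" -lt "$min_size" ]; then
        echo "$ntp has a size of $size bytes, which is too small!"
    fi
done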

The merge failed with the following errors:

Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).

I downloaded the full job outputs and hadded them locally successfully:

hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
hadd Opening the next 923 files
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
hadd Opening the next 923 files
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
hadd Opening the next 923 files
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
hadd Opening the next 574 files
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
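The merge itself is just plain hadd over the downloaded subjob outputs, roughly like this (paths are placeholders; the -n option caps how many input files hadd keeps open at once, and by default that cap comes from the system limit, which is what produces the batched "Opening the next N files" messages):

hadd -n 923 downloads/Dst_D0--21_09_23--cutflow_data--merged.root downloads/*/Dst_D0--21_09_23--cutflow_data--*.root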

Comparing the new step-2 ntuple with the old step-2 ntuple, both WITHOUT the D* veto (because the old one is missing the branches needed for this veto):

> make rdx-ntuple-run2-data-oldcut-no-Dst-veto
> cd gen/rdx-ntuple-run2-data-oldcut-no-Dst-veto/ntuple

> uiddump -n D0--21_10_08--cutflow_data--data--2016--md.root -t tree
Num of events: 1540780, Num of IDs: 1540750, Num of UIDs: 1540720
Num of duplicated IDs: 30, Num of duplicated events: 30, duplicate rate: 0.00%
> uiddump -n D0--21_10_08--std--data--2016--md.root -t tree
Num of events: 1538763, Num of IDs: 1538733, Num of UIDs: 1538703
Num of duplicated IDs: 30, Num of duplicated events: 30, duplicate rate: 0.00%
> uidcommon -n D0--21_10_08--cutflow_data--data--2016--md.root  -t tree -N D0--21_10_08--std--data
Total common IDs: 1535074

> uiddump -n Dst--21_10_08--cutflow_data--data--2016--md.root -t tree
Num of events: 280804, Num of IDs: 280801, Num of UIDs: 280798
Num of duplicated IDs: 3, Num of duplicated events: 3, duplicate rate: 0.00%
> uiddump -n Dst--21_10_08--std--data--2016--md.root -t tree 
Num of events: 278130, Num of IDs: 278127, Num of UIDs: 278124
Num of duplicated IDs: 3, Num of duplicated events: 3, duplicate rate: 0.00%
> uidcommon -n Dst--21_10_08--cutflow_data--data--2016--md.root -t tree -N Dst--21_10_08--std--dat
Total common IDs: 277528

In addition, I have a separate d0_dst_veto_ok branch, and the veto rate is 21.5%.
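For the record, the veto rate can be cross-checked with a quick ROOT one-liner (file and tree names as in the uiddump commands above; treating the rate as the fraction of candidates with d0_dst_veto_ok false is my reading):

root -l -b -q -e 'TFile f("D0--21_10_08--cutflow_data--data--2016--md.root"); auto t = (TTree*)f.Get("tree"); printf("veto rate: %.1f%%\n", 100. * (1. - double(t->GetEntries("d0_dst_veto_ok")) / t->GetEntries()));'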

yipengsun commented 2 years ago

@afernez I decided to put the verifications into separate comments and just provide links in the top post, to avoid making the top post super long.

yipengsun commented 2 years ago

For the D** production: it looks to me like the DIRAC backend is misbehaving, and jobs starting from 12675011 MagUp (index 158) are either not submitting or not running properly.

Now all the remaining jobs for this production have been submitted.

yipengsun commented 2 years ago

Also having problems trying to remove job files on lxplus:

ERROR    Tried 3 times to remove file/folder: /afs/cern.ch/user/s/suny/work/gangadir/repository/suny/LocalXML/6.0/jobs/0xxx/160_1634326806.1191213__to_be_deleted_/9_1634326806.1674125__to_be_deleted__1634326806.1711805__to_be_deleted_/.__afs8E37
WARNING  Error trying to fully remove Job #'160':: GangaException: Failed to remove file/folder: /afs/cern.ch/user/s/suny/work/gangadir/repository/suny/LocalXML/6.0/jobs/0xxx/160_1634326806.1191213__to_be_deleted_/9_1634326806.1674125__to_be_deleted__1634326806.1711805__to_be_deleted_/.__afs8E37
yipengsun commented 2 years ago

Production of the large tracker-only 2016 MC for D* -> Tau/Mu Nu, both polarities, has been submitted.

afernez commented 2 years ago

Ok, I think ganga is cooperating enough now for me to resubmit the DDX and D**s 2016 MD-MU FullSim MC jobs. I've updated the top post with the new date. I'm hopeful these jobs will all run correctly now... it's still unclear to me what the issue has been.

yipengsun commented 2 years ago

For the 2012 signal production: note that MagUp has roughly 1.5 times as many events as MagDown (from BKK: MagUp: 213956, MagDown: 146177), and the ntuple file sizes scale accordingly.

yipengsun commented 2 years ago

The TO production has finished, with HUGE file sizes:

38G     # D* sig
38G     
197G    # D* norm
196G    
yipengsun commented 2 years ago

There's one main difficulty in merging the large TO ntuples: the EOS mount goes away within about a day.

I tried both mounting EOS via sshfs and merging directly on lxplus; both times the merger errored out with a file-does-not-exist error, and I checked and confirmed that the EOS storage had been unmounted. It seems to stay mounted for only about a day before dropping.

I have resorted to copying the full files to my local machine and merging locally. Unfortunately this puts a huge strain on my disk space, and I'm deleting unneeded files to make enough room for the ntuples.
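The copy step looks roughly like this (the xrootd endpoint and paths are placeholders for wherever the job outputs actually live):

mkdir -p downloads
while read -r eos_path; do                    # one /eos/... path per line in a hypothetical listing
    xrdcp "root://eosuser.cern.ch/${eos_path}" downloads/
done < to_download.txt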

yipengsun commented 2 years ago

For the normalization, after skimming locally the file size drops from roughly 200 GB to 120 GB. Still, I'm really running out of disk space.

yipengsun commented 2 years ago

@manuelfs and I agreed that we'll postpone the pruning of the TO ntuples to 0.9.6. Consider this done.