Closed by yipengsun 2 years ago
I decided to increase the files-per-job option; 4k subjobs is kind of unwieldy.
Thanks for noting this. Good that the accidental `cutflow` flag will just mean larger ntuples rather than requiring things to be reproduced. While you are re-running those failed subjobs, could I get started on the 2016 FullSim MC? I could start with MD (so basically just reproducing most of the contents here).
Let's divide our burden then: can you produce the `DDX` and `D**_s` samples listed at https://umd-lhcb.github.io/lhcb-ntuples-gen/data/data_sources/#run-2-muonic-rd-monte-carlo?
BTW, make sure to set up your `.ganga.py` on lxplus as noted in the doc. Also, can you add your actual job submission script to `run2-rdx/jobs`? You can start with one of the existing scripts and just tweak the parameters.
Also, after `hadd`ing the ntuples on lxplus, I typically drop some branches offline to make the ntuples smaller (because they can be HUGE). I'll document that procedure later, but keep in mind that you need to run another script before doing the `git annex add` procedure.
Submitted `std`, data, 2011 MD.
@manuelfs @Svende @afernez There's something VERY fishy about `Stripping v28r2`:

| Stripping version | polarity | num of evt in BKK | num of evt from DV log | luminosity in BKK | file size in BKK |
|---|---|---|---|---|---|
| 28r2 | down | 999373140 | 999373140 | 842212309.924 | 69.2 TB |
| 28r1 | down | 490996260 | 490996260 | 841274852.008 | 31.2 TB |
| 28r2 | up | 946833490 | - | 777737446.214 | 64.8 TB |
| 28r1 | up | 476955403 | - | 777747254.257 | 30.1 TB |

How come we magically have twice as many events for both polarities while the luminosity doesn't change at all??

BTW, the latest production looks fine, because num of evt in BKK = num of evt from DV log.
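As a quick sanity check on the claim above, the ratios can be computed directly from the MagDown rows of the table (plain arithmetic, nothing LHCb-specific):

```python
# Numbers copied from the MagDown rows of the BKK table above.
evt_28r2, evt_28r1 = 999_373_140, 490_996_260
lumi_28r2, lumi_28r1 = 842_212_309.924, 841_274_852.008

evt_ratio = evt_28r2 / evt_28r1     # ~2.04: twice the events...
lumi_ratio = lumi_28r2 / lumi_28r1  # ~1.001: ...at essentially the same luminosity

print(f"events: x{evt_ratio:.2f}, luminosity: x{lumi_ratio:.3f}")
```

So 28r2 really does report ~2x the events of 28r1 for ~0.1% more luminosity, which is the inconsistency being flagged.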
I've submitted Sig and Nor for 2016 FullSim MC w/ both polarities.

@afernez I used the job submission script here: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/run2-rdx/jobs/21_10_08-fullsim_sig_nor_reprod.sh

You probably can just change the MC IDs and submit the `DDX` and `D**_s` samples.
I tried running a job overnight last night to do one of the `DDX` modes, just to make sure I had things configured correctly. Weirdly, ganga stopped before running the jobs with an error message:

```
ERROR SandboxError: File '/afs/cern.ch/user/a/afernez/lhcb-ntuples-gen/run2-rdx/weights_soft.xml' does not exist.
```

This file does exist... so I'm not sure I understand what went wrong. I'll look into it now, then try to run the same job again, and then do all of the `DDX` and `D**s`.

Also, yeah, I'll run your script above to cut out unneeded branches before `git annex add`ing the merged ntuples.
Well, after staring at things for a bit (and changing nothing), I ran the job again, and it worked. Not sure what happened, but seems fine now.
Ok, submitted the jobs for all 2016 FullSim `DDX` and `D**s`, MD and MU. I'll commit the job script and update the top post.
`cutflow_data`, data, 2016 MD

```
Verifying output for Job 130...
subjob 0: ntuple has a size of 5 KiB, which is too small!
subjob 15: ntuple has a size of 5 KiB, which is too small!
subjob 16: ntuple has a size of 5 KiB, which is too small!
subjob 2: ntuple has a size of 5 KiB, which is too small!
subjob 20: ntuple has a size of 5 KiB, which is too small!
subjob 21: ntuple has a size of 5 KiB, which is too small!
subjob 23: ntuple has a size of 5 KiB, which is too small!
subjob 26: ntuple has a size of 5 KiB, which is too small!
subjob 27: ntuple has a size of 5 KiB, which is too small!
subjob 28: ntuple has a size of 5 KiB, which is too small!
subjob 3: ntuple has a size of 5 KiB, which is too small!
subjob 30: ntuple has a size of 5 KiB, which is too small!
subjob 38: ntuple has a size of 5 KiB, which is too small!
subjob 39: ntuple has a size of 5 KiB, which is too small!
subjob 4: ntuple has a size of 5 KiB, which is too small!
subjob 41: ntuple has a size of 5 KiB, which is too small!
subjob 42: ntuple has a size of 5 KiB, which is too small!
subjob 45: ntuple has a size of 5 KiB, which is too small!
subjob 46: ntuple has a size of 5 KiB, which is too small!
subjob 47: ntuple has a size of 5 KiB, which is too small!
subjob 51: ntuple has a size of 5 KiB, which is too small!
subjob 58: ntuple has a size of 5 KiB, which is too small!
subjob 62: ntuple has a size of 5 KiB, which is too small!
subjob 63: ntuple has a size of 5 KiB, which is too small!
subjob 69: ntuple has a size of 5 KiB, which is too small!
subjob 7: ntuple has a size of 5 KiB, which is too small!
Job 130 output verification failed with 26 error(s).
```
Out of 4264 subjobs, 26 produced ntuples that don't contain any candidates. In the end I set the minimum file size to 1 KiB.
The merge failed with the following errors:

```
Warning in <TTree::CopyEntries>: The output TTree (DecayTree) must be associated with a writable directory (TupleB0 in eos/stash/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root).
[the same warning repeated 7 more times]
```
I downloaded the full job output, and `hadd`ed them locally successfully:

```
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_eco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_eco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/GetIntegratedLuminosity
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_eco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleBminus
hadd Target path: downloads/Dst_D0--21_09_23--cutflow_data--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_eco16_Stripping28r2_90000000_SEMILEPTONIC.DST.root:/TupleB0
hadd Opening the next 923 files
[the four "Target path" lines above repeat after each batch]
hadd Opening the next 923 files
hadd Opening the next 923 files
hadd Opening the next 574 files
```
Compare the step-2 ntuple w/ the old step-2 ntuple, both WITHOUT the D* veto (because the old one is missing the branches for this veto):
```
> make rdx-ntuple-run2-data-oldcut-no-Dst-veto
> cd gen/rdx-ntuple-run2-data-oldcut-no-Dst-veto/ntuple
> uiddump -n D0--21_10_08--cutflow_data--data--2016--md.root -t tree
Num of events: 1540780, Num of IDs: 1540750, Num of UIDs: 1540720
Num of duplicated IDs: 30, Num of duplicated events: 30, duplicate rate: 0.00%
> uiddump -n D0--21_10_08--std--data--2016--md.root -t tree
Num of events: 1538763, Num of IDs: 1538733, Num of UIDs: 1538703
Num of duplicated IDs: 30, Num of duplicated events: 30, duplicate rate: 0.00%
> uidcommon -n D0--21_10_08--cutflow_data--data--2016--md.root -t tree -N D0--21_10_08--std--data
Total common IDs: 1535074
> uiddump -n Dst--21_10_08--cutflow_data--data--2016--md.root -t tree
Num of events: 280804, Num of IDs: 280801, Num of UIDs: 280798
Num of duplicated IDs: 3, Num of duplicated events: 3, duplicate rate: 0.00%
> uiddump -n Dst--21_10_08--std--data--2016--md.root -t tree
Num of events: 278130, Num of IDs: 278127, Num of UIDs: 278124
Num of duplicated IDs: 3, Num of duplicated events: 3, duplicate rate: 0.00%
> uidcommon -n Dst--21_10_08--cutflow_data--data--2016--md.root -t tree -N Dst--21_10_08--std--dat
Total common IDs: 277528
```
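For reference, the bookkeeping that `uiddump`/`uidcommon` report can be mimicked with plain Python. This is my reading of the numbers above (assumption: "IDs" = distinct event IDs, "UIDs" = IDs occurring exactly once), not the tools' documented definitions:

```python
from collections import Counter

def uid_stats(ids):
    """Return (num events, num distinct IDs, num IDs occurring exactly once)."""
    counts = Counter(ids)
    n_events = len(ids)
    n_ids = len(counts)                                 # distinct IDs
    n_uids = sum(1 for c in counts.values() if c == 1)  # truly unique IDs
    return n_events, n_ids, n_uids

def common_ids(ids_a, ids_b):
    """IDs present in both samples, as uidcommon counts them."""
    return set(ids_a) & set(ids_b)

# Toy example: ID 3 is duplicated once.
print(uid_stats([1, 2, 3, 3, 4]))             # (5, 4, 3)
print(len(common_ids([1, 2, 3], [2, 3, 5])))  # 2
```

With this reading, 1540780 events / 1540750 IDs / 1540720 UIDs above corresponds to 30 IDs that each appear twice.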
In addition, I have a separate `d0_dst_veto_ok` branch, and the veto rate is 21.5%.
@afernez I decided to put the verifications into separate comments and just provide links in the top post, to avoid making the top post super long.
For the `D**` production: it looks to me like the DIRAC backend is misbehaving, and jobs starting from 12675011 MagUp (index 158) are not running or not submitting properly.
Now all the remaining jobs for this production have been submitted.
Also having problems trying to remove files on lxplus:

```
ERROR Tried 3 times to remove file/folder: /afs/cern.ch/user/s/suny/work/gangadir/repository/suny/LocalXML/6.0/jobs/0xxx/160_1634326806.1191213__to_be_deleted_/9_1634326806.1674125__to_be_deleted__1634326806.1711805__to_be_deleted_/.__afs8E37
WARNING Error trying to fully remove Job #'160':: GangaException: Failed to remove file/folder: /afs/cern.ch/user/s/suny/work/gangadir/repository/suny/LocalXML/6.0/jobs/0xxx/160_1634326806.1191213__to_be_deleted_/9_1634326806.1674125__to_be_deleted__1634326806.1711805__to_be_deleted_/.__afs8E37
```
Production of the large tracker-only 2016 MC, both polarities, for `D* -> Tau/Mu Nu` has been submitted.
Ok, I think ganga is cooperating enough for me now to resubmit the `DDX` + `D**s` 2016 MU-MD FullSim MC jobs. I've updated the top post with the new date. I'm hopeful these jobs will all run correctly now... It is unclear to me what the issue has been.
For the 2012 signal production: note that MagUp has roughly 1.5 times as many events as MagDown (from BKK: MagUp: 213956, MagDown: 146177), and the ntuple file sizes scale accordingly.
The TO production has finished, with HUGE file sizes:

```
38G   # D* sig
38G
197G  # D* norm
196G
```
The main difficulty in merging the large TO ntuples is that the EOS mount drops within about a day. I tried both mounting EOS via `sshfs` and merging directly on lxplus; both times the merge errored out with a file-does-not-exist error, and I checked and confirmed that the EOS storage had been unmounted. It seems to stay mounted for only about a day.

I have resorted to copying the full files to my local machine and merging locally. Unfortunately this puts a huge stress on my disk space, and I'm deleting unneeded files to have enough room for the ntuples.

For normalization, after skimming locally, the file size went from roughly 200 GB to roughly 120 GB. Still, I'm really running low on disk space.
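One way to bound the local disk pressure (a sketch of an alternative, not what was actually run here) is to copy and merge in batches, deleting each batch once it's folded into a partial file. The sketch below only builds the command lines; the `xrdcp`/`hadd` invocation details are assumptions:

```python
import os

def batched_merge_plan(remote_files, output="merged.root", batch_size=500):
    """Build shell commands that copy, merge, and delete one batch at a time,
    so at most batch_size inputs (plus small partials) sit on disk at once."""
    cmds, partials = [], []
    for i in range(0, len(remote_files), batch_size):
        batch = remote_files[i:i + batch_size]
        local = [os.path.basename(f) for f in batch]
        partial = f"partial_{i // batch_size}.root"
        partials.append(partial)
        cmds += [f"xrdcp {f} ." for f in batch]  # assumed copy command
        cmds.append("hadd -f " + " ".join([partial] + local))
        cmds.append("rm " + " ".join(local))     # free the space immediately
    cmds.append("hadd -f " + " ".join([output] + partials))
    return cmds
```

The trade-off is one extra `hadd` pass over the partial files at the end, in exchange for peak disk usage of one batch plus the partials instead of the full ~200 GB of inputs.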
@manuelfs and I had agreed that we'll postpone the pruning of the TO ntuples to 0.9.6. Consider this done.
Here we list all GRID ntuple productions for `v0.9.5`.

- `cutflow_data`, data, 2016 MD

  This was originally meant to be a `std` production but was produced as `cutflow_data` due to Yipeng's negligence. The production is sub-optimal but should be sufficient for rebuilding our run 2 data template. The difference between `cutflow_data` and `std` is: `cutflow_data` doesn't have trigger filtering, so there will be more events. But we apply the trigger cuts offline anyway, so this won't be a problem.

  Notes:
  - `Stripping v28r2` (previously we were on `Stripping v28r1`), and now we have ~4k subjobs instead of just ~2k. This suggests that the latest stripping divides the data more finely.
  - `Beijing` and `CNAF.it`: `Beijing` kept failing so it ended up getting blocked. Re-ran the job at `CNAF.it` twice; the first time no ntuple was produced, the second time it produced fine.

- `std`, data, 2011 MD
- `mc`, Sig and Norm, 2016 MD-MU, FullSim. The MC IDs are listed here and here.
- `mc`, `DDX` and `D**s`, 2016 MD-MU, FullSim. MC IDs here and here.
- `mc`, `D*+TauNu`, 2012 MD-MU, FullSim, Sim08a. MC IDs: 11574010. Job indices: 146, 148.
- `mc`, `D**`, 2016 MD-MU, FullSim. MC IDs here and here. Job indices: 149-169.
- `mc`, `D* -> Tau/Mu Nu`, 2016 MD-MU, tracker-only. MC IDs: 11574011, 11574021. Job indices: 170-173.
Postprocess of GRID ntuples

I typically remove some branches during this process, using `scripts/haddcut.py`. For the `<YAML_config>`:
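The actual schema of the `<YAML_config>` isn't shown in this thread, so the following is a purely hypothetical sketch of what a keep/drop spec could look like (all key names and patterns invented for illustration; check `scripts/haddcut.py` itself for the real schema):

```yaml
# HYPOTHETICAL example — key names are illustrative, not haddcut.py's actual schema.
keep_trees:
  - TupleB0/DecayTree
  - TupleBminus/DecayTree
drop_branches:
  - "*_COV_*"   # e.g. large covariance-matrix branches
  - "*TRUE*"    # e.g. truth branches irrelevant for data ntuples
```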