I tried to skim the TO (tracker-only) MagDown normalization ntuples without merging them. Before skimming, the total size is ~197 GB; after, it's ~122 GB.
The commit message above should say SIGNAL instead of normalization, but let's not rewrite history for a typo :-P
@manuelfs @afernez I've updated a preliminary plan for ntuple production. The main idea is to not exceed 1 TB per person.
Note that the production plan is blocked until #99 is resolved, as that would mark the first time the whole chain, from job submission to ntuple merging on the server, has been validated.
I tried sending jobs, but got this error when entering ganga:

```
|22:44:42|lxplus789:~$ ganga
*** Welcome to Ganga ***
Version: 8.5.7
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
INFO reading config file /cvmfs/ganga.cern.ch/Ganga/install/8.5.7/lib/python3.8/site-packages/ganga/GangaLHCb/LHCb.ini
INFO reading config file /cvmfs/lhcb.cern.ch/lib/GangaConfig/config/8-0-0/GangaLHCb.ini
2022/02/03 22:44:58 ERROR: Unauthorized 401 - do you have authentication tokens?
Error "/usr/bin/myschedd.sh |": command terminated with exit code 256
Configuration Error Line 0 while reading config source /usr/bin/myschedd.sh |
```
I emailed lhcb-distributed-analysis@cern.ch, and they suggested I look into my .bashrc. I removed these two suspicious lines:

```bash
export KRB5_CONFIG=/etc/krb5.conf
export KRB5CCNAME=FILE:/var/tmp/krb5_cc_cache
```

and the error went away!
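For anyone hitting the same thing, a quick way to confirm from a fresh shell that no such overrides survive (a minimal sketch):

```python
# Print any leftover Kerberos-related overrides in the environment; with the
# two lines above removed from .bashrc, this should come back empty.
import os

print({k: v for k, v in os.environ.items() if k.startswith('KRB5')})
```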
I then submitted jobs for samples 11874050, 11874070, 12874010, and 12874030 with this script, and they seem to go through.
```
Ganga In [6]: jobs
Ganga Out [6]:
Registry Slice: jobs (11 objects)
--------------
fqid | status | name | subjobs | application | backend | backend.actualCE | comment | subjob status
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | new | | | Executable | Localhost | | | 0 / 0
1 | failed |First gang | | GaudiExec | Dirac | ANY | | 0 / 0
2 | failed |First gang | | GaudiExec | Dirac | ANY | | 0 / 0
3 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
4 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
5 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
6 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
7 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
8 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
9 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
10 | new |Dst_D0--22 | | Executable | Localhost | |Dst_D0--22_02_04--mc--tracker_ | 0/0/0/0
```
No idea what those first 4 jobs are though.
Those are probably the test jobs you submitted a long time ago; you can safely remove them with, say, `jobs[0].remove()` in ganga.
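If several stale jobs pile up, a small loop in the ganga session does the same (a sketch; `jobs(n)` looks a job up by its fqid):

```python
# Remove the leftover test jobs by their fqid (0-2 in the listing above).
for idx in (0, 1, 2):
    jobs(idx).remove()
```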
I also submitted my (small) jobs with this script.
I can see on DIRAC that most of my jobs are done running already, so they probably just have to be downloaded to EOS now. I am slightly worried about the state of ganga (as usual...), because it seems to be freezing again when I try to enter the IPython session. I will let it run for a while, though, and hope that the jobs download correctly and that, once they're done, I'll be able to run ganga normally.
My jobs do not seem to have started (same output as above when I type `jobs` in ganga). Is there anything that I can check?
For jobs not starting, I have no idea. Maybe wait another day, and if they still don't start, run `jobs[index].resubmit()`?
I finished production for all Bs jobs. The DaVinci ntuples are 12 GB total (before any local skimming); after skimming, the total size is about 7.5 GB.
My jobs never started, so I deleted and resubmitted them. Then I realized that I got an error:
```
|03:56:12|lxplus776:~/code/lhcb-ntuples-gen/run2-rdx/jobs$ ./22_02_03-tracker_only_ddx_22to25.sh
*** Welcome to Ganga ***
Version: 8.5.7
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
INFO reading config file /afs/cern.ch/user/m/manuelf/.gangarc
INFO reading config file /cvmfs/ganga.cern.ch/Ganga/install/8.5.7/lib/python3.8/site-packages/ganga/GangaLHCb/LHCb.ini
INFO reading config file /cvmfs/lhcb.cern.ch/lib/GangaConfig/config/8-0-0/GangaLHCb.ini
INFO Using LHCbDirac version prod
=== Welcome to Ganga on CVMFS. In case of problems contact lhcb-distributed-analysis@cern.ch ===
Reconstruction script: ../reco_Dst_D0.py
Condition file: ../conds/cond-mc-2016-md-sim09k-tracker_only.py
LFN: /MC/2016/Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8/Sim09k/Reco16/Filtered/11874050/D0TAUNU.SAFESTRIPTRIG.DST
NTuple name: Dst_D0--22_02_07--mc--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8_Sim09k_Reco16_Filtered_11874050_D0TAUNU.SAFESTRIPTRIG.DST.root
Truncated job name: Dst_D0--22_02_07--mc--11874050--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-T
Preparing job Dst_D0--22_02_07--mc--11874050--tracker_only--MC_2016_Beam6500GeV-2016-MagDown-T
GangaDiracError: All the files are only available on archive SEs. It is likely the data set has been archived. Contact data management to request that it be staged
(consider --debug option for more information)
INFO Stopping the DIRAC process
INFO Stopping Job processing before shutting down Repositories
INFO Shutting Down Ganga Repositories
INFO Registry Shutdown
```
@afernez @yipengsun Have you gotten `GangaDiracError: All the files are only available on archive SEs. It is likely the data set has been archived. Contact data management to request that it be staged` before? Is it a bug on my part, or is it precisely that the 8 samples I sent were not staged?
I had a similar problem before, which turned out to be that the LFNs were wrong (remember that at one point I asked Svende to email some convener to re-stage these files, only to discover that they don't exist). In this case, since the main part of the LFN works (`/MC/2016/Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8/Sim09k/Reco16/Filtered/`), I'm wondering if the MC IDs listed in the table are actually wrong. Could you check that on DIRAC?
And now you can see that this error message:

```
GangaDiracError: All the files are only available on archive SEs. It is likely the data set has been archived. Contact data management to request that it be staged
(consider --debug option for more information)
```

can be misleading: ganga will print it even if the sample doesn't exist anywhere!
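One way to tell the two cases apart is to query the bookkeeping directly before submitting; a minimal sketch, assuming an LHCb ganga session where `BKQuery` is available and that the path printed in the log is a valid bookkeeping path:

```python
# Resolve a bookkeeping path to a dataset; an empty result suggests a wrong
# MC ID rather than genuinely archived data. (Path copied from the log above.)
bk_path = ('/MC/2016/Beam6500GeV-2016-MagDown-TrackerOnly-Nu1.6-25ns-Pythia8'
           '/Sim09k/Reco16/Filtered/11874050/D0TAUNU.SAFESTRIPTRIG.DST')
ds = BKQuery(bk_path).getDataset()
print(len(ds.files))  # 0 -> the sample likely doesn't exist under this path
```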
I also checked the table in the top post and the data sources listed in our wiki. They are consistent. So if the MC IDs are wrong, they are at least wrong in a consistent way.
Thank you, Yipeng!
It was indeed that the MC IDs didn't exist (I had mistakenly used the Run 1 IDs...). From now on I'll know that this message is misleading.
I submitted the jobs with the correct IDs, and now they appear as `submitted` and `running`, as opposed to `new`.
```
fqid | status | name | subjobs | application | backend | backend.actualCE | comment | subjob status
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
19 | submitted |Dst_D0--22 | 42 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 40/0/0/0
20 | submitted |Dst_D0--22 | 61 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 60/0/0/0
21 | running |Dst_D0--22 | 9 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 9/0/0/0
22 | running |Dst_D0--22 | 9 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 9/0/0/0
23 | submitted |Dst_D0--22 | 25 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 23/0/0/0
24 | submitted |Dst_D0--22 | 28 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 27/0/0/0
25 | running |Dst_D0--22 | 7 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 7/0/0/0
26 | running |Dst_D0--22 | 8 | GaudiExec | Dirac | None |Dst_D0--22_02_07--mc--tracker_ | 8/0/0/0
```
I also committed the corrected script.
By the way, here's the number of MC events on disk per year, in markdown:
# | Sample | Name | MC ID | TOTAL [M] | 2015 [M] | 2016 [M] | 2017 [M] | 2018 [M] |
---|---|---|---|---|---|---|---|---|
1 | D0 | B- → D0 μ ν | 12573012 | 161.14 | 7.90 | 45.44 | 47.37 | 60.43 |
2 | D0/D*+ | B0 → D*+ μ ν | 11574021 | 274.86 | 13.47 | 77.51 | 80.81 | 103.07 |
3 | D0 | B- → D*0 μ ν | 12773410 | 452.40 | 22.17 | 127.58 | 133.00 | 169.65 |
4 | D0 | B- → D0 τ ν | 12573001 | 11.08 | 0.54 | 3.13 | 3.26 | 4.16 |
5 | D0/D*+ | B0 → D*+ τ ν | 11574011 | 60.63 | 2.97 | 17.10 | 17.82 | 22.74 |
6 | D0 | B- → D*0 τ ν | 12773400 | 35.07 | 1.72 | 9.89 | 10.31 | 13.15 |
7 | D0/D*+ | B0 → D**+ μ ν | 11874430 | 154.35 | 7.56 | 43.53 | 45.38 | 57.88 |
8 | D0/D*+ | B0 → D**+ τ ν | 11874440 | 1.20 | 0.06 | 0.34 | 0.35 | 0.45 |
9 | D0/D*+ | B- → D**0 μ ν | 12873450 | 126.78 | 6.21 | 35.75 | 37.27 | 47.54 |
10 | D0/D*+ | B- → D**0 τ ν | 12873460 | 1.80 | 0.09 | 0.51 | 0.53 | 0.68 |
11 | D0 | B- → D**(→D0ππ) μ ν | 12675011 | 22.21 | 1.09 | 6.26 | 6.53 | 8.33 |
12 | D0 | B0 → D**(→D0ππ) μ ν | 11674401 | 24.54 | 1.20 | 6.92 | 7.22 | 9.20 |
13 | D0/D*+ | B- → D*(→D+ππ) μ ν | 12675402 | 15.87 | 0.78 | 4.48 | 4.67 | 5.95 |
14 | D0/D*+ | B0 → D*(→D+ππ) μ ν | 11676012 | 16.24 | 0.80 | 4.58 | 4.77 | 6.09 |
15 | D0 | B- → D*(→D0ππ) μ ν | 12875440 | 26.62 | 1.30 | 7.51 | 7.83 | 9.98 |
16 | D0 | Bs → Ds**(→D0K) μ ν | 13874020 | 5.48 | 0.27 | 1.55 | 1.61 | 2.06 |
17 | D*+ | Bs → D**+μ ν | 13674000 | 5.04 | 0.25 | 1.42 | 1.48 | 1.89 |
18 | D0 | B0 → D0(Xc → μ νX')X | 11894600 | 125.90 | 6.17 | 35.50 | 37.01 | 47.21 |
19 | D0 | B0 → D0(Ds → τν)X | 11894200 | 3.46 | 0.17 | 0.97 | 1.02 | 1.30 |
20 | D0 | B+ → D0(Xc → μ νX')X | 12896300 | 75.81 | 3.71 | 21.38 | 22.29 | 28.43 |
21 | D0 | B+ → D0(Ds → τν)X | 12896310 | 8.87 | 0.43 | 2.50 | 2.61 | 3.33 |
22 | D*+ | B0 → D*+ (Xc → μ ν X')X | 11894610 | 44.62 | 2.19 | 12.58 | 13.12 | 16.73 |
23 | D*+ | B0 → D*+(Ds → τ ν) X | 11894210 | 4.12 | 0.20 | 1.16 | 1.21 | 1.54 |
24 | D*+ | B+ → D*+ (Xc → μ ν X')X | 12895400 | 18.03 | 0.88 | 5.09 | 5.30 | 6.76 |
25 | D*+ | B+ → D*+(Ds → τ ν) X | 12895000 | 3.00 | 0.15 | 0.85 | 0.88 | 1.13 |
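As an aside, the per-year columns add up to the TOTAL column (within rounding); a quick check for the first two rows, easily extended to the full table:

```python
# Verify TOTAL [M] = sum of the per-year counts for a couple of table rows.
rows = {
    12573012: (161.14, [7.90, 45.44, 47.37, 60.43]),
    11574021: (274.86, [13.47, 77.51, 80.81, 103.07]),
}
for mc_id, (total, years) in rows.items():
    assert abs(sum(years) - total) <= 0.02, f'{mc_id} does not add up'
```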
I think we should copy your MD table to the top post, as well as add it to our docs, as it is SUPER useful.
Added Manuel's markdown table at: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/mc_prod.md#overview
@manuelfs Actually, the pdf version of the IDs does contain at least 2 errors: indices 20 and 21 should be 12893600 and 12893610.
I can fix the markdown table; can you fix the pdf table?
@yipengsun With your fix of placing my gangadir on AFS but making the `workspace` folder a soft link to a folder on EOS, everything is finally working for me with ganga! I've submitted the jobs for my "remaining D* Tau Nu" task above, and they're running normally (and my monitoring is working correctly).
Submitted the J/psi K data job. Now the only missing part for me is the J/psi K MC.
> @manuelfs Actually, the pdf version of the IDs does contain at least 2 errors: indices 20 and 21 should be 12893600 and 12893610. I can fix the markdown table; can you fix the pdf table?
I fixed the source Excel files and committed a .pdf version of the split-by-year table: https://github.com/umd-lhcb/group-talks/tree/master/rdx/tables
Great. I'll add a link to that table and remove the buggy pdf then, to avoid confusion.
I have 1 French and 1 Russian backend that keep failing: CPPM.fr and RRCKI.ru. I'm going to blacklist them and resubmit.
I believe the correct way to do it is this:

```python
# Ban the failing site on every failed subjob of job 181.
for sj in jobs[181].subjobs.select(status='failed'):
    sj.backend.settings["BannedSites"].append("LCG.RRCKI.ru")
```

Then resubmit.
Some of the inputs are only available at RRCKI.ru. When I ban it, those jobs fail immediately.
I used the following command to get the input LFNs of a subjob:

```python
jobs[189].subjobs[57].inputdata.getLFNs()
```
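To check all failed subjobs at once, the same calls can be combined (a sketch reusing the selection syntax from above; the job index is just the example from this thread):

```python
# Print the input LFNs of every failed subjob of job 189, to spot inputs
# that are replicated only at the banned site.
for sj in jobs[189].subjobs.select(status='failed'):
    print(sj.id, sj.inputdata.getLFNs())
```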
I'm trying to commit the DD ntuples, but keep getting the following errors when trying to sync:
```
|10:32:30|glacier:~/code/lhcb-ntuples-gen$ git annex sync
pull origin
Warning: Permanently added the ECDSA host key for IP address '140.82.113.4' to the list of known hosts.
ok
pull glacier
ok
push origin
Enumerating objects: 206, done.
Counting objects: 100% (206/206), done.
Delta compression using up to 32 threads
send-pack: unexpected disconnect while reading sideband packet
Compressing objects: 100% (158/158), done.
fatal: the remote end hung up unexpectedly
```
I tried `git config http.postBuffer 524288000` as suggested here, and also

```bash
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
```

as suggested here, to no avail. Any ideas?
Hmm, could it be because I'm copying a large number of files to glacier? I'm not sure if that's relevant, though. I'll google it and see what's going on.
Wait, this is an error on GitHub's part. Did you accidentally commit some large files to git directly, instead of annexing them?
Looking at the history, I may indeed have added some of the files with `git add .`. I deleted my last commit (it was not pushed), unstaged the files, and will try to commit them again.
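For future reference, annexed files show up as symlinks in the working tree, so a quick scan for big regular files can catch this (a sketch; the `.root` pattern and 50 MiB threshold are assumptions):

```python
# List large regular files in the working tree (skipping .git); in a
# git-annex repo, annexed files are symlinks, so any big regular file here
# is a candidate for having been 'git add'-ed directly instead of annexed.
from pathlib import Path

for p in Path('.').rglob('*.root'):
    if '.git' in p.parts:
        continue
    if not p.is_symlink() and p.stat().st_size > 50 * 1024**2:
        print(p, p.stat().st_size // 1024**2, 'MiB')
```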
With my updated `batch_skim.sh`, I got:

```
Verifying output for Job 180, which has 111 subjobs...
subjob 85: ntuple missing!
subjob 88: ntuple missing!
subjob 98: ntuple missing!
Job 180 output verification failed with 3 error(s).
```

If I manually remove folder `85`:

```
Verifying output for Job 180, which has 111 subjobs...
Found 110 subjobs, which =/= 111. Terminate now.
```
The actual outputs are colored. Also, the script will terminate if the current job has any error.
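The check is roughly the following logic (a Python sketch of what the shell script does; the directory layout and ntuple glob are assumptions, not the real `batch_skim.sh`):

```python
# Verify a job's output folder: one subdirectory per subjob, each of which
# must contain at least one ntuple.
import sys
from pathlib import Path

def verify_job(job_dir, expected_subjobs, ntuple_glob='*.root'):
    subjob_dirs = sorted(p for p in Path(job_dir).iterdir() if p.is_dir())
    if len(subjob_dirs) != expected_subjobs:
        sys.exit(f'Found {len(subjob_dirs)} subjobs, '
                 f'which =/= {expected_subjobs}. Terminate now.')
    errors = 0
    for d in subjob_dirs:
        if not list(d.glob(ntuple_glob)):
            print(f'subjob {d.name}: ntuple missing!')
            errors += 1
    return errors
```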
I counted 14 MC species (each has 2 folders, 1 per polarity) as of Feb 27, 2022. The numbers check out.
The MC ghost production will be tracked in https://github.com/umd-lhcb/lhcb-ntuples-gen/issues/115. Closed.
Here we list all GRID ntuple productions for v0.9.6.

General production plan for 0.9.6

The main idea is: In this version, we produce ALL required ntuples for 2016, for both polarities.
We'll use the produced ntuples to fully set up a fit for year 2016 and make sure these templates work (in the sense of good convergence); at the same time, do cut optimization to see which cuts can be embedded in DaVinci directly.
Note: As a part of cut optimization, we need to validate that the optimized cuts work with our current fitter (plus some minimal changes, if needed).
After all these steps, we'll produce 2017 and 2018 ntuples in the next version, 0.9.7.

Divide the production among Alex, Manuel, and Yipeng

From Yipeng's experience, 11574021 is about 200 GB for each polarity. A naive estimation of size per million events on disk: 5.16 GB / 1M evt (see the sanity check after this list).
Note: Below, the sizes are for BOTH polarities, BEFORE local branch removal.

Production for Alex:
- 12573012 (idx 1, ~240 GB)
- D* Tau Nu (idx 4, 6, ~70 GB)
- D** (idx 7-15, ~600 GB)

Production for Manuel:
- 12773410 (idx 3, ~660 GB)
- DDX (idx 22-25, ~50 GB)

Production for Yipeng:
- Mu misID
- 11574021 (already done, ~40 GB per polarity, ~24 GB after skimming) (idx 2) (see #87 for more details)
- 11574011 (already done, ~200 GB per polarity, ~122 GB after skimming) (idx 5)
- DDX (idx 18-21, ~160 GB, ~200 GB actually)
- Bs (idx 16-17, ~16 GB, ~12 GB actually)
- 2016 MC ghost
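As a quick arithmetic check of these size estimates (a sketch; the numbers are the 2016 event counts from the table above times the naive 5.16 GB per million events):

```python
# Sanity-check the size estimates: 2016 event counts [M] from the table above
# times the naive 5.16 GB per 1M events (both polarities combined).
GB_PER_M_EVT = 5.16

samples_2016_mevt = {
    '12573012 (idx 1)': 45.44,   # quoted as ~240 GB
    '12773410 (idx 3)': 127.58,  # quoted as ~660 GB
    '11574021 (idx 2)': 77.51,   # basis of the ~200-GB-per-polarity figure
}

for name, mevt in samples_2016_mevt.items():
    print(f'{name}: ~{mevt * GB_PER_M_EVT:.0f} GB')
# -> ~234 GB, ~658 GB, ~400 GB (i.e. ~200 GB per polarity)
```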