Hi @khurtado ,
Thanks for the debugging efforts 😄
In principle, that seems like an undesired behaviour. I said "in principle" because I do not fully understand how Pythia and Delphes make use of the scaling factor (a.k.a. the number of jobs the workflow needs * N) internally.
In theory, each parallel job computes a set of values for a given benchmark (`sm`, `w`, ...), so it makes sense to compute them in parallel. In this scenario, increasing the number of jobs so that there is more than one job per benchmark is a way to parallelize the computation of every single benchmark on its own. I am unsure whether Pythia / Delphes are prepared to handle this, and if so, how it is done.
If you could confirm that this internal parallelization of benchmark-based computations makes sense, and that it is done correctly, then we could start debugging the memory consumption of each job.
As an initial hint, I have always found this particular code snippet a bit odd. Bear in mind it is a rewrite of an older version, which generated a similar list. Maybe @irinaespejo knows where this code snippet comes from.
Hi @khurtado,
Thanks for the update. I think @Sinclert 's intuition is right. We need to investigate how to parallelize the jobs that have the same benchmark within a Pythia+Delphes step (so 6 times) instead of calling Pythia+Delphes 6*n_jobs times. I'm looking into the snippet. Luckily, Delphes is on github delphes/delphes and Pythia alisw/pythia8 so we can ask the developer team.
Hi @khurtado, is there a way we can access the cluster you are using for debugging purposes? thank you!
@irinaespejo Yes, let's discuss via slack
Hi all,
@Sinclert and I discussed a solution offline and I'll write it here for the record:
The problem in this issue is that the madminer-workflow, and in particular the Pythia and Delphes steps, does not scale well.
Right now we control the number of jobs with an external parameter called `num_generation_jobs` (here), i.e. the number of arrows (or jobs) leaving the generate step in the current architecture is `num_generation_jobs`. Each arrow leaving the generate step makes computations according to the distribution of the benchmarks, which is controlled by this snippet. _This means a Pythia and a Delphes instance is called `num_generation_jobs` times, which could be a cause of the bad scalability._
Instead, we propose a subtle change in the architecture of the workflow. The number of arrows (jobs) leaving the generate step will be `num_benchmarks`, not `num_generation_jobs`. Each arrow will then pass `num_jobs` to the Pythia and Delphes step. We hope that Delphes and Pythia know how to internally parallelize a big chunk of jobs. Maybe @khurtado can comment on this Delphes/Pythia internal parallelization.
The `num_benchmarks` value depends on the user-specified benchmarks here and on the morphing `max_overall_power`.
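As a rough sketch of the intended fan-out (illustrative only; the real change lives in the yadage workflow spec, and the benchmark names are simply the six that show up later in the generate-step log):

```bash
#!/usr/bin/env bash
# Illustrative sketch only: one generation "arrow" per benchmark, instead of
# num_generation_jobs clones that each cover the whole benchmark set.
benchmarks=(sm w morphing_basis_vector_2 morphing_basis_vector_3
            morphing_basis_vector_4 morphing_basis_vector_5)

num_jobs=10   # hypothetical per-benchmark workload handed down to Pythia/Delphes

for benchmark in "${benchmarks[@]}"; do
  # Each iteration corresponds to one workflow job in the proposed architecture;
  # the parallelism inside it would be handled by MadGraph itself.
  echo "generate step: benchmark=${benchmark} num_jobs=${num_jobs}"
done
```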
Changes to make:
(please do not hesitate to update the to-do list in the comments below)
Unsolved questions about the proposed solution
This makes sense and sounds good to me!
I don't know much about the internal parallelization details on Delphes/Pythia unfortunately, so I can't comment on that.
Please, let me know once changes are done and I would be happy to test (or if I can help with anything besides testing).
After a bit of research, it seems that MadGraph (the pseudo-engine used to run Pythia and Delphes) has an optional argument called `run_mode` (MadGraph forum comment).
This could be used to specify:

- `run_mode=0`: single core (no parallelization).
- `run_mode=1`: cluster mode (not useful, as we are relying on REANA to deal with back-ends).
- `run_mode=2`: multi-core (process-based parallelization).

Sadly, I could not find an official reference for this argument, so I am not sure whether the accepted values have changed in modern versions of MadGraph (`2.9.X` and `3.X.X`). In any case, this would be the "last piece" to migrate:
- _From: distributing `num_jobs` among M benchmarks_.
- _To: assigning `num_jobs` to each of the M benchmarks_.

Wow, that's interesting. Maybe just setting `run_mode=2` with the current architecture is enough to scale. I'll try it and get back to you tomorrow.
@Sinclert the options for `run_mode` seem to be the same in modern versions of MadGraph:
https://bazaar.launchpad.net/~madteam/mg5amcnlo/3.x/view/head:/Template/LO/README#L80
@khurtado @irinaespejo
I have created a new branch, mg_process_parallelization, to implement the changes we discussed. In principle, the Docker image coming from that branch (`madminer-workflow-ph:0.5.0-test`) should be able to parallelize the MadGraph steps of each benchmark.
In a nutshell:
- The generation of `.tar.gz` folders now iterates over the number of benchmarks.
- A `me5_configuration.txt` file has been added to the set of cards, with the options:
  - `run_mode=2`: to run in multi-core mode.
  - `nb_core=None`: to assign as many processes as cores detected.

Bear in mind that the `num_generation_jobs` workflow-level parameter has not been removed, but it currently has no effect, as we are setting the number of parallel processes per benchmark to the maximum possible (using `nb_core=None`).
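As a quick sanity check (the card path below is illustrative; adjust it to the actual workspace layout), something along these lines confirms that both options ended up in the generated cards, in the same format that shows up in the grep output further down this thread:

```bash
# Illustrative check: both options should appear in the generated MadGraph run cards.
# Expected matches look like:
#   run_mode = 2
#   nb_core = None
grep -R -n -E "run_mode|nb_core" ./mg_processes/signal/Cards/
```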
Let me know if fine-tuning the number of processes per benchmark is something of interest.
Please, run the sub-workflow with the new Docker image (`0.5.0-test`), and compare the results with the old one (`0.4.0`).
@Sinclert wow, nice! I was also working on this without success. Regarding point 3:

> A me5_configuration.txt file has been added to the set of cards, with options: run_mode=2: to run in multi-core mode. nb_core=None: to assign as many processes as cores detected.
When I uncommented `# run_mode=2` and ran the workflow via `make yadage-run`, I saw that cards with `# run_mode=2` still commented out were being created in the generate step and transmitted to the Pythia step.
I think the easiest way to check whether we are really running with `run_mode=2` is for @khurtado to run the mg-process-parallelization branch on the VC3 cluster and let us know if the scalability issue is solved. @khurtado, let us know right away if you run into trouble. Thank you!!
Actually, since I have access to the cluster, I'm going to run the mg-process-parallelization branch workflow now.
Hi everyone,
The results from running scailfin/madminer-workflow-ph (mg-process-parallelization) on VC3:
Sanity checks:

- The workflow finishes successfully (the status is "running", but all files of the steps are there).
- The physics workflow indeed uses the branch code.

Other checks:

- `run_mode = 2` introduced in the Docker image here.
- The command `grep -R "run_mode = 2"` shows indeed that:

```
./pythia_3/mg_processes/signal/Cards/me5_configuration.txt:run_mode = 2
./pythia_3/mg_processes/signal/madminer/cards/me5_configuration_0.txt:run_mode = 2
./delphes_3/extract/madminer/cards/me5_configuration_3.txt:run_mode = 2
```

(and all the other Pythia and Delphes steps) All good!
Now, scalability tests? Answering @Sinclert: yes, we are interested in fine-tuning the number of processes per benchmark.
Memory usage results from running branch mg_process_parallelization:

Example of a Delphes job:

```
ClusterId MemoryUsage Args
217 318 /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2 madminertool/madminer-workflow-ph:0.5.0-test sh -c '/madminer/scripts/4_delphes.sh -p /madminer -m software/MG5_aMC_v2_9_4 -c /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/configure/data/madminer_config.h5 -i /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/ph/input.yml -e /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/pythia_4/events/Events.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/delphes_4'
```

Example of a Pythia job:

```
ClusterId MemoryUsage Args
210 196 /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2 madminertool/madminer-workflow-ph:0.5.0-test sh -c '/madminer/scripts/3_pythia.sh -p /madminer -m software/MG5_aMC_v2_9_4 -z /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/generate/folder_0.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/pythia_3'
```
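For reference, the two rows above look like HTCondor job ClassAd fields (MemoryUsage is in MB); a query of roughly this shape is assumed to produce them, with the exact options depending on the VC3/REANA setup:

```bash
# Hypothetical HTCondor query printing the same three columns as shown above.
condor_q -allusers -autoformat ClusterId MemoryUsage Args
```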
Hi @Sinclert, I've been testing the mg-process-parallelization branch on scailfin/workflow-madminer-ph. When running `make yadage-run` I found the following error in the file `.yadage/workflow_ph/generate/_packtivity/generate.run.log`:

```
2021-09-07 09:10:38,583 | pack.generate.run | INFO | starting file logging for topic: run
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 0 sm'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 1 w'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 2 morphing_basis_vector_2'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 3 morphing_basis_vector_3'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 4 morphing_basis_vector_4'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 5 morphing_basis_vector_5'
2021-09-07 09:11:03,610 | pack.generate.run | INFO | b"sed: can't read s/nb_core = None/nb_core = 1/: No such file or directory"
```
This was solved by doing the following changes:

1. In `scripts/2_generate.sh`, remove the `""` after `sed -i`. The new line should start with `sed -i \`.
2. Rebuild the Docker image as `madminertool/madminer-workflow-ph:0.5.0-test-2`.
3. Run `make yadage-run` again.

The workflow finishes successfully now without any further errors.
Hi @irinaespejo ,
I included the `""` because of macOS compatibility. I thought it was a quick fix to make the script runnable both on macOS and Linux. It seems it did not work.
According to this StackOverflow post, we could achieve this by using the `-e` flag instead. Could you try the following snippet and confirm that it runs on Linux?
```bash
sed -i \
    -e "s/${default_spec}/${custom_spec}/" \
    "${SIGNAL_ABS_PATH}/madminer/cards/me5_configuration_${i}.txt"
```
I just tested the snippet you posted and it runs successfully :heavy_check_mark: (my upload internet connection is pretty slow)
The PR changing the parallelization strategy (https://github.com/scailfin/madminer-workflow-ph/pull/11) has been merged.
We should be in a better spot to test the total time + memory consumption of each benchmark job.
Hi @khurtado and @irinaespejo,
Is there anything else to discuss within this issue? Have you tried the latest version of the workflow?
The last version of the workflow ran successfully after Kenyi did some fixing of the cluster permissions. @khurtado, how is the situation on the cluster for submitting computationally intensive workflows? Can we just try? Thanks!!
@irinaespejo Yes, the cluster should have workers available. I still need to fix the website certs; I will do that tomorrow.
Hi. I am closing this issue for now.
For future reporting of performance issues / configuration tweaks / etc, please, open a separate issue.
Hello,
This was discussed via Slack at some point, so I just wanted to open an issue so it is not forgotten. When scaling up a workflow via `num_generation_jobs`, the number of jobs in the physics stage increases properly, but the memory per job also increases considerably.

E.g.: if `num_generation_jobs` is increased by a factor of 10 (from `6` to `60`), memory usage per Delphes job goes from ~700 MB to 7 GB, e.g.:

```
num_generation_jobs: 60

122 7325 /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90 madminertool/madminer-workflow-ph:0.3.0 sh -c '/madminer/scripts/4_delphes.sh -p /madminer -m software/MG5_aMC_v2_9_3 -c /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/configure/data/madminer_config.h5 -i /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/ph/input.yml -e /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/pythia_33/events/Events.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/delphes_33'
```