scailfin / MadGraph5-simulation-configs

MadGraph5_aMC@NLO source files and configuration files for event simulation
MIT License

Determine if signac can be used for workflow management #13

Open matthewfeickert opened 3 years ago

matthewfeickert commented 3 years ago

At the moment, everything controlling the workflows on Blue Waters is handled by Bash scripts that need user configuration. These scripts submit Torque/PBS jobs with qsub, which in turn submit files to Shifter containers using aprun, and the whole mess needs to be tied together with yet more Bash scripts to guide it. This is pretty ugly, and it would be nicer to use some sort of workflow system if possible.

At SciPy 2019, 2020, and 2021 I've seen @bdice and co. discuss using signac to control workflows on HPCs that automate the processing of thousands of datasets. So this might be an interesting avenue to look at as a way to escape Bash-everything.

Relevant links:

BenGalewsky commented 3 years ago

I wonder if Parsl could be a good fit for this? We already have a fair amount of experience with Parsl executors on Blue Waters, since they are shared with funcX, and it's nice that you can use Python as your workflow definition language.

matthewfeickert commented 3 years ago

Does Parsl also keep track of the data provenance produced during the workflow?

BenGalewsky commented 3 years ago

From the Parsl help slack channel:

most ways that I've seen people use parsl, they aren't telling parsl about the data, in the sense of "here are the files" or "here are my databases" so parsl doesn't usually know anything about that at all but there is a reasonable collection of information in the monitoring db if you turn it on about which tasks depended on which other tasks

it doesn't use the word "provenance" at all, but you can ask questions like "what tasks were run as pre-reqs to the task I am pointing at" and for all of those tasks get info like where/when it ran
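As a rough illustration of the kind of question described above ("what tasks were run as pre-reqs to the task I am pointing at"), here is a minimal stdlib Python sketch. It assumes task dependencies have already been recorded as a plain dictionary mapping each task to the tasks it depended on. The task names (stage1_slice0, stage3_merge, ...) and the host field are hypothetical, and this is not Parsl's monitoring database schema or query interface.

```python
from collections import deque

# Hypothetical dependency record: task id -> ids of the tasks it depended on.
# Names mimic a sliced, multi-stage pipeline but are illustrative only.
TASK_DEPS = {
    "stage1_slice0": [],
    "stage1_slice1": [],
    "stage2_slice0": ["stage1_slice0"],
    "stage2_slice1": ["stage1_slice1"],
    "stage3_merge": ["stage2_slice0", "stage2_slice1"],
}

# Hypothetical "where it ran" information for each task.
TASK_INFO = {
    "stage1_slice0": {"host": "node01"},
    "stage1_slice1": {"host": "node02"},
    "stage2_slice0": {"host": "node01"},
    "stage2_slice1": {"host": "node02"},
    "stage3_merge": {"host": "node03"},
}


def prerequisites(task_id, deps=TASK_DEPS):
    """Return every task that task_id depended on, directly or transitively."""
    seen = set()
    queue = deque(deps[task_id])
    while queue:
        task = queue.popleft()
        if task not in seen:
            seen.add(task)
            # Walk further back through this task's own dependencies.
            queue.extend(deps[task])
    return seen


if __name__ == "__main__":
    for task in sorted(prerequisites("stage3_merge")):
        print(task, TASK_INFO[task]["host"])
```

A breadth-first walk is used so that shared upstream tasks are visited only once, even when several downstream tasks depend on them.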

matthewfeickert commented 3 years ago

I'm going to the signac office hours today to discuss with them whether it would be a reasonable solution here. It seems like a more complete workflow solution than Parsl, in that it can handle the entire workflow and its provenance end to end. The less data management I need to invent for recombining hundreds or thousands of jobs per stage, the better. :+1:
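To make the provenance idea concrete, here is a stdlib-only Python sketch of the data model signac is built around: each job is identified by a dictionary of parameters (a "state point"), the job's directory name is derived from a hash of that state point, and the state point is stored next to the job's outputs. This is not signac's API. The function names, the statepoint.json filename, the SHA-256 hashing, and the parameter keys are all assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def job_id(statepoint: dict) -> str:
    """Deterministic id derived from the state point's canonical JSON form.

    Sorting keys makes the id independent of dict insertion order.
    (Illustrative: signac's own id scheme may differ in detail.)
    """
    encoded = json.dumps(statepoint, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def open_job(workspace: Path, statepoint: dict) -> Path:
    """Create (or reuse) the job directory and record its state point there."""
    job_dir = workspace / job_id(statepoint)
    job_dir.mkdir(parents=True, exist_ok=True)
    # Keeping the parameters next to the outputs is what gives per-job provenance.
    (job_dir / "statepoint.json").write_text(
        json.dumps(statepoint, sort_keys=True, indent=2)
    )
    return job_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # Hypothetical parameters: one job per slice, each with its own seed.
        for slice_index in range(3):
            sp = {"n_events": 1000, "slice": slice_index, "seed": 1000 + slice_index}
            print(open_job(workspace, sp).name[:8], sp)
```

Because the directory name is a pure function of the parameters, rerunning with the same state point lands in the same place, and any output directory can be traced back to the exact parameters that produced it.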

matthewfeickert commented 3 years ago

So after attending the signac office hours today (thanks for a very welcoming time, @bdice and @atravitz!), I walked the team through the basics of the current workflow on Blue Waters:

[Image: BlueWaters_workflow]

The good news is that they think this workflow should be well suited to signac, even with all of the containerization. Another good thing @bdice mentioned is that when moving from Blue Waters to another HPC system, like Delta, the workflow would stay the same and the only thing I would need to change would be the machine-specific template. Having to change only ~1 file to port the whole workflow seems awesome! :sparkles:
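The "only change the machine-specific template" point can be sketched with plain Python string templates. This is a hypothetical illustration, not signac-flow's templating system: the TEMPLATES mapping, the simplified directives, and the render_submission helper are assumptions, and the Slurm-style entry is only a placeholder for whatever scheduler another system uses.

```python
from string import Template

# Hypothetical per-machine submission templates. Directives are simplified
# placeholders, not tested Blue Waters or Delta configurations.
TEMPLATES = {
    # Torque/PBS job launched with aprun, as in the current Blue Waters setup.
    "bluewaters": Template(
        "#!/bin/bash\n"
        "#PBS -N $name\n"
        "#PBS -l nodes=$nodes\n"
        "#PBS -l walltime=$walltime\n"
        "aprun -n $ntasks $command\n"
    ),
    # Assumed Slurm-style scheduler on some other system.
    "slurm_example": Template(
        "#!/bin/bash\n"
        "#SBATCH --job-name=$name\n"
        "#SBATCH --nodes=$nodes\n"
        "#SBATCH --time=$walltime\n"
        "srun -n $ntasks $command\n"
    ),
}


def render_submission(machine, name, nodes, ntasks, walltime, command):
    """Fill in the template for `machine`; the calling workflow code is unchanged."""
    return TEMPLATES[machine].substitute(
        name=name, nodes=nodes, ntasks=ntasks, walltime=walltime, command=command
    )


if __name__ == "__main__":
    for machine in TEMPLATES:
        print(render_submission(machine, "stage1_slice0", 1, 32, "01:00:00", "./run_stage1.sh"))
```

The workflow code calls render_submission the same way on every machine, so porting amounts to editing or adding one template entry.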


@BenGalewsky, @bdice and I are also going to try some pair programming next week, once I've gone through the docs and the intro workflow tutorial and attempted to implement some of the workflow. If you'd like to join as well, you're welcome to!

matthewfeickert commented 3 years ago

Just a note to self: given the refactoring in PRs #15, #19, #20, and #21, the simulation pipeline (stages 1 through 3) is now fully parallelized. Each stage operates on a slice of the total number of simulated events, and the slices are then recombined (cf. PR #21) into an event-level ROOT file at the end of the preprocessing stage (stage 3).
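The slice-then-recombine pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration: event_slices, process_slice, and recombine are made-up names, process_slice stands in for a real stage, and recombination here just concatenates Python lists rather than producing an event-level ROOT file.

```python
from typing import Dict, List, Tuple


def event_slices(total_events: int, n_slices: int) -> List[Tuple[int, int]]:
    """Split [0, total_events) into n_slices contiguous (start, stop) ranges
    whose sizes differ by at most one event."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    base, extra = divmod(total_events, n_slices)
    slices = []
    start = 0
    for i in range(n_slices):
        # The first `extra` slices each take one leftover event.
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


def process_slice(start: int, stop: int) -> List[Dict[str, int]]:
    """Hypothetical stand-in for one stage run on one slice of events."""
    return [{"event_number": i} for i in range(start, stop)]


def recombine(per_slice_outputs: List[List[Dict[str, int]]]) -> List[Dict[str, int]]:
    """Concatenate per-slice outputs in slice order into one event-level list."""
    merged: List[Dict[str, int]] = []
    for output in per_slice_outputs:
        merged.extend(output)
    return merged


if __name__ == "__main__":
    slices = event_slices(10, 3)
    outputs = [process_slice(start, stop) for start, stop in slices]
    events = recombine(outputs)
    print(slices, len(events))
```

For example, event_slices(10, 3) gives [(0, 4), (4, 7), (7, 10)]: the slices cover every event exactly once, so recombining them in order reproduces the full event list.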