hpc-carpentry / old-hpc-workflows

Scaling studies on high-performance clusters using Snakemake workflows
https://www.hpc-carpentry.org/old-hpc-workflows/

Defining the "common workflow" for our lesson #5

Open ocaisa opened 2 years ago

ocaisa commented 2 years ago

The current example is a set of books that are downloaded. How do we define our raw data? We effectively don't have any; what we are doing is taking measurements with amdahl, and those measurements become our raw data.

In 01-introduction.md the lesson starts off by creating a bash script that describes the manual workflow. We will need to replicate this somehow. This will require:

ocaisa commented 2 years ago

The "common workflow" identified in 01-introduction.md is:

  1. Read a data file.
  2. Perform an analysis on this data file.
  3. Write the analysis results to a new file.
  4. Plot a graph of the analysis results.
  5. Save the graph as an image, so we can put it in a paper.
  6. Make a summary table of the analyses, which requires aggregation of all previous results.

Can we cover the same points? I think the last point is the hardest (and unnecessary for us). The order could, however, be changed to:

  1. Create the data files (run a Slurm job using a job template we provide, storing the output under a well-defined filename).
  2. Perform an analysis on the data files (extract our timings and convert them into speedup).
  3. Write the analysis results to a new file.
  4. Plot a graph of the analysis results (this could be done locally or remotely).
  5. Save the graph as an image.
  6. Pull the results (in this case an image) from the cluster and review them.
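Steps 2 and 3 above could be sketched roughly as follows. This is a minimal sketch, not lesson material: the function name `compute_speedup` and the sample timings are hypothetical, and the exact output format of amdahl (and how timings are extracted from it) is assumed to have been handled already.

```python
# Sketch of the "analysis" step: convert measured timings into speedup.
# Assumes timings have already been extracted from the amdahl output,
# one wall-clock time per core count.

def compute_speedup(timings):
    """Given {nproc: runtime_seconds}, return {nproc: speedup}.

    Speedup is defined relative to the single-process runtime:
    S(p) = T(1) / T(p).
    """
    t1 = timings[1]  # the serial baseline must be present
    return {p: t1 / tp for p, tp in sorted(timings.items())}

# Hypothetical measurements (not real amdahl output):
timings = {1: 30.0, 2: 16.5, 4: 9.75, 8: 6.4}
for p, s in compute_speedup(timings).items():
    print(f"{p:>2} cores: speedup {s:.2f}")
```

Writing the resulting dictionary to a file would then be step 3, and plotting it step 4.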

reid-a commented 2 years ago

This could build on what was done in the HPC Intro lesson: call back to that lesson, show a job script, and look at its output. This could live in the first episode. It does, however, make HPC Intro a fairly hard prerequisite for this lesson.
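The "job template we provide" idea could be sketched like this: fill a Slurm template once per scaling point, so each run lands in a well-defined output file. The partition-free `#SBATCH` directives, the `mpirun amdahl` invocation, and the filenames are placeholders, not taken from the lesson.

```python
# Sketch of step 1: render a Slurm job script for each core count.
# Directives and the amdahl invocation below are illustrative placeholders.

JOB_TEMPLATE = """\
#!/usr/bin/env bash
#SBATCH --job-name=amdahl-{ntasks}
#SBATCH --ntasks={ntasks}
#SBATCH --time=00:05:00
#SBATCH --output=amdahl_{ntasks}.out

mpirun amdahl
"""

def render_job_script(ntasks):
    """Return the text of a job script for the given core count."""
    return JOB_TEMPLATE.format(ntasks=ntasks)

# Write one script per scaling point, with well-defined filenames.
for n in (1, 2, 4, 8):
    with open(f"amdahl_{n}.sh", "w") as fh:
        fh.write(render_job_script(n))
```

In the eventual lesson, Snakemake would take over exactly this templating and bookkeeping, which is part of its appeal here.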

bkmgit commented 2 years ago

The current format, with the job submission script at the end, seems OK. However, one may wish to let attendees practice using Slurm, in which case the job submission script could be introduced at the beginning. The lesson seems independent of HPC Intro, but it does allow practice with a scheduler.