valeriupredoi / bgcval2

Package for BGCVal v2.0
3 stars 0 forks source link

Batch timeseries analysis in slurm #120

Closed ledm closed 7 months ago

ledm commented 7 months ago

Closes #118

It's looking like this script is working now.

This PR adds a slurm queue based batch parallel processing of single job timeseries tool.

It's got the following features:

Need to do:

valeriupredoi commented 7 months ago

oh and also - this things really do deserve a test, not thru and thru with SLURM submission, but everything up to that. I can write the test when it's about ready :+1:

ledm commented 7 months ago

let's make sure the piping is done correctly; also, have you tried this in practice? We don't need any special env to pass to sbatch do we? Like any special environment variables

I've been using this for a few days and it works on jasmin. Your amendment to the subprocess also works too.

If batch_timeseries fails you get normal python errors. If it fails inside inside the sbatch script, then you get error messages in the places that we tell it to fail.

valeriupredoi commented 7 months ago

yeh that's how we want it to behave, so stdout can be piped to eg a file. Looks good, bud! Let me write a test for it!

ledm commented 7 months ago

I'm not ready to merge. Still need to add documentation & maybe get @DrYool to try it.

ledm commented 7 months ago

The next question I have:

ledm commented 7 months ago

Basically, the process for adding a new job, input.yml:

  1. analysis_compare -s -y input.yml: This generates the job download commands, which will run overnight. It also creates an html report, but it breaks if there's no data downloaded yet.
  2. Wait overnight for data to download on mass.
  3. batch_timeseries -y input.yml: This submits the job timeseries onto the processing nodes.
  4. analysis_compare -s -y input.yml: This job creates the html report.
  5. ./rsync_to_esmeval.sh: this copies to html to the web visible location on disk.

I suspect that these can be merged into fewer commands!

valeriupredoi commented 7 months ago

@ledm I added some test gubbins, make sure to pull or merge so there are no conflictseses

ledm commented 7 months ago

The logo in the README points towards a file on the main branch, but of course it's not available yet until this PR is merged.

ledm commented 7 months ago

Okay @valeriupredoi, I'm happy with this now.

valeriupredoi commented 7 months ago

all good by me too, bud! Go ahead and merge when you good 🍺