jdblischak / smk-simple-slurm

A simple Snakemake profile for Slurm without --cluster-config
Creative Commons Zero v1.0 Universal

Use of Job Grouping #10

Closed ocaisa closed 1 year ago

ocaisa commented 1 year ago

We're preparing a lesson on HPC Workflows for HPC Carpentry and have chosen Snakemake (from the big range of possible choices) for a number of reasons.

You mention in the README that this profile doesn't work with Job Grouping. That scares me a little as I'm afraid of teaching something that is likely to cause issues for the end users without a lot of tweaking. I was just wondering if (the recently merged) https://github.com/snakemake/snakemake/pull/1218 changes that position?

jdblischak commented 1 year ago

> We're preparing a lesson on HPC Workflows for HPC Carpentry and have chosen Snakemake (from the big range of possible choices) for a number of reasons.

@ocaisa That's awesome. Thanks for working on creating such a useful learning resource!

> You mention in the README that this profile doesn't work with Job Grouping. That scares me a little as I'm afraid of teaching something that is likely to cause issues for the end users without a lot of tweaking.

I mention that in the README to be up-front with users, but I honestly don't see this as a big limitation. Do you personally use job groups regularly? I've found them to be awkward and difficult to use, and I doubt the average Snakemake user will ever need to worry about this niche feature.

> I was just wondering if (the recently merged) https://github.com/snakemake/snakemake/pull/1218 changes that position?

I don't think so. I think the main issue with job groups is that they remove the rule key (the {rule} placeholder), which this profile uses to name the jobs and log files:

https://github.com/jdblischak/smk-simple-slurm/blob/f97e0bf50310335ea5621c204596c50abcf78a49/simple/config.yaml#L8-L9

So if you wanted to use job groups, I think all you would have to do is delete {rule} from these two lines.
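As a rough illustration of why the missing key matters, here is a minimal sketch in plain Python. It uses str.format as a stand-in for Snakemake's own formatting, and the template strings and property dictionaries are hypothetical, modeled loosely on the profile's job-name option rather than copied from it:

```python
# Hypothetical templates, loosely modeled on a --job-name option.
JOB_NAME = "smk-{rule}-{jobid}"
JOB_NAME_NO_RULE = "smk-{jobid}"


def render(template, properties):
    """Fill a template from a dict of job properties using str.format."""
    return template.format(**properties)


# Made-up job properties: a regular job has a "rule" key, a group job does not.
regular_job = {"rule": "align", "jobid": 7}
group_job = {"groupid": "g1", "jobid": 8}

print(render(JOB_NAME, regular_job))  # smk-align-7

try:
    render(JOB_NAME, group_job)
except KeyError as err:
    # The template asks for {rule}, but the group job has no such key.
    print("missing key:", err)

# With {rule} removed from the template, the group job formats fine.
print(render(JOB_NAME_NO_RULE, group_job))  # smk-8
```

The actual error raised by Snakemake may differ in type and wording; the point is only that a template referring to {rule} cannot be filled in when the job has no rule.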

Note that I'm pretty sure the official Slurm profile has this same limitation. By default it avoids an error because it doesn't assume that the rule key exists. But if you were to try to give jobs informative names via SBATCH_DEFAULTS, you'd get the same error from the lack of {rule}:

https://github.com/Snakemake-Profiles/slurm/blob/8ee65d648e502beba406059e2a2d026110d38b9a/%7B%7Bcookiecutter.profile_name%7D%7D/slurm_utils.py#L114

ocaisa commented 1 year ago

Thanks for taking the time to detail the issue, now I understand things better. (BTW I added an incorrect link in my original comment, I've edited it to point to our markdown notes for the Hackathon)

ocaisa commented 1 year ago

I should say the reason we even looked at job groups is that there are HPC sites that document against using Snakemake because it can so easily overwhelm the scheduler with tiny jobs. The unfortunate thing for us is that our maintainers are not that familiar with Snakemake, so understanding why was not trivial for us.

jdblischak commented 1 year ago

> I should say the reason we even looked at job groups is because there are HPC sites that document against using snakemake because it can so easily overwhelm the scheduler with tiny jobs.

I agree that is a legitimate concern. Array jobs are much better suited for efficiently scheduling thousands of small jobs, and I occasionally have to switch to these instead of Snakemake. On the other hand, many uses of Snakemake will not involve such intense requirements. Teaching researchers how to use Snakemake to automate common tasks like FASTQ -> BAM -> Counts for various NGS techniques will both make their lives much easier and also make their results much more reproducible.

Another big concern for system admins is the overall stress on the Slurm scheduler, which I address in the README section "Use speed with caution". As long as users set max-jobs-per-second and max-status-checks-per-second to reasonable levels, they should avoid the ire of the sysadmins.
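To give a feel for what a submission rate cap means in practice, here is a small back-of-the-envelope sketch. It assumes a simplified model in which submissions are spaced evenly at the capped rate; the function name is hypothetical and this is not how Snakemake implements its rate limiting:

```python
def submission_times(n_jobs, max_jobs_per_second):
    """Earliest submit time in seconds for each job under an evenly spaced cap.

    Simplified model: job i is submitted at i / max_jobs_per_second.
    """
    interval = 1.0 / max_jobs_per_second
    return [i * interval for i in range(n_jobs)]


# Example: 1000 ready jobs with a cap of 10 submissions per second.
times = submission_times(1000, 10)
print(f"last job submitted after about {times[-1]:.1f} s")
```

Under this model a lower cap spreads the same number of sbatch calls over a longer window, which is the trade-off between speed and scheduler load.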

jdblischak commented 1 year ago

Also, I created an example of job grouping using this profile. Note that Snakemake 7.11 doesn't address some of the fundamental limitations with this feature (so in general I don't recommend using job grouping, regardless of whether you use this or a different profile).

ocaisa commented 1 year ago

Thanks so much for all the effort you put into answering this issue!