snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Job submission with --prioritize runs all at once, then stalls #134

Open KB1RD opened 2 months ago

KB1RD commented 2 months ago

Software Versions

$ snakemake --version
8.16.0
$ conda list | grep snakemake-executor-plugin-slurm
snakemake-executor-plugin-slurm 0.8.0              pyhdfd78af_0    bioconda
snakemake-executor-plugin-slurm-jobstep 0.2.1              pyhdfd78af_0    bioconda
$ sinfo --version
slurm 23.11.4

Describe the bug

When using the --prioritize option of Snakemake, the following happens:

1.) All prioritized jobs are submitted at once, overwhelming the cluster if there are hundreds or thousands of them.
2.) Snakemake then blocks, waiting for every one of these jobs to complete before executing more jobs.

I'm assuming Snakemake will execute more jobs once these prioritized jobs are finished, but I'm still waiting on my university cluster to churn through the massive batch that I accidentally submitted the other day because of the bug.

Minimal example

Use the -P flag as described here to prioritize a particular target.

cmeesters commented 2 months ago

I am pretty sure this is not a bug in this executor. @johanneskoester? The executor only receives one job at a time (unless a group job is given).

KB1RD commented 2 months ago

I can move the issue to Snakemake main if that's the cause.

Now that the prioritized jobs have completed, Snakemake has begun (according to the log) resubmitting jobs that had already run, two at a time instead of the 64 at a time I requested. However, they never reached the cluster, even though the Snakemake log indicates they were submitted. These jobs were dependencies of the file I prioritized. So all several hundred ran at once on the cluster, then Snakemake tried to run them again two at a time, but failed to do so for some unknown reason. I don't know if that helps narrow down the cause.

cmeesters commented 2 months ago

Can you please post your command line (or profile), including the prioritization and the 64-job semaphore?

KB1RD commented 2 months ago

Running our lab's variant of ACCDB... (The only changes are an added dataset and a custom version of Psi4, the software that Snakemake runs.)

snakemake -j 64 --executor slurm --default-resources slurm_account=<act-name> --rerun-incomplete -P Outputs/<dataset-name>/IndValues.csv
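For reference, the behavior one would expect from combining -j 64 with -P can be sketched as a toy model: prioritized jobs are submitted first, but never more than 64 are in flight at once. This is only an illustration, not Snakemake's scheduler. The function `simulate`, the job names, and the job counts are hypothetical, and every job is assumed to take exactly one scheduling round.

```python
import heapq


def simulate(jobs, max_in_flight):
    """Toy model: submit jobs in priority order, capped per round.

    jobs: list of (name, priority) tuples; higher priority goes first.
    Assumes (for illustration only) that each job finishes in one round.
    Returns a list of rounds, each a list of submitted job names.
    """
    # heapq is a min-heap, so negate the priority; the index breaks ties.
    queue = [(-prio, i, name) for i, (name, prio) in enumerate(jobs)]
    heapq.heapify(queue)

    rounds = []
    while queue:
        batch = []
        # Never submit more than max_in_flight jobs in a single round.
        while queue and len(batch) < max_in_flight:
            _, _, name = heapq.heappop(queue)
            batch.append(name)
        rounds.append(batch)
    return rounds


# Hypothetical workload: 200 dependencies of the prioritized target
# plus 50 unrelated jobs, with a cap of 64 as in "-j 64".
jobs = [(f"dep{i}", 1) for i in range(200)] + [(f"other{i}", 0) for i in range(50)]
rounds = simulate(jobs, 64)
print([len(r) for r in rounds])  # → [64, 64, 64, 58]
```

In this model the prioritized dependencies fill the first rounds and the cap is respected throughout, which is the opposite of the reported behavior, where several hundred jobs reached the cluster at once.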