snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io
MIT License

Rule with Nextflow workflow not being executed concurrently for multiple samples on SLURM #2472

Status: Open · juliawiggeshoff opened this issue 9 months ago

juliawiggeshoff commented 9 months ago

Snakemake version

7.32.4

Describe the bug

When attempting to integrate a foreign workflow management system into Snakemake, Snakemake does not execute the rule for multiple samples concurrently. Instead, it waits for the whole Nextflow workflow to finish before starting the analysis for the second sample, i.e. it submits jobs sequentially instead of the expected concurrent (parallel?) submission of jobs for different samples.

Is this an issue related to the nature of this type of rule? Is it maybe because execution of the rule is handed over to another executor and treated as a local rule, and that is why submission is sequential instead of concurrent? I may be misunderstanding how local rules work, but if not, the issue would be unrelated to SLURM and probably related to the handover directive instead.

However, if this is indeed a SLURM issue, as mentioned several times (in other contexts) in https://github.com/snakemake/snakemake/issues/2339 and https://github.com/snakemake/snakemake/issues/2060, is there any way we can change this behaviour?

Minimal example

A snippet of my Snakefile running the Nextflow workflow

rule sarek:
    input:
        params="results/{project}/{sample}/{sample}_params.json"
    output:
        mutect2="results/{project}/{sample}/variant_calling/mutect2/{sample}_Tumor_vs_{sample}_Normal/{sample}_Tumor_vs_{sample}_Normal.mutect2.filtered.vcf.gz"
    params:
        pipeline="sarek3.2.2",
        revision="3.2.2",
        profile="slurm_singularity",
        outdir="results/{project}/{sample}"
    handover: True
    shell:
        "nextflow23.04.02 run {params.pipeline} -revision {params.revision} -profile {params.profile} "
        "-params-file {input.params} --outdir {params.outdir} -work-dir /scratch/work"

How I'm executing Snakemake:

snakemake --slurm --keep-going --verbose --printshellcmds --reason --nolock --jobs unlimited --cores 24 --local-cores 4 --rerun-incomplete

I've also run Snakemake on the head node, i.e. without including --slurm, and the problem persists. Interestingly, commenting out handover: True and running Snakemake with the above command does submit and execute N jobs for rule sarek at the same time; however, I then get this error:

ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTP (PT_6_Tumor-Lane1)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Batch job submission failed: Access/permission denied

Additional context

The Nextflow workflow is saved in my home directory, i.e. it has already been pulled, and I have modified the slurm profile (here called slurm_singularity) in nextflow.config to work on the HPC I use. This workflow has been tested on that HPC before and works well on its own, so I know the issue is not with the Nextflow workflow and SLURM.

johanneskoester commented 9 months ago

The reason for this behavior is that we cannot really pass parallelization info to Nextflow directly: the idea when using the Nextflow wrapper is that the user specifies a Nextflow config that properly configures Nextflow to parallelize the workflow. I would be happy to offer something automatic here, but that would require some Nextflow engineers to help translate Snakemake's parallelization info into what Nextflow understands (e.g. translating Snakemake's SLURM args into the corresponding Nextflow args). This is future work.
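Concretely, per the comment above, parallelism on the Nextflow side is configured in nextflow.config rather than passed through from Snakemake. A sketch, with illustrative values:

```groovy
// Illustrative only: Nextflow-side parallelization settings in nextflow.config.
executor {
    name            = 'slurm'
    queueSize       = 24          // max jobs Nextflow keeps queued at once (illustrative)
    submitRateLimit = '10/1min'   // throttle sbatch submissions (illustrative)
}
```

With such a config, a single Nextflow run fans out its own SLURM jobs; Snakemake's --jobs and --cores settings do not reach inside it.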

Until then, the only option I see here is to tell sarek to run all the samples in one go (I hope that is possible with sarek, as it seems quite a natural task to me).
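One way to implement that suggestion is to rewrite the rule so a single Nextflow run covers all samples of a project via one aggregated samplesheet. A sketch only: the samplesheet input, the sarek_all rule name, and the marker output are hypothetical, not from the original Snakefile.

```python
# Hypothetical sketch: one sarek run over all samples of a project,
# driven by a single aggregated samplesheet (names are illustrative).
rule sarek_all:
    input:
        samplesheet="results/{project}/samplesheet.csv"  # lists all samples
    output:
        done=touch("results/{project}/sarek.done")       # run-complete marker
    params:
        pipeline="sarek3.2.2",
        revision="3.2.2",
        profile="slurm_singularity",
        outdir="results/{project}"
    handover: True
    shell:
        "nextflow23.04.02 run {params.pipeline} -revision {params.revision} "
        "-profile {params.profile} --input {input.samplesheet} "
        "--outdir {params.outdir} -work-dir /scratch/work"
```

Since only one Nextflow run exists per project, the sequential handover behaviour no longer matters; Nextflow itself parallelizes across the samples listed in the samplesheet.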