maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

DNA-mapping pipeline gets stuck at origFASTQ rule when using a custom cluster config yaml #887

Closed TobiasHohl closed 2 weeks ago

TobiasHohl commented 1 year ago

Using the following command, I tried running the DNA-mapping pipeline with a customized cluster config yaml to circumvent issues with one node on our server:

DNA-mapping --DAG --trim --trimmerOptions '-a nexteraF=CTGTCTCTTATA -A nexteraR=CTGTCTCTTATA' --dedup --mapq 2 -j 20 -i ./fastq/ -o ./mapping/ hg38 --clusterConfigFile custom-cluster-config.yaml

Apparently, the pipeline always got stuck at the execution of the first rules (origFASTQ1 and origFASTQ2), regardless of the name I gave the custom config yaml and regardless of which output folder I assigned. This was not the case with the default cluster configuration. Also, when I start the pipeline with the default configuration and interrupt it after the origFASTQ folder and its contents have been created, I can restart with the custom config and it finishes.

Custom cluster config:

CollectAlignmentSummaryMetrics:
  memory: 2G
CollectInsertSizeMetrics:
  memory: 1G
FASTQdownsample:
  memory: 4G
__default__:
  memory: 1G
bamCoverage:
  memory: 4G
bamCoverage_RPKM:
  memory: 5G
bamCoverage_coverage:
  memory: 5G
bamCoverage_filtered:
  memory: 4G
bamCoverage_raw:
  memory: 5G
bamCoverage_unique_mappings:
  memory: 5G
bamPE_fragment_size:
  memory: 10G
bowtie2:
  memory: 4G
bwa:
  memory: 4G
bwamem2:
  memory: 6G
create_snpgenome:
  memory: 30G
filter_reads:
  memory: 3G
filter_reads_umi:
  memory: 10G
plotCorrelation_pearson:
  memory: 3G
plotCorrelation_pearson_allelic:
  memory: 5G
plotCorrelation_spearman:
  memory: 3G
plotCorrelation_spearman_allelic:
  memory: 2G
plotCoverage:
  memory: 1G
plotEnrichment:
  memory: 1G
plotFingerprint:
  memory: 1G
plotPCA:
  memory: 4G
plotPCA_allelic:
  memory: 4G
plot_heatmap_CSAW_up:
  memory: 10G
snakePipes_cluster_logDir: cluster_logs
snakemake_cluster_cmd: module load slurm; sbatch --ntasks-per-node 1 -p bioinfo --mem-per-cpu
  {cluster.memory} -c {threads} -e cluster_logs/{rule}.%j.err -o cluster_logs/{rule}.%j.out
  -x deep9 -J {rule}.snakemake
snakemake_latency_wait: 300
snp_split:
  memory: 10G
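For context, the placeholders in the snakemake_cluster_cmd entry above are filled in per job: {cluster.memory} comes from the per-rule cluster-config entry (falling back to __default__), while {rule} and {threads} come from the job itself. A simplified Python sketch of that substitution, as an illustration only and not Snakemake's actual implementation:

```python
from types import SimpleNamespace

# Per-rule cluster config, as in the yaml above (abbreviated).
cluster_config = {
    "__default__": {"memory": "1G"},
    "bowtie2": {"memory": "4G"},
}

cmd_template = (
    "sbatch --ntasks-per-node 1 -p bioinfo --mem-per-cpu {cluster.memory} "
    "-c {threads} -e cluster_logs/{rule}.%j.err -o cluster_logs/{rule}.%j.out "
    "-x deep9 -J {rule}.snakemake"
)

def expand(rule: str, threads: int) -> str:
    # Merge the rule-specific entry over __default__, then expose it
    # via attribute access so {cluster.memory} resolves in str.format().
    entry = {**cluster_config["__default__"], **cluster_config.get(rule, {})}
    cluster = SimpleNamespace(**entry)
    return cmd_template.format(cluster=cluster, rule=rule, threads=threads)

print(expand("bowtie2", 8))
print(expand("origFASTQ1", 2))  # no entry of its own -> __default__ memory
```

Note that %j is a Slurm placeholder, not a Python format field, so it passes through str.format() untouched.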

Edit: formatting

katsikora commented 1 year ago

Hi Tobi,

I'll have a look and see if I can reproduce this. Just to understand: you've only modified the cluster command to exclude deep9?

Best wishes,

Katarzyna

TobiasHohl commented 1 year ago

Hi Katarzyna,

yes, the pipeline works when I start it with the following command:

DNA-mapping --DAG --trim --trimmerOptions "-a nexteraF=CTGTCTCTTATA -A nexteraR=CTGTCTCTTATA" --dedup --mapq 2 -j 20 -i ./fastq/ -o ./sp/ hg38

The pipeline only gets stuck when using the cluster config yaml as stated above.

Thanks for looking into this!

Best Tobi

katsikora commented 1 year ago

Hi Tobi,

thanks for submitting the issue. Indeed, I can reproduce it. I'll have a look at what might be causing this.

Best,

Katarzyna

katsikora commented 1 year ago

Hi Tobi,

as an update: I will attempt to circumvent this and other Slurm-related issues in snakePipes by implementing the native Slurm support available in more recent snakemake versions. This is currently pending, as I need IT to solve an issue related to the slurm folder on the package partition.
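For reference, native Slurm support in recent Snakemake versions (>= 8) is configured through an executor profile rather than a cluster command. A minimal sketch of what such a profile could look like, assuming the snakemake-executor-plugin-slurm is installed; the partition name and excluded node below are illustrative placeholders, not snakePipes defaults:

```yaml
# Hypothetical profile, e.g. ~/.config/snakemake/slurm/config.yaml
executor: slurm
jobs: 20
default-resources:
  slurm_partition: bioinfo
  mem_mb: 1000
  # extra sbatch flags, e.g. excluding a faulty node:
  slurm_extra: "'--exclude=deep9'"
```

With such a profile, per-rule memory and threads are taken from the workflow's resource declarations instead of a separate cluster config yaml.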

Best,

Katarzyna

WardDeb commented 2 weeks ago

Obsolete since cluster yamls have been deprecated.