nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License
400 stars, 404 forks

The executor freezes for two days without any error message #713

Open andrewucla opened 2 years ago

andrewucla commented 2 years ago

Description of the bug

I am using Sarek for whole-exome sequencing, running on Slurm. The pipeline is stuck in one process without outputting any error messages. It hangs at the step [ 0%] 0 of 65.

Command used and terminal output

1. script
#!/bin/bash
#SBATCH -o POSC187.out
#SBATCH -e POSC187.err
#SBATCH -J POSC187.job
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH -p scavenger
#SBATCH -t 500:00:00
#SBATCH --mem=128G
#SBATCH --get-user-env
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=ew152@duke.edu

nextflow run nf-core/sarek --max_memory 128.GB -profile singularity --wes --intervals /work/ew152/WES/hg38_bed3_GATK.bed --input POSC187.csv --outdir . --tools manta,snpeff -c /work/ew152/WES/01.RawData/WES_nolim.conf

2. terminal output

executor >  local (235)
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[a9/1f64ae] process > NFCORE_SAREK:SAREK:PREPARE_... [100%] 1 of 1 ✔
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[45/82b624] process > NFCORE_SAREK:SAREK:PREPARE_... [100%] 1 of 1 ✔
[b0/76dc9e] process > NFCORE_SAREK:SAREK:PREPARE_... [100%] 65 of 65 ✔
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:PREPARE_... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[-        ] process > NFCORE_SAREK:SAREK:ALIGNMEN... -
[2b/de0021] process > NFCORE_SAREK:SAREK:RUN_FAST... [100%] 2 of 2 ✔
[bf/335305] process > NFCORE_SAREK:SAREK:FASTP (P... [100%] 2 of 2 ✔
[09/74febf] process > NFCORE_SAREK:SAREK:GATK4_MA... [100%] 24 of 24 ✔
[-        ] process > NFCORE_SAREK:SAREK:GATK4_MA... -
[-        ] process > NFCORE_SAREK:SAREK:GATK4_MA... -
[dd/ae261e] process > NFCORE_SAREK:SAREK:MARKDUPL... [100%] 1 of 1 ✔
[52/687d90] process > NFCORE_SAREK:SAREK:MARKDUPL... [100%] 1 of 1 ✔
[26/96df17] process > NFCORE_SAREK:SAREK:MARKDUPL... [100%] 1 of 1 ✔
[22/1be9cd] process > NFCORE_SAREK:SAREK:MARKDUPL... [100%] 1 of 1 ✔
[a9/805102] process > NFCORE_SAREK:SAREK:PREPARE_... [100%] 65 of 65 ✔
[08/fd290a] process > NFCORE_SAREK:SAREK:PREPARE_... [100%] 1 of 1 ✔
[89/5a2afb] process > NFCORE_SAREK:SAREK:RECALIBR... [100%] 65 of 65 ✔
[90/0de739] process > NFCORE_SAREK:SAREK:RECALIBR... [100%] 1 of 1 ✔
[5e/1f7a97] process > NFCORE_SAREK:SAREK:RECALIBR... [100%] 1 of 1 ✔
[72/471faa] process > NFCORE_SAREK:SAREK:CRAM_QC:... [100%] 1 of 1 ✔
[69/d752cd] process > NFCORE_SAREK:SAREK:CRAM_QC:... [100%] 1 of 1 ✔
[-        ] process > NFCORE_SAREK:SAREK:SAMTOOLS... -
[-        ] process > NFCORE_SAREK:SAREK:GERMLINE... -
[-        ] process > NFCORE_SAREK:SAREK:GERMLINE... -
[-        ] process > NFCORE_SAREK:SAREK:GERMLINE... -
[-        ] process > NFCORE_SAREK:SAREK:GERMLINE... -
[b4/ce31c2] process > NFCORE_SAREK:SAREK:TUMOR_ON... [  0%] 0 of 65
[-        ] process > NFCORE_SAREK:SAREK:TUMOR_ON... -
[-        ] process > NFCORE_SAREK:SAREK:TUMOR_ON... -
[-        ] process > NFCORE_SAREK:SAREK:TUMOR_ON... -
[-        ] process > NFCORE_SAREK:SAREK:PAIR_VAR... -
[-        ] process > NFCORE_SAREK:SAREK:PAIR_VAR... -
[-        ] process > NFCORE_SAREK:SAREK:PAIR_VAR... -
[-        ] process > NFCORE_SAREK:SAREK:PAIR_VAR... -
[-        ] process > NFCORE_SAREK:SAREK:PAIR_VAR... -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC:B... -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC:V... -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC:V... -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC:V... -
[-        ] process > NFCORE_SAREK:SAREK:ANNOTATE... -
[-        ] process > NFCORE_SAREK:SAREK:ANNOTATE... -
[-        ] process > NFCORE_SAREK:SAREK:CUSTOM_D... -
[-        ] process > NFCORE_SAREK:SAREK:MULTIQC     -
Staging foreign file: s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai
Staging foreign file: s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta

Relevant files

No response

System information

No response

FriederikeHanssen commented 2 years ago

Hi @andrewucla! Way back, I experienced this with a much older Sarek version. Ctrl+C helped there. I never figured out what happened under the hood, and I haven't experienced it with recent Nextflow versions and Sarek releases.

@pontus Since I can't reproduce it with the pipeline on our Slurm cluster, I was wondering whether it may be something with the system itself, but I am not knowledgeable about that, so I'm pinging you :)

pontus commented 2 years ago

Unfortunately, there is not enough information to even make an educated guess.

If this persists and there is still interest in solving it, a pstree would be a useful start (as singularity is being used, starting from the nextflow process should be enough). Similarly, the contents of /proc/N/stack might give a clue, for any N that is a lightweight process (thread) in a long-running process.

Alternatively, a look at the work directory might give a much better idea about what's going on.
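
The checks suggested above could be sketched roughly as follows. This is only an illustration, assuming a Linux node with pstree (from psmisc) available, a run launched from the current directory so that task directories live under ./work, and permission to read /proc/N/stack (which usually requires root). The function names are hypothetical helpers, not part of Nextflow or Sarek; the hash prefix b4/ce31c2 is the stuck task from the terminal output.

```bash
#!/bin/bash
# Hypothetical helpers for inspecting a hung Nextflow run (not part of Nextflow or Sarek).

# Print the process tree below a PID, with PIDs and command-line arguments.
show_tree() {
    pstree -ap "$1"
}

# Print the kernel stack of every thread (lightweight process) of a PID.
# /proc/N/stack is usually readable by root only, so failures are reported, not fatal.
show_stacks() {
    local pid="$1" task
    for task in /proc/"$pid"/task/*; do
        echo "== thread ${task##*/} =="
        cat "$task/stack" 2>/dev/null || echo "(not readable; try as root)"
    done
}

# Resolve a task hash prefix such as b4/ce31c2 to its full work directory.
task_dir() {
    local work="$1" hash="$2" dir
    for dir in "$work"/"$hash"*/; do
        if [ -d "$dir" ]; then
            echo "${dir%/}"
            return 0
        fi
    done
    return 1
}

# Example invocations (commented out so that sourcing this file has no side effects):
#   show_tree "$(pgrep -o -u "$USER" -f nextflow)"
#   show_stacks "$(pgrep -o -u "$USER" -f nextflow)"
#   ls -la "$(task_dir work b4/ce31c2)"
```

In the task directory, the hidden files that Nextflow writes (.command.sh, .command.run, .command.log, .command.err) are usually the first place to look for what the task was doing when it stopped making progress.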

SAADAT-Abu commented 1 year ago

WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS -- Did you mean: NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS?

Can someone help? Sarek version 3.1.2.

FriederikeHanssen commented 1 year ago

@SAADAT-Abu The warning has no impact on the execution. Nextflow simply warns when a process config is loaded but the matching process is not. No worries about this one.

SAADAT-Abu commented 1 year ago

@FriederikeHanssen Thanks a lot for the quick response. But my run seems to have been stuck after the FastQC step for 5 hours. I killed it and resumed, but it's stuck again at the same point. Is it possible that it is downloading the resources like genome and indexes?

FriederikeHanssen commented 1 year ago

Is it possible that it is downloading the resources like genome and indexes?

Yes, that sounds likely. I would recommend pre-downloading the reference data from iGenomes.
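
A rough sketch of such a pre-download, assuming the aws CLI is installed (no AWS account should be needed with --no-sign-request) and that the local copy mirrors the bucket layout seen in the "Staging foreign file" lines above. The helper names and the /path/to/references directory are placeholders, and --igenomes_base is assumed here to be the pipeline parameter that redirects reference lookups to the local copy; check it against your Sarek version.

```bash
#!/bin/bash
# Hypothetical helpers for mirroring one iGenomes build locally (not part of Sarek).

# Bucket prefix taken from the "Staging foreign file" lines in the terminal output.
IGENOMES_BUCKET="s3://ngi-igenomes/igenomes"

# Build the S3 URI for a genome build, e.g. Homo_sapiens GATK GRCh38.
igenomes_uri() {
    echo "${IGENOMES_BUCKET}/$1/$2/$3/"
}

# Mirror one genome build into a local base directory, keeping the bucket layout.
fetch_igenomes() {
    local base="$1" species="$2" source="$3" build="$4"
    mkdir -p "$base/$species/$source/$build"
    aws s3 sync --no-sign-request --region eu-west-1 \
        "$(igenomes_uri "$species" "$source" "$build")" \
        "$base/$species/$source/$build/"
}

# Example (commented out; /path/to/references is a placeholder):
#   fetch_igenomes /path/to/references Homo_sapiens GATK GRCh38
#   nextflow run nf-core/sarek -profile singularity --input POSC187.csv --outdir . \
#       --igenomes_base /path/to/references
```

With the references on local or shared storage, the run no longer has to stage the large FASTA and index files from S3 at start-up, which is the step where these runs appear to stall.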

sounkou-bioinfo commented 1 year ago

Hi all. I am trying out the workflow on some WES data in a local HPC setting, and I have a similar freezing issue that seems to be related to the AWS resource files (hg19 resources in my case). Is there a complete index of the S3 paths of the required files (not just the folders)? I am unable to use the AWS CLI, so I downloaded the iGenomes references that are single files with wget, but I cannot download the contents of the folders this way. Thank you in advance.

FriederikeHanssen commented 1 year ago

I am afraid you need the aws-cli to download the iGenomes reference data. I don't think there is another way to do it, but you could join Slack (nf-co.re/join) and ask in the #igenomes channel.

You can build the command here: https://ewels.github.io/AWS-iGenomes/ or infer the paths for each file and folder from here: https://github.com/nf-core/sarek/blob/master/conf/igenomes.config
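
For references that are single files, a path inferred from igenomes.config can be turned into a URL for wget along these lines. This is a sketch only: it assumes the bucket answers anonymous HTTPS requests on the standard virtual-hosted-style S3 endpoint, the function name is hypothetical, and it does not help with entries that are whole folders.

```bash
#!/bin/bash
# Hypothetical helper: convert an s3://bucket/key URI into a virtual-hosted-style HTTPS URL.
s3_to_https() {
    local uri="${1#s3://}"      # drop the scheme
    local bucket="${uri%%/*}"   # text before the first slash
    local key="${uri#*/}"       # text after the first slash
    echo "https://${bucket}.s3.amazonaws.com/${key}"
}

# Example (commented out), using a path from the terminal output earlier in this thread:
#   wget "$(s3_to_https s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai)"
```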