Open TonyKess opened 1 year ago
Segmentation fault sounds bad.
Any chance it works again if you resume?
Same message unfortunately. This is using the joint-germline workflow with a species without any prior genomic info (indels, dbSNP etc). The joint-germline.vcf itself actually seems fine, but seems like vcftools is looking for something in it that isn't there?
quick update: disabling vcftools seems to lead to pipeline completion
Good idea, that would make sense
I'm going to try to use some high depth samples to build a SNP/indel reference, and will see if including that info in subsequent runs changes the performance here.
I have encountered the same error using different data. There is a workaround originally proposed by @FriederikeHanssen which is to add an ignore to the config file. This works as the tool appears to produce a good output before the seg fault.
Relevant Config Line
// within process {}
withName:VCFTOOLS_TSTV_COUNT {
errorStrategy = 'ignore'
}
Command Run:
#!/bin/bash
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=12:00:00
module load anaconda3/personal
cd ${PBS_O_WORKDIR}
nextflow run nf-core/sarek \
-c Good_Imperial.config \
--input Ecoli_Samples.Sarek.csv \
--fasta WT_S295.fna \
--save_reference \
--outdir /rds/general/user/rjackso1/home/Projects/2023_Julian_Ecoli/Sarek_Results \
--igenomes_ignore \
--tools haplotypecaller \
--skip_tools baserecalibrator \
--joint_germline \
-resume
Error Message in Main Outfile
[5b/447b3f] NOTE: Process `NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT (joint_variant_calling)` terminated with an error exit status (139) -- Error is ignored
Relevant .command.sh
#!/bin/bash -euo pipefail
vcftools \
--gzvcf joint_germline.vcf.gz \
--out joint_germline \
--TsTv-by-count \
\
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT":
vcftools: $(echo $(vcftools --version 2>&1) | sed 's/^.*VCFtools (//;s/).*//')
END_VERSIONS
Relevant .command.err
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf joint_germline.vcf.gz
--out joint_germline
--TsTv-by-count
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 11 out of 11 Individuals
Outputting Ts/Tv by Alternative Allele Count
/rds/general/user/rjackso1/home/Projects/2023_Julian_Ecoli/work/5b/447b3f6899f78519c3a6080f6469b9/.command.sh: line 7: 38 Segmentation fault vcftools --gzvcf joint_germline.vcf.gz --out joint_germline --TsTv-by-count
Custom Config File Used:
//Profile config names for nf-core/configs
// you must create /tmp and /var/tmp in /rds/general/user/$USER/ephemeral/
params {
// Config Params
config_profile_description = 'Imperial College London - HPC Profile'
// Resources
max_memory = 920.GB
max_cpus = 256
max_time = 1000.h
}
process {
// base params
executor = 'pbspro'
maxRetries = 3
// resource specific params - modified for imperial queues
withLabel:process_low {
cpus = { 1 }
memory = { 12.GB * task.attempt }
time = { 4.h * task.attempt }
errorStrategy = { task.attempt <= 4 ? 'retry' : 'finish' }
}
withLabel:process_medium {
cpus = { 4 * task.attempt }
memory = { 30.GB * task.attempt }
time = { 16.h * task.attempt }
errorStrategy = { task.attempt <= 4 ? 'retry' : 'finish' }
}
withLabel:process_high {
cpus = { 8 * task.attempt }
memory = { 92.GB * task.attempt }
time = { 16.h * task.attempt }
errorStrategy = { task.attempt <= 4 ? 'retry' : 'finish' }
}
withName:FASTQC { // seems to fail when using lower numbers of cores
cpus = { 8 * task.attempt }
memory = { 30.GB * task.attempt }
time = { 4.h * task.attempt }
errorStrategy = { task.attempt <= 4 ? 'retry' : 'finish' }
}
withName:VCFTOOLS_TSTV_COUNT {
cpus = { 8 * task.attempt }
memory = { 30.GB * task.attempt }
time = { 4.h * task.attempt }
errorStrategy = 'ignore'
}
}
executor {
$pbspro {
queueSize = 49
submitRateLimit = '10 sec'
}
$local {
cpus = 2
queueSize = 1
memory = '6 GB'
}
}
singularity {
enabled = true
autoMounts = true
runOptions = "-B /rds:/rds,/etc:/etc,/rds/general/user/$USER/ephemeral/tmp:/tmp,/var/tmp:/var/tmp"
}
I'm running into a similar issue with VCFTOOLS_TSTV_COUNT on joint_germline.vcf.gz. Can confirm that adding errorStrategy = 'ignore' in my config got the pipeline to finish successfully.
I am new to Nextflow and have the same problem. I followed the suggestion above and added a config file as shown below:
touch nexflow.config
nano nextflow.config
process {
withName:VCFTOOLS_TSTV_COUNT {
errorStrategy = 'ignore'
}
}
nextflow -bg run nf-core/sarek -r 3.4.0 -params-file params.json -profile docker -c nextflow.config
I placed the config file locally and ran the command, but the pipeline does not seem to read it:
Core Nextflow options
revision : 3.4.0
runName : tender_mccarthy
containerEngine : docker
launchDir : /data/run3
workDir : /data/run3/work
projectDir : /root/.nextflow/assets/nf-core/sarek
userName : root
profile : docker
configFiles :
The configFiles entry is empty.
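The thread does not establish why configFiles is empty here. One cheap thing to rule out before relaunching is that the path passed to `-c` does not point at the file that was edited. A minimal sketch: `check_config` is a hypothetical helper, and it only verifies that the file exists, is non-empty, and contains the `withName:VCFTOOLS_TSTV_COUNT` selector.

```shell
#!/usr/bin/env bash
# check_config is a hypothetical pre-flight helper, not a Nextflow command.
# It checks the file that will be passed to `-c`, nothing more.
check_config() {
    local cfg=$1
    if [ ! -s "$cfg" ]; then
        echo "missing or empty: $cfg"
        return 1
    fi
    # Allow optional spaces between "withName:" and the process name.
    if ! grep -q 'withName: *VCFTOOLS_TSTV_COUNT' "$cfg"; then
        echo "selector not found in: $cfg"
        return 1
    fi
    echo "ok: $cfg"
}

# Example call with the file name used in the command above.
check_config nextflow.config || true
```

If the check passes and the run summary still lists no config files, the cause lies elsewhere and this sketch will not help.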
Just to report that I got the same error doing joint germline calling on human samples (3 individuals from public "genome in a bottle" data). The hack of ignoring the error is a workaround, but the error is still there.
I tried to troubleshoot a bit:
[...] (.command.sh). The segmentation fault error occurred after just 20 secs.
[...] (htop) did not suggest it was a memory issue.
[...] --chr "chr12" for example - segmentation fault happened again.
[...] (the ALT_ALLELE_COUNT column goes up to 5, which might make sense as I have 3 diploid individuals).
Possibly this is related to this open issue on the vcftools repo.
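One way to sanity-check the VCF independently of vcftools is to count transitions and transversions directly. A minimal sketch under stated assumptions: `tstv_counts` is a hypothetical helper, it counts only biallelic single-base substitutions, and it reports overall totals rather than the per-allele-count breakdown that `--TsTv-by-count` produces, so the numbers are a rough cross-check and not a replacement.

```shell
#!/usr/bin/env bash
# tstv_counts is a hypothetical helper: reads an uncompressed VCF on stdin
# and prints "<transitions> <transversions>" for biallelic SNP records.
tstv_counts() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ && $4 != $5 {
            pair = $4 $5
            # Transitions are purine<->purine or pyrimidine<->pyrimidine.
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ts++
            else
                tv++
        }
        END { printf "%d %d\n", ts, tv }
    '
}
```

With the file name from this thread it could be run as `gzip -dc joint_germline.vcf.gz | tstv_counts`. Multi-allelic sites and indels are skipped, which is a deliberate simplification.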
Description of the bug
Joint germline genotyping completes the actual genotyping, but fails at the vcftools Ts/Tv count step
Command used and terminal output
Command:
nextflow run nf-core/sarek --skip_tools baserecalibrator --genome null --igenomes_ignore --joint_germline --intervals Ssal_v3.1_genomic.chroms.bed --fasta Ssal_v3.1_genomic.chroms.fna --input salmo5samp.csv -profile docker --tools haplotypecaller,manta -resume
Error:
Error executing process > 'NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT (joint_variant_calling)'
Caused by:
Process `NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT (joint_variant_calling)` terminated with an error exit status (139)
Command executed:
vcftools \
--gzvcf joint_germline.vcf.gz \
--out joint_germline \
--TsTv-by-count \
\
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT":
vcftools: $(echo $(vcftools --version 2>&1) | sed 's/^.*VCFtools (//;s/).*//')
END_VERSIONS
Command exit status: 139
Command output: (empty)
Command error:
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf joint_germline.vcf.gz
--out joint_germline
--TsTv-by-count
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 5 out of 5 Individuals
Outputting Ts/Tv by Alternative Allele Count
After filtering, kept 9896941 out of a possible 9896941 Sites
Run Time = 49.00 seconds
.command.sh: line 7: 27 Segmentation fault (core dumped) vcftools --gzvcf joint_germline.vcf.gz --out joint_germline --TsTv-by-count
Work dir: /genomics/Tony/Atlantic_Salmon/work/8d/0c7967703f8969b8e8948f2bf3fd38
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
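For readers unfamiliar with the exit status (139) in the error output: shells report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11, which is SIGSEGV on Linux. A minimal sketch of that arithmetic, where `describe_exit` is a hypothetical helper name and not something Nextflow or sarek provides:

```shell
#!/usr/bin/env bash
# describe_exit is a hypothetical helper that decodes a shell exit status.
describe_exit() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (status - 128)".
        local sig=$((status - 128))
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

describe_exit 139
```

On Linux this should print `signal 11 (SEGV)`, matching the "Segmentation fault" line in the vcftools stderr.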