nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Pipeline completed with errors : ControlFREEC #970

Status: Closed (closed by Nour-EddineS 1 year ago)

Nour-EddineS commented 1 year ago

Description of the bug

Dear SAREK Team, I want to run somatic variant calling, but the pipeline completed with errors. Thanks in advance for your help.

Best regards,

Command used and terminal output

nextflow run nf-core/sarek  --step variant_calling --input samplesheet.csv --outdir results/ --genome GATK.GRCh37 -profile docker --wes --intervals /home/user1/Target-panel/dataSet/data_run/tar_bla_cancer2.bed --tools cnvkit,controlfreec --only_paired_variant_calling true --max_cpus 7

-[nf-core/sarek] Pipeline completed with errors-
Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:FREEC_SOMATIC (tumor_3468_S15_1_vs_normal_3468_S15_1)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:FREEC_SOMATIC (tumor_3468_S15_1_vs_normal_3468_S15_1)` terminated with an error exit status (1)

Command executed:

  touch config.txt

  echo "[general]" >> config.txt
  echo BedGraphOutput = TRUE >> config.txt
  echo breakPointThreshold = 1.2 >> config.txt
  echo breakPointType = 4 >> config.txt
  echo chrFiles =${PWD}/Chromosomes >> config.txt
  echo chrLenFile = ${PWD}/human_g1k_v37_decoy.fasta.fai >> config.txt
  echo coefficientOfVariation = 0.05 >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo forceGCcontentNormalization = 1 >> config.txt
  echo  >> config.txt
  echo gemMappabilityFile = ${PWD}/out100m2_hg19.gem >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo minimalSubclonePresence = 30 >> config.txt
  echo "maxThreads = 2" >> config.txt
  echo noisyData = TRUE >> config.txt
  echo  >> config.txt
  echo ploidy = 2 >> config.txt
  echo printNA = FALSE >> config.txt
  echo readCountThreshold = 50 >> config.txt
  echo sex = XY >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt

  echo "[control]" >> config.txt
  echo mateFile = ${PWD}/tumor_3468_S15_1_vs_normal_3468_S15_1.normal.mpileup.gz >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo inputFormat = pileup >> config.txt
  echo mateOrientation = FR >> config.txt

  echo "[sample]" >> config.txt
  echo mateFile = ${PWD}/tumor_3468_S15_1_vs_normal_3468_S15_1.tumor.mpileup.gz >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo inputFormat = pileup >> config.txt
  echo mateOrientation = FR >> config.txt

  echo "[BAF]" >> config.txt
  echo  >> config.txt
  echo fastaFile = ${PWD}/human_g1k_v37_decoy.fasta >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo  >> config.txt
  echo SNPfile = $PWD/dbsnp_138.b37.vcf.gz >> config.txt

  echo "[target]" >> config.txt
  echo captureRegions = tar_bla_cancer2.bed >> config.txt

  freec -conf config.txt

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:FREEC_SOMATIC":
      controlfreec: $(echo $(freec -version 2>&1) | sed 's/^.*Control-FREEC  //; s/:.*$//' | sed -e "s/Control-FREEC v//g" )
  END_VERSIONS

Command exit status:
  1

Command output:
  Control-FREEC v11.6 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
  Multi-threading mode using 2 threads
  ..consider the sample being male
  ..Breakpoint threshold for segmentation of copy number profiles is 1.2
  ..telocenromeric set to 50000
  ..FREEC is not going to adjust profiles for a possible contamination by normal cells
  ..Coefficient Of Variation set equal to 0.05
  ..it will be used to evaluate window size
  ..Output directory:   .
  ..Directory with files containing chromosome sequences:   Chromosomes
  ..Sample file:    tumor_3468_S15_1_vs_normal_3468_S15_1.tumor.mpileup.gz
  ..Sample input format:    pileup
  ..Control file:   tumor_3468_S15_1_vs_normal_3468_S15_1.normal.mpileup.gz
  ..Input format for the control file:  pileup
  ..forceGCcontentNormalization was set to 1: will use GC-content to normalize the read count data
  ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
  ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
  ..Polynomial degree for "ReadCount ~ GC-content" normalization is 3 or 4: will try both
  ..Minimal CNA length (in windows) is 3
  ..File with chromosome lengths:   human_g1k_v37_decoy.fasta.fai
  ..File human_g1k_v37_decoy.fasta.fai was read

Command error:
  For example, you can remove chromosome GL000217.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000216.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000216.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000215.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000215.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000205.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000205.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000219.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000219.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000224.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000224.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000223.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000223.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000195.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000195.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000212.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000212.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000222.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000222.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000200.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000200.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000193.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000193.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000194.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000194.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000225.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000225.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome GL000192.1 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome GL000192.1 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome NC_007605 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome NC_007605 from your human_g1k_v37_decoy.fasta.fai
  Error: chromosome hs37d5 present in your human_g1k_v37_decoy.fasta.fai file was not detected in your file with capture regions tar_bla_cancer2.bed
  Please solve this issue and rerun Control-FREEC
  For example, you can remove chromosome hs37d5 from your human_g1k_v37_decoy.fasta.fai
  Will exit

Work dir:
  /home/user1/nf-core/work/95/aeb42ee48ac9481504cef7bbd311f2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Relevant files

nextflow.log samplesheet.csv tar_bla_cancer2.bed.tar.gz

System information

CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz RAM: 32 GB Distribution: Ubuntu 22.04.1 LTS

FriederikeHanssen commented 1 year ago

Hey! This is unfortunately a known issue; see https://github.com/BoevaLab/FREEC/issues/106. The current workaround in the pipeline is to provide a len file using this parameter: https://nf-co.re/sarek/3.1.2/parameters#cf_chrom_len

Nour-EddineS commented 1 year ago

Dear @FriederikeHanssen, how can I create a len file, please? Best regards,

FriederikeHanssen commented 1 year ago

If you click "Help" a description opens up:

Control-FREEC requires a file containing all chromosome lengths. By default, the fasta.fai is used. If the fasta.fai file contains chromosomes that are not present in the intervals, Control-FREEC fails (see: https://github.com/BoevaLab/FREEC/issues/106).
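
To see which contigs would trigger this failure for a given run, one option is to compare the first column of the bed file with the first column of the fasta.fai. The snippet below is only a sketch and is not part of sarek or Control-FREEC: the function name `missing_contigs` is hypothetical, and it assumes a non-empty, whitespace-delimited bed file with the contig name in column 1.

```shell
# List contigs that are in the .fai index but never appear in the BED file.
# Hypothetical helper, not shipped with sarek or Control-FREEC.
# Assumes the BED file is non-empty and has the contig name in column 1.
# Usage: missing_contigs <targets.bed> <genome.fasta.fai>
missing_contigs() {
    awk '
        # First file (BED): remember every contig name, skipping header lines.
        NR == FNR {
            if ($0 !~ /^(#|track|browser)/ && NF > 0) seen[$1] = 1
            next
        }
        # Second file (.fai): print contigs that the BED never mentioned.
        !($1 in seen) { print $1 }
    ' "$1" "$2"
}
```

Run as `missing_contigs tar_bla_cancer2.bed human_g1k_v37_decoy.fasta.fai`, this should print one contig name per line; for the run reported above, that list would be expected to include the GL000xxx.1, NC_007605 and hs37d5 entries named in the error output.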

In this case, a custom chromosome length file can be specified. It must be in the same format as the fai, but contain only the relevant chromosomes.

Example shown here (it is from a different genome build, so don't use this one): http://bioinfo-out.curie.fr/projects/freec/src/hg18.len

You can check your bed file and keep only those chromosomes from the fai of your genome build that are also present in the bed file. Then name the file something like my_len.len and pass it with the parameter above. To get the fai, either check your local iGenomes installation if you have one, or download it from https://ewels.github.io/AWS-iGenomes/
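
The filtering step described above can be sketched with awk. This is a minimal example under stated assumptions, not an official sarek utility: the function name `make_len_file` is hypothetical, the bed file is assumed to be non-empty with the contig name in column 1, and contig names in the bed file and the fai must use the same naming scheme.

```shell
# Write a .len file containing only the .fai lines whose contig (column 1)
# also occurs in the BED file. Lines are copied unchanged, so the output
# keeps the .fai format.
# Hypothetical helper, not shipped with sarek or Control-FREEC.
# Usage: make_len_file <targets.bed> <genome.fasta.fai> <out.len>
make_len_file() {
    awk '
        # First file (BED): collect contig names, skipping header lines.
        NR == FNR {
            if ($0 !~ /^(#|track|browser)/ && NF > 0) keep[$1] = 1
            next
        }
        # Second file (.fai): keep the whole line if its contig was in the BED.
        $1 in keep
    ' "$1" "$2" > "$3"
}
```

For the run in this issue, that would look like `make_len_file tar_bla_cancer2.bed human_g1k_v37_decoy.fasta.fai my_len.len`, followed by rerunning the original `nextflow run` command with `--cf_chrom_len my_len.len` added.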

FriederikeHanssen commented 1 year ago

Has this worked? If yes, could you close this issue?

Nour-EddineS commented 1 year ago

@FriederikeHanssen Yes it works Thanks :)

chumawinnie commented 2 months ago

Title: FREEC_SOMATIC Process Error: Missing Output File *_sample.cpn in NFCORE Sarek 3.4.2

Body:

Hi NFCORE Sarek community,

I am currently running the NFCORE Sarek pipeline (version 3.4.2) for somatic variant calling on whole-exome sequencing data. During the execution of the pipeline, I encountered an issue with the FREEC_SOMATIC process, where the expected output file(s) *_sample.cpn are missing, causing the process to fail. Below are the details of the command I used and the error message I received:

Command Executed:

nextflow run nf-core/sarek -r 3.4.2 -profile docker \
  --input samplesheet.csv \
  --outdir /home/obiorach/test-work-sarek/WES-test-result \
  --genome hg19 \
  --dbsnp /home/obiorach/whole-Exon-single-seq/ref-genome/known_sites.vcf/dbsnp_138.hg19.vcf.gz \
  --known_indels /home/obiorach/whole-Exon-single-seq/ref-genome/known_indels.vcf/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
  --max_cpus 20 \
  --max_memory '30 GB' \
  --igenomes_base 's3://ngi-igenomes/igenomes' \
  --wes \
  --intervals /home/obiorach/whole-Exon-single-seq/ref-genome/exom_targets.bed/HyperExomeV2_primary_targets.hg19.bed \
  --tools mutect2,strelka,vep,manta,tiddit,cnvkit,controlfreec,msisensorpro \
  --pon /home/obiorach/whole-Exon-single-seq/ref-genome/panel-of-normal/updated_Mutect2-exome-panel_vcf.vcf.gz \
  --germline_resource /home/obiorach/whole-Exon-single-seq/ref-genome/germline-resource/renamed_gnomad.vcf.gz \
  --vep_cache /home/obiorach/vep_cache \
  --vep_species homo_sapiens \
  --vep_genome GRCh37 \
  --vep_cache_version 112 \
  -c custom.config \
  -resume

Error Message:

Aug-29 13:43:20.036 [TaskFinalizer-106] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:FREEC_SOMATIC (Tumour_vs_Normal); work-dir=/home/obiorach/test-work-sarek/work/c0/6ad8417512a5e92448b724cf07213a
  error [nextflow.exception.MissingFileException]: Missing output file(s) `*_sample.cpn` expected by process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:FREEC_SOMATIC (Tumour_vs_Normal)`

Custom Configuration File:

process {
    // adding a regex to prevent the fully qualified name to take precedence
    withName: ".*" {
        time = 72.h
    }

    withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:MUTECT2_PAIRED' {
        memory = 30.GB  // Allocate more memory for the Mutect2 process
        cpus = 20       // Allocate more CPUs for the Mutect2 process
        time = 72.h     // Specific time limit for this process
    }
}

Steps Taken to Troubleshoot:

  1. Checked Output Directory: Verified that the output directory exists and is writable. No .cpn files were found.
  2. Reviewed Command and Configuration: Ensured that the paths and parameters in the config.txt file and command are correct.
  3. Checked Log Files: Reviewed .command.log, .command.err, and .command.out files in the task's workDir for additional details, but found no specific clues beyond the missing file error.
  4. Validated Input Data: Confirmed that input BAM files and reference genome are correctly formatted and indexed.
  5. Rerun Task: Tried re-running the specific process using the -resume flag, but the issue persists.

Questions:

  1. Missing Output Files: Could anyone advise on potential causes for the missing *_sample.cpn file and suggest further troubleshooting steps? Are there specific configuration requirements or dependencies that might lead to this issue?

  2. Control-FREEC Visualization: Is there a special configuration required to enable visualization outputs from Control-FREEC? If so, could you provide guidance on how to set it up?

Thank you in advance for your support!

Best regards,
Obiora Ch.