nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Giving a user fasta file, but keeping all default file paths #1514

Closed: Ist4lri closed 2 months ago

Ist4lri commented 4 months ago

Description of the bug

I provided my own fasta file when running Mutect2 and got this error:

A USER ERROR has occurred: Fasta index file file://GRCh38_latest_genomic.fna.fai for reference file://GRCh38_latest_genomic.fna does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.

It comes from the Mutect2 step of GATK.

But my file is there, and it exists.

Command used and terminal output

`nextflow run nf-core/sarek -r dev -profile singularity -c custom.config -params-file nf-params.json`

nf-params.json:

{
    "input": "sample.csv",
    "outdir": "results",
    "wes": "true",
    "fasta": "/gpfs/home/plgouttebel/home/exomic/data/ref/GRCh38_latest_genomic.fna",
    "aligner": "bwa-mem2",
    "tools": "mutect2",
    "skip_tools": "baserecalibrator,markduplicates"
}

custom.config: singularity.cacheDir = '/scratch/plgouttebel/data_Singula/nf-core-sarek_dev/singularity-images'

Output from the log file:

May-07 15:15:33.560 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-07 15:15:33.560 [Task submitter] INFO  nextflow.Session - [48/601cc5] Submitted process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:MUTECT2 (BR666F)
May-07 15:15:33.655 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:GETPILEUPSUMMARIES (BR666F); work-dir=/scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/a0/84752509ea76ccd51c89f3b8af9c20
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:GETPILEUPSUMMARIES (BR666F)` terminated with an error exit status (2)
May-07 15:15:33.763 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:GETPILEUPSUMMARIES (BR666F)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:GETPILEUPSUMMARIES (BR666F)` terminated with an error exit status (2)

Command executed:

  gatk --java-options "-Xmx9830M -XX:-UsePerfData" \
      GetPileupSummaries \
      --input BR666F.sorted.cram \
      --variant af-only-gnomad.hg38.vcf.gz \
      --output BR666F.mutect2.chr2_16146120-32867130.pileups.table \
      --reference GRCh38_latest_genomic.fna \
      --intervals chr2_16146120-32867130.bed \
      --tmp-dir . \

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_TUMOR_ONLY_MUTECT2:GETPILEUPSUMMARIES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  Using GATK jar /usr/local/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /usr/local/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar GetPileupSummaries --input BR666F.sorted.cram --variant af-only-gnomad.hg38.vcf.gz --output BR666F.mutect2.chr2_16146120-32867130.pileups.table --reference GRCh38_latest_genomic.fna --intervals chr2_16146120-32867130.bed --tmp-dir .
  13:15:32.947 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  13:15:33.307 INFO  GetPileupSummaries - ------------------------------------------------------------
  13:15:33.313 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.5.0.0
  13:15:33.314 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
  13:15:33.314 INFO  GetPileupSummaries - Executing as plgouttebel@n064 on Linux v3.10.0-1160.el7.x86_64 amd64
  13:15:33.314 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.10-internal+0-adhoc..src
  13:15:33.314 INFO  GetPileupSummaries - Start Date/Time: May 7, 2024 at 1:15:32 PM GMT
  13:15:33.314 INFO  GetPileupSummaries - ------------------------------------------------------------
  13:15:33.315 INFO  GetPileupSummaries - ------------------------------------------------------------
  13:15:33.316 INFO  GetPileupSummaries - HTSJDK Version: 4.1.0
  13:15:33.316 INFO  GetPileupSummaries - Picard Version: 3.1.1
  13:15:33.316 INFO  GetPileupSummaries - Built for Spark Version: 3.5.0
  13:15:33.317 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  13:15:33.317 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  13:15:33.317 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  13:15:33.317 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  13:15:33.318 INFO  GetPileupSummaries - Deflater: IntelDeflater
  13:15:33.318 INFO  GetPileupSummaries - Inflater: IntelInflater
  13:15:33.318 INFO  GetPileupSummaries - GCS max retries/reopens: 20
  13:15:33.318 INFO  GetPileupSummaries - Requester pays: disabled
  13:15:33.319 INFO  GetPileupSummaries - Initializing engine
  13:15:33.322 INFO  GetPileupSummaries - Shutting down engine
  [May 7, 2024 at 1:15:33 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.01 minutes.
  Runtime.totalMemory()=167772160
  ***********************************************************************

  A USER ERROR has occurred: Fasta index file file://GRCh38_latest_genomic.fna.fai for reference file://GRCh38_latest_genomic.fna does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.

  ***********************************************************************
  Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Work dir:
  /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/a0/84752509ea76ccd51c89f3b8af9c20

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
May-07 15:15:33.769 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
May-07 15:15:33.795 [main] DEBUG nextflow.Session - Session await > all processes finished

Relevant files

[plgouttebel@login01 nf-core-sarek_dev]$ ls -l /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/a0/84752509ea76ccd51c89f3b8af9c20
total 4
lrwxrwxrwx 1 plgouttebel ubx2 160 May  7 15:15 af-only-gnomad.hg38.vcf.gz -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/stage-afe03af1-e05d-4a93-af32-09b63a751b4a/3c/e686ef595583a185a5b7f2480f6f94/af-only-gnomad.hg38.vcf.gz
lrwxrwxrwx 1 plgouttebel ubx2 164 May  7 15:15 af-only-gnomad.hg38.vcf.gz.tbi -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/stage-afe03af1-e05d-4a93-af32-09b63a751b4a/e9/bc174e86314d14b42fab79c5283b02/af-only-gnomad.hg38.vcf.gz.tbi
lrwxrwxrwx 1 plgouttebel ubx2 109 May  7 15:15 BR666F.sorted.cram -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/1a/5ad95b654c06311dc198df39b7a33d/BR666F.sorted.cram
lrwxrwxrwx 1 plgouttebel ubx2 114 May  7 15:15 BR666F.sorted.cram.crai -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/1a/5ad95b654c06311dc198df39b7a33d/BR666F.sorted.cram.crai
lrwxrwxrwx 1 plgouttebel ubx2 117 May  7 15:15 chr2_16146120-32867130.bed -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/b0/fead63adc11db1d9353e4e666e6bf9/chr2_16146120-32867130.bed
lrwxrwxrwx 1 plgouttebel ubx2  69 May  7 15:15 GRCh38_latest_genomic.fna -> /gpfs/home/plgouttebel/home/exomic/data/ref/GRCh38_latest_genomic.fna
lrwxrwxrwx 1 plgouttebel ubx2 162 May  7 15:15 Homo_sapiens_assembly38.dict -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/stage-afe03af1-e05d-4a93-af32-09b63a751b4a/0f/674a437a17df7ac9f50ac6d50c930c/Homo_sapiens_assembly38.dict
lrwxrwxrwx 1 plgouttebel ubx2 167 May  7 15:15 Homo_sapiens_assembly38.fasta.fai -> /scratch/plgouttebel/data_Singula/nf-core-sarek_dev/work/stage-afe03af1-e05d-4a93-af32-09b63a751b4a/01/63bf12053a02deb319a2f6ac4dbe47/Homo_sapiens_assembly38.fasta.fai
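The listing above shows the mismatch directly: the staged reference is `GRCh38_latest_genomic.fna`, but the only `.fai` staged next to it is `Homo_sapiens_assembly38.fasta.fai`. A minimal sketch of that check follows. The function name `missing_fasta_indexes` and the list of FASTA extensions are illustrative assumptions, not part of sarek or GATK; the only rule taken from the thread is that GATK looks for `<reference>.fai`, as the error message states.

```python
# Hypothetical helper (not part of sarek or GATK): given the file names staged
# in a task work directory, report every FASTA that has no "<name>.fai" next to it.

# Assumed set of FASTA extensions, chosen for illustration only.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fna")


def missing_fasta_indexes(staged_names):
    """Return the expected-but-absent .fai names for each staged FASTA file."""
    names = set(staged_names)
    return [
        f"{name}.fai"
        for name in sorted(names)
        if name.endswith(FASTA_EXTENSIONS) and f"{name}.fai" not in names
    ]


# File names copied from the work directory listing in this issue.
staged = [
    "af-only-gnomad.hg38.vcf.gz",
    "af-only-gnomad.hg38.vcf.gz.tbi",
    "BR666F.sorted.cram",
    "BR666F.sorted.cram.crai",
    "chr2_16146120-32867130.bed",
    "GRCh38_latest_genomic.fna",
    "Homo_sapiens_assembly38.dict",
    "Homo_sapiens_assembly38.fasta.fai",
]

print(missing_fasta_indexes(staged))  # ['GRCh38_latest_genomic.fna.fai']
```

The reported name is exactly the file GATK complains about, which suggests the index staged into the task belongs to a different reference than the fasta.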

System information

HPC: the curta cluster at MCIA (Mésocentre de calcul intensif aquitain). sarek was downloaded locally.

maxulysse commented 4 months ago

So from what I can see, the issue is that `genome` should have been set to null. But in my opinion, sarek should have either failed early, or printed a huge warning and recomputed the basic indices from the fasta file: fai, dict, plus any other index that needs building.
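As a rough illustration of the "fail early" idea, here is a minimal sketch of a pre-flight check on the parameters file. It is not sarek code: the function name `check_reference_params` and the `DEFAULT_GENOME` placeholder are hypothetical, and the `fasta_fai` key is an assumed name for a user-supplied index parameter. Only `fasta` and `genome` come from this thread.

```python
# Hypothetical pre-flight check, not part of sarek: flag parameter combinations
# where a custom fasta is given but default reference files may still be used.

# Placeholder for whatever genome the pipeline falls back to when "genome"
# is not set in the parameters file (assumption for illustration).
DEFAULT_GENOME = "<pipeline default>"


def check_reference_params(params):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    fasta = params.get("fasta")
    if not fasta:
        return problems  # no custom reference, nothing to check

    # A missing "genome" key is treated as "the default genome is in effect".
    genome = params.get("genome", DEFAULT_GENOME)
    if genome is not None:
        problems.append(
            f"custom fasta given but genome={genome!r}; "
            "default index paths may still be used"
        )

    # "fasta_fai" is an assumed name for the user-supplied index parameter.
    if not params.get("fasta_fai"):
        problems.append(
            f"no fasta_fai given for {fasta}; the index must be built or supplied"
        )
    return problems


# Parameters copied from the nf-params.json in this issue.
issue_params = {
    "input": "sample.csv",
    "outdir": "results",
    "wes": "true",
    "fasta": "/gpfs/home/plgouttebel/home/exomic/data/ref/GRCh38_latest_genomic.fna",
    "aligner": "bwa-mem2",
    "tools": "mutect2",
    "skip_tools": "baserecalibrator,markduplicates",
}

for problem in check_reference_params(issue_params):
    print("WARNING:", problem)
```

With the parameters from this issue, both warnings are raised. Passing `"genome": None` together with an explicit index path produces none.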

FriederikeHanssen commented 2 months ago

This is related to #1253. Can we track it there, or is there an additional issue you found?

maxulysse commented 2 months ago

> This is related to #1253. Can we track it there, or is there an additional issue you found?

Yeah, that sounds similar to me. Let's close this one in favor of the older issue.