nf-core / rnafusion

RNA-seq analysis pipeline for detection of gene-fusions
https://nf-co.re/rnafusion
MIT License

building reference with singularity profile #455

Closed nservant closed 5 months ago

nservant commented 8 months ago

Description of the bug

Hi, I am trying to build the references as follows:

nextflow run ${IDIR}/rnafusion/main.nf \
   --build_references \
   --starfusion --fusioncatcher --arriba \
   -profile singularity,test \
   --outdir ${IDIR}/rna-fusion-ref \
   -resume

But the pipeline crashes when building the STAR-Fusion references:

ERROR ~ Error executing process > 'NFCORE_RNAFUSION:BUILD_REFERENCES:STARFUSION_BUILD (star-fusion)'

Caused by:
  Process `NFCORE_RNAFUSION:BUILD_REFERENCES:STARFUSION_BUILD (star-fusion)` terminated with an error exit status (2)

Command executed:

  export TMPDIR=/tmp
  wget http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz --no-check-certificate
  wget https://github.com/FusionAnnotator/CTAT_HumanFusionLib/releases/download/v0.3.0/fusion_lib.Mar2021.dat.gz -O CTAT_HumanFusionLib_Mar2021.dat.gz --no-check-certificate
  wget https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/AnnotFilterRule.pm -O AnnotFilterRule.pm --no-check-certificate
  wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm --no-check-certificate
  wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3f --no-check-certificate
  wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3i --no-check-certificate
  wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3m --no-check-certificate
  wget https://www.dfam.org/releases/Dfam_3.4/infrastructure/dfamscan/homo_sapiens_dfam.hmm.h3p --no-check-certificate
  gunzip Pfam-A.hmm.gz && hmmpress Pfam-A.hmm
  /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
      --genome_fa Homo_sapiens.GRCh38.102.all.fa \
      --gtf Homo_sapiens.GRCh38.102.chr.gtf \
      --annot_filter_rule AnnotFilterRule.pm \
      --fusion_annot_lib CTAT_HumanFusionLib_Mar2021.dat.gz \
      --pfam_db Pfam-A.hmm \
      --dfam_db homo_sapiens_dfam.hmm \
      --max_readlength 100 \
      --CPU 2

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNAFUSION:BUILD_REFERENCES:STARFUSION_BUILD":
      STAR-Fusion: $(STAR-Fusion --version 2>&1 | grep -i 'version' | sed 's/STAR-Fusion version: //')
  END_VERSIONS

Command exit status:
  2

Command output:
  Working...    done.
  Pressed and indexed 19179 HMMs (19179 names and 19179 accessions).
  Models pressed into binary file:   Pfam-A.hmm.h3m
  SSI index for binary model file:   Pfam-A.hmm.h3i
  Profiles (MSV part) pressed into:  Pfam-A.hmm.h3f
  Profiles (remainder) pressed into: Pfam-A.hmm.h3p

  Building a new DB, current time: 12/19/2023 14:36:09
  New DB name:   ctat_genome_lib_build_dir/ref_genome.fa
  New DB title:  ctat_genome_lib_build_dir/ref_genome.fa
  Sequence type: Nucleotide
  Keep MBits: T
  Maximum file size: 1000000000B
  Adding sequences from FASTA; added 25 sequences in 60.2846 seconds.
  Dec 19 14:37:39 ..... started STAR run
  Dec 19 14:37:39 ... starting to generate Genome files

Command error:
  250700K .......... .......... .......... .......... .......... 99%  333K 0s
  250750K .......... .......... .......... .......... .......... 99% 20.2M 0s
  250800K .......... .......... .......... .......... .......... 99%  333K 0s
  250850K .......... .......... .......... .......... .......... 99% 34.8M 0s
  250900K .......... .......... .......... .......... .......... 99%  334K 0s
  250950K .......... .......... .......... .......... .......... 99% 19.6M 0s
  251000K .......... .......... .......... ........             100%  137M=5m29s

  2023-12-19 14:34:30 (762 KB/s) - 'homo_sapiens_dfam.hmm.h3p' saved [257063264/257063264]

  Working...    done.
  Pressed and indexed 19179 HMMs (19179 names and 19179 accessions).
  Models pressed into binary file:   Pfam-A.hmm.h3m
  SSI index for binary model file:   Pfam-A.hmm.h3i
  Profiles (MSV part) pressed into:  Pfam-A.hmm.h3f
  Profiles (remainder) pressed into: Pfam-A.hmm.h3p
  -found STAR at /usr/local/bin/STAR

  -found makeblastdb at /usr/local/bin/makeblastdb

  -found blastn at /usr/local/bin/blastn

  -found dfamscan.pl at /usr/local/bin/dfamscan.pl

  -found nhmmscan at /usr/local/bin/nhmmscan

  -found hmmscan at /usr/local/bin/hmmscan

  * Running CMD: cp Homo_sapiens.GRCh38.102.all.fa ctat_genome_lib_build_dir/ref_genome.fa
  * Running CMD: samtools faidx ctat_genome_lib_build_dir/ref_genome.fa
  * Running CMD: makeblastdb -in ctat_genome_lib_build_dir/ref_genome.fa -dbtype nucl

  Building a new DB, current time: 12/19/2023 14:36:09
  New DB name:   ctat_genome_lib_build_dir/ref_genome.fa
  New DB title:  ctat_genome_lib_build_dir/ref_genome.fa
  Sequence type: Nucleotide
  Keep MBits: T
  Maximum file size: 1000000000B
  Adding sequences from FASTA; added 25 sequences in 60.2846 seconds.
  * Running CMD: cp Homo_sapiens.GRCh38.102.chr.gtf ctat_genome_lib_build_dir/ref_annot.gtf
  * Running CMD: bash -c " set -euxo pipefail; /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_to_exon_gene_records.pl ctat_genome_lib_build_dir/ref_annot.gtf  | sort -k 1,1 -k4,4g -k5,5g | uniq  > ctat_genome_lib_build_dir/ref_annot.gtf.mini.sortu " 
  + /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_to_exon_gene_records.pl ctat_genome_lib_build_dir/ref_annot.gtf
  + uniq
  + sort -k 1,1 -k4,4g -k5,5g
  * Running CMD: STAR --runThreadN 2 --runMode genomeGenerate --genomeDir ctat_genome_lib_build_dir/ref_genome.fa.star.idx   --genomeFastaFiles Homo_sapiens.GRCh38.102.all.fa  --limitGenomeGenerateRAM 40419136213  --genomeChrBinNbits 16  --sjdbGTFfile Homo_sapiens.GRCh38.102.chr.gtf  --sjdbOverhang 100 
  Dec 19 14:37:39 ..... started STAR run
  Dec 19 14:37:39 ... starting to generate Genome files
  Error, cmd: STAR --runThreadN 2 --runMode genomeGenerate --genomeDir ctat_genome_lib_build_dir/ref_genome.fa.star.idx   --genomeFastaFiles Homo_sapiens.GRCh38.102.all.fa  --limitGenomeGenerateRAM 40419136213  --genomeChrBinNbits 16  --sjdbGTFfile Homo_sapiens.GRCh38.102.chr.gtf  --sjdbOverhang 100  died with ret 9 No such file or directory at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
    Pipeliner::run(Pipeliner=HASH(0x557ca6f950e0)) called at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl line 460

Work dir:
  /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

It seems to crash while building the STAR index, with the message "died with ret 9 No such file or directory at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/lib/Pipeliner.pm line 186". But I checked, and all required input files are in the work folder, and Singularity should be able to access them since it binds the current folder.

Any idea is welcome :) Thanks

Command used and terminal output

No response

Relevant files

No response

System information

No response

rannick commented 8 months ago

Hello! Is the error consistent, i.e. does it occur several times at the same place?

nservant commented 8 months ago

Yes. I also tried running it manually:

singularity exec --no-home --pid -B OUTDIR/nf-core OUTDIR/nf-core/work/singularity/docker.io-trinityctat-starfusion-1.12.0.img /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl     --genome_fa Homo_sapiens.GRCh38.102.all.fa     --gtf Homo_sapiens.GRCh38.102.chr.gtf     --annot_filter_rule AnnotFilterRule.pm     --fusion_annot_lib CTAT_HumanFusionLib_Mar2021.dat.gz     --pfam_db Pfam-A.hmm     --dfam_db homo_sapiens_dfam.hmm     --max_readlength 100     --CPU 2

It fails at the same STAR step:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "fr_FR.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
-found STAR at /usr/local/bin/STAR

-found makeblastdb at /usr/local/bin/makeblastdb

-found blastn at /usr/local/bin/blastn

-found dfamscan.pl at /usr/local/bin/dfamscan.pl

-found nhmmscan at /usr/local/bin/nhmmscan

-found hmmscan at /usr/local/bin/hmmscan

-- Skipping CMD: cp /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.all.fa /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_genome.fa, checkpoint [/data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/__chkpts/ref_genome.fa.ok] exists.
-- Skipping CMD: samtools faidx /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_genome.fa, checkpoint [/data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/__chkpts/ref_genome_fai.ok] exists.
-- Skipping CMD: makeblastdb -in /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_genome.fa -dbtype nucl, checkpoint [/data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/__chkpts/makeblastdb.ok] exists.
-- Skipping CMD: cp /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.chr.gtf /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_annot.gtf, checkpoint [/data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/__chkpts/ref_annot.gtf.ok] exists.
-- Skipping CMD: bash -c " set -euxo pipefail; /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_to_exon_gene_records.pl /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_annot.gtf  | sort -k 1,1 -k4,4g -k5,5g | uniq  > /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_annot.gtf.mini.sortu " , checkpoint [/data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/__chkpts/ref_annot.gtf.mini.sortu.ok] exists.
* Running CMD: STAR --runThreadN 2 --runMode genomeGenerate --genomeDir /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_genome.fa.star.idx   --genomeFastaFiles /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.all.fa  --limitGenomeGenerateRAM 40419136213  --genomeChrBinNbits 16  --sjdbGTFfile /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.chr.gtf  --sjdbOverhang 100 
Dec 19 15:54:05 ..... started STAR run
Dec 19 15:54:05 ... starting to generate Genome files
Dec 19 15:54:57 ..... processing annotations GTF
Dec 19 15:55:34 ... starting to sort Suffix Array. This may take a long time...
Dec 19 15:55:49 ... sorting Suffix Array chunks and saving them to disk...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
terminate called recursively
Error, cmd: STAR --runThreadN 2 --runMode genomeGenerate --genomeDir /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/ctat_genome_lib_build_dir/ref_genome.fa.star.idx   --genomeFastaFiles /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.all.fa  --limitGenomeGenerateRAM 40419136213  --genomeChrBinNbits 16  --sjdbGTFfile /data/kdi_prod/.kdi/project_workspace_0/1983/acl/01.00/nf-core/work/aa/10e3b22515fa47363273ec82028fe0/Homo_sapiens.GRCh38.102.chr.gtf  --sjdbOverhang 100  died with ret 134 No such file or directory at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
    Pipeliner::run(Pipeliner=HASH(0x55dae81c0b88)) called at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl line 460

rannick commented 8 months ago

OK, here is a first possibility: not enough RAM (see https://github.com/alexdobin/STAR/issues/1620).
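The two "ret" values in the logs above look consistent with a memory problem, under one assumption: that Pipeliner.pm prints the raw wait status returned by Perl's system() rather than a plain exit code. If that holds, the trailing "No such file or directory" is probably just a stale errno string and not a sign of a missing file. A minimal sketch for decoding such a status (the function name decode_wait_status is made up for illustration):

```shell
# Decode a raw wait status as returned by Perl's system() or C's wait().
# Assumption: the "ret N" printed by Pipeliner.pm is this raw status.
decode_wait_status() {
  status=$1
  signal=$(( status & 127 ))      # low 7 bits: terminating signal (0 = none)
  core=$(( (status >> 7) & 1 ))   # bit 7: core-dump flag
  code=$(( status >> 8 ))         # high byte: exit code of a normal exit
  if [ "$signal" -ne 0 ]; then
    echo "killed by signal $signal (core dump flag: $core)"
  else
    echo "exited with code $code"
  fi
}

decode_wait_status 9     # value from the Nextflow run
decode_wait_status 134   # value from the manual run
```

Read this way, 9 would be signal 9 (SIGKILL, which is what a kernel or scheduler out-of-memory kill sends) and 134 would be signal 6 (SIGABRT) with the core-dump flag set, which matches the std::bad_alloc abort in the manual run.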

nservant commented 8 months ago

Thanks, I'll try!

nservant commented 8 months ago

There is no CPU requirement in the STARFUSION_BUILD module. Does that mean it uses the defaults from base.config, i.e. 6 GB?

process STARFUSION_BUILD {
    tag 'star-fusion'

    conda "bioconda::dfam=3.3 bioconda::hmmer=3.3.2 bioconda::star-fusion=1.12.0 bioconda::trinity=2.13.2 bioconda::samtools=1.9 bioconda::star=2.7.8a"
    container "docker.io/trinityctat/starfusion:1.12.0"

rannick commented 7 months ago

You might have found it in the meantime, but the requirements are specified in conf/modules.config: https://github.com/nf-core/rnafusion/blob/ffe9c09fb070e5675361e06543ca53fa3d29c470/conf/modules.config#L322-L325
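If those values are still too low on a given cluster, they can be overridden from a custom config passed with -c. The sketch below only writes such a file; the file name and the 8 CPU / 64.GB figures are placeholders, not tested recommendations. The STAR command in the log asks for about 40 GB via --limitGenomeGenerateRAM, so the request probably needs to be above that. Also note that, as far as I know, the test profile in nf-core pipelines caps the maximum memory at a small value, so it may be worth building the references without it.

```shell
# Write a small Nextflow config that raises the resources requested for
# STARFUSION_BUILD. File name and values are assumptions; adjust them to
# what the cluster nodes actually provide.
cat > starfusion_build_resources.config <<'EOF'
process {
    withName: 'STARFUSION_BUILD' {
        cpus   = 8
        memory = 64.GB
    }
}
EOF

# Then pass the file with -c when building the references, for example:
# nextflow run ${IDIR}/rnafusion/main.nf --build_references --starfusion \
#     -profile singularity -c starfusion_build_resources.config \
#     --outdir ${IDIR}/rna-fusion-ref -resume
```

This only changes what Nextflow requests from the executor; the node still has to have that much RAM available.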

rannick commented 7 months ago

It would be great to hear whether you have solved the issue, and how.

rannick commented 5 months ago

I will assume this was a cluster-dependent error and close the issue.