nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

kallisto subworkflow runs out of memory (reiteration of #38) #116

Closed Khajidu closed 2 years ago

Khajidu commented 2 years ago

Description of the bug

Every time I run kallisto as a subworkflow, it crashes for memory reasons. Unlike in #38, however, it crashes at the indexing stage rather than the bustools stage. The cause seems to be the same, as I get the same kind of error messages (I also tried hard-coding the memory requirements in version 1.1.0, and the pipeline then worked). My understanding is that I have to request memory as 32GB (I cannot ask for 32G, because I get `string [32.G] does not match pattern ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$ (32.G)`), whereas kallisto wants the value as 32G (and that cannot be changed to GB either).
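To see which memory strings the validation accepts, the pattern from the error message can be tried directly. This is only a minimal sketch: the pattern is copied verbatim from the message above, while the function name `is_valid_memory` and the list of candidate strings are made up for illustration, and the pipeline may apply further checks beyond this regular expression.

```python
import re

# Pattern copied verbatim from the validation error quoted in the issue.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_memory(value: str) -> bool:
    """Return True if `value` satisfies the quoted validation pattern.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    return MEMORY_PATTERN.match(value) is not None


if __name__ == "__main__":
    # The trailing "B" is mandatory in the pattern, so "32G" and "32.G" fail.
    for candidate in ["32.GB", "32GB", "32 GB", "32.G", "32G"]:
        print(f"{candidate!r}: {is_valid_memory(candidate)}")
```

The pattern requires a trailing `B`, which is why `32.GB` passes while `32.G` and `32G` are rejected.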

Command used and terminal output

$ nextflow run nf-core/scrnaseq -r dev --max_cpus 32 --max_memory '32.GB' --outdir /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/fulltest/ --protocol '10XV3' --aligner kallisto --transcript_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_utrs.fasta --input 'test_samples_v2.csv' --genome_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_ncbi_genome.fasta --gtf /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf --kb_workflow 'nucleus' -profile ifb_core

Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  kb \
      ref \
      -i kb_ref_out.idx \
      -g t2g.txt \
      -f1 cdna.fa \
      -f2 intron.fa \
      -c1 cdna_t2c.txt \
      -c2 intron_t2c.txt \
      --workflow nucleus \
      sc_ncbi_genome.fasta \
      sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
      kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Command error:
  [2022-06-21 09:31:39,900]    INFO [ref_lamanno] Preparing sc_ncbi_genome.fasta, sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
  [2022-06-21 09:32:13,709]    INFO [ref_lamanno] Splitting genome sc_ncbi_genome.fasta into cDNA at tmp/tmp74iebqht
  [2022-06-21 09:32:53,465]    INFO [ref_lamanno] Creating cDNA transcripts-to-capture at tmp/tmp4_dnckwg
  [2022-06-21 09:32:53,805]    INFO [ref_lamanno] Splitting genome into introns at tmp/tmpb18oat1d
  [2022-06-21 09:38:41,370]    INFO [ref_lamanno] Creating intron transcripts-to-capture at tmp/tmpmbj8zrjy
  [2022-06-21 09:38:51,358]    INFO [ref_lamanno] Concatenating 1 cDNA FASTAs to cdna.fa
  [2022-06-21 09:38:51,770]    INFO [ref_lamanno] Concatenating 1 cDNA transcripts-to-captures to cdna_t2c.txt
  [2022-06-21 09:38:51,792]    INFO [ref_lamanno] Concatenating 1 intron FASTAs to intron.fa
  [2022-06-21 09:39:06,987]    INFO [ref_lamanno] Concatenating 1 intron transcripts-to-captures to intron_t2c.txt
  [2022-06-21 09:39:07,161]    INFO [ref_lamanno] Concatenating cDNA and intron FASTAs to tmp/tmpvz8czi6v
  [2022-06-21 09:39:22,955]    INFO [ref_lamanno] Creating transcript-to-gene mapping at t2g.txt
  [2022-06-21 09:39:37,502]    INFO [ref_lamanno] Indexing tmp/tmpvz8czi6v to kb_ref_out.idx

Work dir:
  /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/72/f969fb35db25b832205956436bb6e5

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Relevant files

No response

System information

Nextflow version: 22.04.0
Hardware: HPC
Executor: Slurm
Container: Singularity
OS: CentOS
Version of nf-core/scrnaseq: 2.0.0 or dev

apeltzer commented 2 years ago

You could try supplying a separate config that overrides the resources kallisto uses for all steps (see https://nf-co.re/usage/configuration#custom-configuration-files), and then setting the memory there:


process {
    withName: 'KALLISTOBUSTOOLS_REF' {
        memory = '32.GB'
    }
}

Khajidu commented 2 years ago

It didn't work, same error.

Here are the logs:

[2022-06-22 09:29:53,353] INFO [ref_lamanno] Preparing sc_ncbi_genome.fasta, sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
[2022-06-22 09:30:27,852] INFO [ref_lamanno] Splitting genome sc_ncbi_genome.fasta into cDNA at /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp_aaaucz5
[2022-06-22 09:31:08,887] INFO [ref_lamanno] Creating cDNA transcripts-to-capture at /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmpxx9d3eue
[2022-06-22 09:31:09,220] INFO [ref_lamanno] Splitting genome into introns at /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp_4zx6l_e
[2022-06-22 09:37:03,403] INFO [ref_lamanno] Creating intron transcripts-to-capture at /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmpw8sjit6k
[2022-06-22 09:37:11,916] INFO [ref_lamanno] Concatenating 1 cDNA FASTAs to cdna.fa
[2022-06-22 09:37:12,346] INFO [ref_lamanno] Concatenating 1 cDNA transcripts-to-captures to cdna_t2c.txt
[2022-06-22 09:37:12,369] INFO [ref_lamanno] Concatenating 1 intron FASTAs to intron.fa
[2022-06-22 09:37:28,411] INFO [ref_lamanno] Concatenating 1 intron transcripts-to-captures to intron_t2c.txt
[2022-06-22 09:37:28,593] INFO [ref_lamanno] Concatenating cDNA and intron FASTAs to /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp4jp7quuj
[2022-06-22 09:37:46,283] INFO [ref_lamanno] Creating transcript-to-gene mapping at t2g.txt
[2022-06-22 09:38:01,695] INFO [ref_lamanno] Indexing /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp4jp7quuj to kb_ref_out.idx
[2022-06-22 09:47:41,663] ERROR [ref_lamanno]
[build] loading fasta file /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp4jp7quuj
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10) from 242 target sequences
[build] warning: replaced 24801554 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ...
[2022-06-22 09:47:41,679] ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 856, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 131, in parse_ref
    ref_lamanno(
  File "/usr/local/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/kb_python/ref.py", line 661, in ref_lamanno
    index_result = kallisto_index(combined_path, index_path, k=k or 31)
  File "/usr/local/lib/python3.9/site-packages/kb_python/ref.py", line 212, in kallisto_index
    run_executable(command)
  File "/usr/local/lib/python3.9/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/kb_python/utils.py", line 195, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/usr/local/bin/kallisto index -i kb_ref_out.idx -k 31 /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/ca/8de08f1ecfab6f41d8cc1f94e45c52/tmp/tmp4jp7quuj' died with <Signals.SIGKILL: 9>.

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=23442481.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
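
The decisive lines in a log like this are the `SIGKILL` and the Slurm `oom-kill` message. As a rough sketch, a log could be scanned for those signatures as follows. The patterns are taken from this particular log, so other schedulers or Slurm versions may word things differently, and the name `find_oom_lines` is hypothetical. A bare `SIGKILL` can have other causes; only the `oom-kill` line points clearly at memory.

```python
import re

# Signatures taken from the log above; treat them as examples, not a complete list.
OOM_PATTERNS = [
    re.compile(r"oom-kill event"),
    re.compile(r"out-of-memory handler"),
    re.compile(r"Signals\.SIGKILL"),
]


def find_oom_lines(log_text: str) -> list:
    """Return the log lines that match any of the OOM-related signatures."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


if __name__ == "__main__":
    sample = (
        "[build] counting k-mers ...\n"
        "subprocess.CalledProcessError: Command 'kallisto index' "
        "died with <Signals.SIGKILL: 9>.\n"
        "slurmstepd: error: Detected 1 oom-kill event(s) in "
        "StepId=23442481.batch cgroup.\n"
    )
    for hit in find_oom_lines(sample):
        print(hit)
```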

Khajidu commented 2 years ago

Link to profile configuration if needed: https://github.com/nf-core/configs/blob/master/conf/ifb_core.config

Khajidu commented 2 years ago

Solved by giving more memory to the process in the config file.

apeltzer commented 2 years ago

Ok, then let's add another profile for you that automatically does that for your cluster.

Khajidu commented 2 years ago

Good!

apeltzer commented 2 years ago

Can you share how you modified the memory in the process config file? Then we can simply copy that.

Khajidu commented 2 years ago

I set the memory as the following:

process {
    withLabel:process_high {
        memory = 500.GB
    }
    withName:'KALLISTOBUSTOOLS_REF' {
        memory = 250.GB
    }
    withName:'KALLISTOBUSTOOLS_COUNT' {
        memory = 250.GB
    }
}

grst commented 2 years ago

Any chance this got fixed also for less memory now that we added the -m <MEMORY> flag in the kallisto module?
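
For context, the mismatch reported at the top of the thread is between a Nextflow-style value such as `32.GB` and a tool-style value such as `32G`. The sketch below shows one way such a value could be converted for a `-m <MEMORY>` style flag. It is an illustration only, not the module's actual code: the function name `to_giga_flag` is hypothetical, units are assumed to be 1024-based, and the result is rounded down to whole gigabytes.

```python
import re

# Assumption: 1024-based units.
_UNIT_BYTES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Same shape as the validation pattern quoted earlier in the thread,
# with capture groups for the number and the unit letter.
_MEMORY_RE = re.compile(r"^(\d+(?:\.\d+)?)\.?\s*(K|M|G|T)?B$")


def to_giga_flag(memory: str) -> str:
    """Convert e.g. '32.GB' to '32G', rounding down to whole gigabytes.

    Hypothetical helper for illustration only.
    """
    match = _MEMORY_RE.match(memory.strip())
    if match is None:
        raise ValueError(f"unrecognised memory string: {memory!r}")
    number, unit = match.groups()
    n_bytes = float(number) * _UNIT_BYTES[unit or ""]
    return f"{int(n_bytes // _UNIT_BYTES['G'])}G"


if __name__ == "__main__":
    for value in ["32.GB", "250 GB", "2048 MB"]:
        print(value, "->", to_giga_flag(value))
```

Because of the rounding down, values below one gigabyte come out as `0G`, so a real implementation would probably want a lower bound.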

ogibson commented 2 years ago

This issue seems to be resolved. @apeltzer, is there anything else that can be done here?

apeltzer commented 2 years ago

No, if it works we can just close here 👍🏻