yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

Memory Issue #39

Open SomeGuy3865 opened 2 months ago

SomeGuy3865 commented 2 months ago

Hello,

You helped me get this running previously, and I have since adapted whole-genome read detection to single-transcript detection, which generated some novel insights that will soon be published. I'm now helping with a collaborator's project, and they sequenced to a VERY high depth. The exact same pipeline, which worked for all the whole genomes, single genes, and transcripts I made references for, now runs into this error when STAR sorts the BAM files:

... started sorting BAM
Max memory needed for sorting = 9659452

EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM 1009659452
... FATAL ERROR, exiting

I had 100G of RAM, then tried reserving a node on my HPC with 400G, and am now trying to get 1024G. It seems, though, that this may be something I have to specify internally by raising the limit STAR is allowed to use. I'm guessing that would be somewhere within "VIRTUS_wrapper2.py", where I'd add another argument "--limitBAMsortRAM [value]" at the start, but I'm not sure where that argument would then have to point within the script. Could you suggest which argument to add and what code to insert where? (I assume wherever it starts the STAR run.) It might even be worth adding to the original script, since high-depth sequencing seems likely to become standard for viral read detection. Sorry if I misinterpreted the error message or the solution; please correct me if I did.

I really appreciate your help, sir!

yyoshiaki commented 1 month ago

Hi, sorry for my late reply. I moved from Japan to the US and couldn't find the time. That's good to know, and congratulations on your results!

According to an existing issue in the STAR repo, increasing --limitBAMsortRAM or --outBAMsortingBinsN may work: https://github.com/alexdobin/STAR/issues/870
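As a rough illustration of picking a value for --limitBAMsortRAM, the sketch below pulls the number STAR itself suggests out of the error text quoted above. The function name `suggested_bam_sort_ram` is hypothetical and not part of VIRTUS or STAR. Note that STAR documents this limit in bytes, so the suggested 1009659452 is only about 1 GB; the failure comes from the limit passed to STAR rather than from the RAM available on the node.

```python
import re
from typing import Optional


def suggested_bam_sort_ram(log_text: str) -> Optional[int]:
    """Return the --limitBAMsortRAM value (bytes) suggested in STAR's error text.

    Hypothetical helper for illustration; returns None if no suggestion is found.
    """
    match = re.search(r"--limitBAMsortRAM\s+(\d+)", log_text)
    return int(match.group(1)) if match else None


# Error text copied from the report above.
log = (
    "EXITING because of fatal ERROR: not enough memory for BAM sorting: "
    "SOLUTION: re-run STAR with at least --limitBAMsortRAM 1009659452"
)

value = suggested_bam_sort_ram(log)
if value is not None:
    # STAR expects the limit in bytes; show it in GB for readability.
    print(f"suggested limit: {value} bytes (~{value / 1e9:.2f} GB)")
```

In practice it is probably safer to pass a value somewhat above the suggestion, since the amount needed depends on the input.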

Because VIRTUS uses cwltool as its backbone, modifying it is a bit tricky. You may be able to edit star_mapping-se.cwl or star_mapping-pe.cwl directly to add the option.

For example, add the following after line 72 of star_mapping-se.cwl (https://github.com/yyoshiaki/VIRTUS2/blob/master/tool/star/star_mapping-se/star_mapping-se.cwl#L72):

  - default: 200
    id: outBAMsortingBinsN
    type: int?
    inputBinding:
      position: 0
      prefix: '--outBAMsortingBinsN'
      shellQuote: false

The other option is simply to split the FASTQ files into smaller FASTQ files and run them separately.
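A minimal sketch of the splitting approach, assuming uncompressed FASTQ files with exactly four lines per read (no line-wrapped sequences). The function name `split_fastq` and the output naming scheme are made up for illustration and are not part of VIRTUS.

```python
import itertools
from pathlib import Path


def split_fastq(path, reads_per_chunk, out_dir):
    """Split a FASTQ file into chunks of at most reads_per_chunk reads.

    Assumes an uncompressed file with four lines per record.
    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    outputs = []
    with open(path) as handle:
        for index in itertools.count():
            # Read the next block of records (4 lines each).
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"{stem}.part{index:04d}.fastq"
            chunk.write_text("".join(lines))
            outputs.append(chunk)
    return outputs
```

For paired-end data, both mate files would need to be split with the same `reads_per_chunk` so that the pairs stay in sync across chunks.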