big reference fasta will crash STAR

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Use a multi-FASTA file (e.g. contigs) with many entries (>10000) as the 
reference to map against

What is the expected output? What do you see instead?
expected: clean run, seen: crash

What version of the product are you using? On what operating system?
newest (2.3 or so)

Original issue reported on code.google.com by sadd...@gmail.com on 12 Feb 2013 at 10:18

GoogleCodeExporter commented 8 years ago

Hi @saddy01

Did the crash happen at the genome generation step?
Could you please share with me the .fasta file that caused the crash, so that I 
can replicate the problem.

Thanks
Alex

Original comment by adobin@gmail.com on 12 Feb 2013 at 2:24

GoogleCodeExporter commented 8 years ago

Hi,

I have the same problem. My reference genome has about 300,000 fasta sequences 
and is about 300 MB in size.

I get this message:

Feb 18 11:25:11 ..... Started STAR run
Feb 18 11:25:11 ... Starting to generate Genome files
/var/spool/slurmd/job1306271/slurm_script: line 9: 12129 Killed  

Jon

Original comment by jon.br...@gmail.com on 18 Feb 2013 at 10:28

GoogleCodeExporter commented 8 years ago

It crashes at genome generation. I'm sorry I can't share my file. But I think 
it should be fine if you generate a large enough fasta (many short sequences) 
with random bases.

Original comment by sadd...@gmail.com on 19 Feb 2013 at 12:44

GoogleCodeExporter commented 8 years ago

@saddy01

This is likely an insufficient-RAM problem. STAR bins the genome sequence so 
that each chromosome (contig) starts at a new bin, which creates an 
overhead of Nchromosomes*BinSize, where BinSize=2^genomeChrBinNbits. By 
default, --genomeChrBinNbits = 18.
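
To put rough numbers on that overhead, here is a minimal Python sketch of the arithmetic. The function names are hypothetical and this is not STAR's own code; it only evaluates Nchromosomes*BinSize as stated above and, as an assumption about what "starts at a new bin" implies, rounds each contig up to a whole number of bins. Results are in genome positions, and how many bytes STAR actually needs per position is not modelled here.

```python
def bin_overhead(n_contigs: int, chr_bin_nbits: int) -> int:
    """Overhead in genome positions: Nchromosomes * 2^genomeChrBinNbits."""
    return n_contigs * (1 << chr_bin_nbits)


def padded_length(contig_lengths, chr_bin_nbits: int) -> int:
    """Total positions if every contig is rounded up to a whole number of bins.

    Assumption for illustration: a contig occupies ceil(length / BinSize) bins.
    """
    bin_size = 1 << chr_bin_nbits
    return sum(-(-length // bin_size) * bin_size for length in contig_lengths)


if __name__ == "__main__":
    # 300,000 contigs, as in the report earlier in this thread.
    for nbits in (18, 12):
        positions = bin_overhead(300_000, nbits)
        print(f"genomeChrBinNbits={nbits}: {positions:,} positions of overhead")
```

With 300,000 contigs, 18 bits gives about 78.6 billion positions of overhead, while 12 bits gives about 1.2 billion.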

I suggest that you try a much smaller value, --genomeChrBinNbits 12. This 
should require just a few GB of RAM and should allow you to generate the genome 
files. I have not tried STAR with more than 50,000 contigs, and I suspect there 
might be a significant slowdown in mapping speed when the number of contigs 
is very large.

Please check this discussion thread
http://seqanswers.com/forums/showpost.php?p=96821&postcount=16

If this still does not work, please generate a "random" genome matching the 
sizes of your real genome contigs, check that it still fails to run, and send 
it to me. I need to replicate the problem to fix it.
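
One possible way to build such a random genome is sketched below in Python. The helper names, the `random_contig_N` headers, and the 60-column line width are all hypothetical choices for illustration. The sketch reads the contig lengths from an existing FASTA and writes uniformly random A/C/G/T sequences of the same lengths, so no real sequence leaves your machine.

```python
import io
import random


def contig_lengths(fasta_lines):
    """Return the sequence length of each '>' record, in file order."""
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new record
        elif line and lengths:
            lengths[-1] += len(line)   # accumulate sequence characters
    return lengths


def write_random_genome(lengths, out, seed=0, width=60):
    """Write one random-base contig per entry in `lengths` to file object `out`."""
    rng = random.Random(seed)          # fixed seed so the output is reproducible
    for i, n in enumerate(lengths, start=1):
        out.write(f">random_contig_{i}\n")
        seq = "".join(rng.choices("ACGT", k=n))
        for start in range(0, n, width):
            out.write(seq[start:start + width] + "\n")


if __name__ == "__main__":
    # Tiny in-memory demo; for real data, pass open file handles instead.
    real = io.StringIO(">a\nACGTAC\nGT\n>b\nNNN\n")
    fake = io.StringIO()
    write_random_genome(contig_lengths(real), fake)
    print(fake.getvalue())
```

For a real reference, open the input FASTA for reading and an output file for writing, and pass those handles in place of the `io.StringIO` objects.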

Original comment by adobin@gmail.com on 20 Feb 2013 at 2:39

Changed state: Accepted
Added labels: Type-Other
Removed labels: Type-Defect
