Enormous memory consumption with large BAM file

plger commented 9 years ago

Hi, I'm trying to run a StringTie assembly on a rather large BAM file (81 GB), and the run systematically ends with 'error allocating memory'. I monitored it and saw that it used up to 128 GB of RAM (usage keeps increasing and only occasionally drops by tiny amounts). I tried reducing the number of threads and using the default -s 1000000, but I run into the same problem. This is puzzling, since loci are supposed to be processed one by one, and I cannot understand how loading at most 1,000,000 read pairs, even with multithreading, could ever use so much memory. Is this somehow expected?

The reads were aligned with HISAT to a human genome with spike-in sequences added, and I'm running StringTie with -G and the RefSeq annotation. (Because of the spike-ins there are technically slightly more than 100 chromosomes, in case that matters.) I've compiled and run v1.0.4 on two different Linux x86 clusters.

Thanks in advance,
Pierre-Luc
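One way to reproduce this kind of monitoring is to record the peak resident memory of a run after it finishes. The following is a minimal Python sketch, assuming a Unix-like system (the `resource` module is not available on Windows); `peak_rss_mb` is a hypothetical helper name, not part of StringTie.

```python
import resource
import subprocess
import sys
from typing import List


def peak_rss_mb(cmd: List[str]) -> float:
    """Run cmd to completion and return a peak resident set size in MB.

    ru_maxrss for RUSAGE_CHILDREN is the largest peak RSS among all
    terminated, waited-for children of this process, so for a clean
    measurement call it once per fresh Python process.
    """
    # check=True raises CalledProcessError if the command fails
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    scale = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / scale
```

For example, `peak_rss_mb(["stringtie", "sample.bam", "-G", "refseq.gtf", "-o", "sample.gtf"])` would report the peak for one assembly run (the file names here are hypothetical). Note that a run killed by an allocation failure raises before the measurement is returned.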

mpertea commented 9 years ago

Hi Pierre-Luc, we are working on a version of StringTie that reduces the large memory footprint caused when spliced reads artificially join distinct bundles together. Unfortunately, HISAT tends to predict many such spliced reads, which results in high memory usage. Because of this issue, we currently don't recommend using StringTie with HISAT. Please use TopHat until we have fixed this problem.
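To make the mechanism concrete, here is a toy Python sketch, assuming a bundle is simply a maximal run of coordinate-sorted reads whose genomic spans overlap; StringTie's real bundling logic is more involved. The `Read` representation and the `bundle_reads` function are hypothetical. The sketch shows how a single spliced read with a long intron can merge two otherwise separate loci into one bundle, so that both have to be handled together.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end); hypothetical representation
Read = List[Exon]       # a spliced alignment as a list of exon blocks


def bundle_reads(reads: List[Read]) -> List[List[Read]]:
    """Group reads into bundles: maximal runs whose genomic spans overlap.

    A spliced read's span runs from its first exon start to its last exon
    end, so a long intron stretches the span across any gap between loci.
    """
    bundles: List[List[Read]] = []
    current: List[Read] = []
    current_end = -1
    # Process reads in coordinate order, as a sorted BAM would supply them
    for read in sorted(reads, key=lambda r: r[0][0]):
        start, end = read[0][0], read[-1][1]
        if current and start > current_end:
            # Gap found: close the current bundle and start a new one
            bundles.append(current)
            current = []
        current.append(read)
        current_end = max(current_end, end)
    if current:
        bundles.append(current)
    return bundles


locus_a = [[(100, 200)], [(150, 250)]]
locus_b = [[(10_000, 10_100)], [(10_050, 10_150)]]
# One spliced read whose ~9.8 kb intron bridges the two loci
spurious = [[(240, 260), (10_020, 10_040)]]
print(len(bundle_reads(locus_a + locus_b)))             # 2
print(len(bundle_reads(locus_a + locus_b + spurious)))  # 1
```

With many such bridging reads, bundles can grow to span large stretches of a chromosome, which is consistent with the memory growth described in the original report.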

plger commented 9 years ago

Thanks a lot for the hint!

rjpbonnal commented 9 years ago

Hi, do you think that STAR could be used for this purpose?

mpertea commented 9 years ago

STAR poses the same problems as HISAT unfortunately. We will have a fix for this soon.

infphilo commented 9 years ago

I recently released HISAT2, which is a successor to both HISAT and TopHat2. Please use HISAT2 with the --dta option, which should work well with StringTie. HISAT2 is faster and much more memory-efficient than STAR and provides better alignment quality. The HISAT2 website is at https://ccb.jhu.edu/software/hisat2
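For readers following this advice, here is a minimal Python sketch of an align, sort, and assemble pipeline, assuming `hisat2`, `samtools`, and `stringtie` are on PATH. The helper function names, file names, and index prefix are hypothetical; the `samtools sort` step is included because StringTie expects coordinate-sorted alignments.

```python
import subprocess
from typing import List


def hisat2_cmd(index: str, r1: str, r2: str, sam_out: str, threads: int = 4) -> List[str]:
    # --dta makes HISAT2 report alignments tailored for transcript assemblers
    return ["hisat2", "--dta", "-p", str(threads), "-x", index,
            "-1", r1, "-2", r2, "-S", sam_out]


def sort_cmd(sam_in: str, bam_out: str, threads: int = 4) -> List[str]:
    # StringTie needs coordinate-sorted input; -@ adds sorting threads
    return ["samtools", "sort", "-@", str(threads), "-o", bam_out, sam_in]


def stringtie_cmd(bam_in: str, annotation_gtf: str, gtf_out: str, threads: int = 4) -> List[str]:
    # -G guides assembly with a reference annotation, -o names the output GTF
    return ["stringtie", bam_in, "-G", annotation_gtf, "-o", gtf_out, "-p", str(threads)]


def run_pipeline(index: str, r1: str, r2: str, annotation_gtf: str,
                 prefix: str, threads: int = 4) -> None:
    sam, bam, gtf = f"{prefix}.sam", f"{prefix}.sorted.bam", f"{prefix}.gtf"
    for cmd in (hisat2_cmd(index, r1, r2, sam, threads),
                sort_cmd(sam, bam, threads),
                stringtie_cmd(bam, annotation_gtf, gtf, threads)):
        # check=True stops the pipeline if any step exits non-zero
        subprocess.run(cmd, check=True)
```

A call such as `run_pipeline("genome_index", "sample_1.fq.gz", "sample_2.fq.gz", "refseq.gtf", "sample")` would produce `sample.gtf` (all paths here are hypothetical). Building each command as an argument list keeps the individual flags easy to inspect and avoids shell quoting issues.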

mpertea commented 9 years ago

We just released version 1.1.0 of StringTie, which now runs using less than 1 GB of memory for almost all samples.

gpertea/stringtie, issue #18