Open EreboPSilva opened 6 years ago
It is good that you re-posted this as a new issue: it made it easier to spot, and I also think your case is not related to the other, closed issue.
It is hard to see what's going on there (the reason for the crash) without the data to reproduce it here in a "debug" environment. But it looks like you have a bunch of very large bundles, with a suspiciously high number of junctions in each bundle, suggesting bad (noisy) data (alignments), and using -p 4 made stringtie load 4 of those bundles and process them all at the same time. So one thing to check is whether the crash is caused by an out-of-memory situation -- how much RAM do you have on that machine?
Try running without the -p option (single processing thread): that's the only way to identify a specific bundle that might be causing a crash (please see this document for hints about how to identify a problem bundle and submit data for debugging), and it also lowers the memory usage. The fact that there are so many junctions there may also be an indicator that your data is noisy (low quality alignments, and/or a multiplexed sample? -- please don't use multiplexed data with stringtie!). STAR is nice and fast but it might generate a lot of false positives unless used with more stringent alignment options (I cannot help you with STAR options, sorry, I don't use it). It looks like each bundle covers almost an entire scaffold by itself, with way too many junctions in each bundle.
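To make the "which bundle was in flight when it crashed" step concrete, here is a minimal sketch that scans a captured verbose (-v) log from a single-threaded run and reports the last bundle that began processing. It assumes the log lines look like the `>bundle ... begins processing...` line that stringtie prints in verbose mode; the function name `last_started_bundle` and the idea of saving the log to a file are illustrative assumptions, not part of stringtie itself.

```python
import re

# Matches a verbose-mode bundle announcement such as:
# [10/25 17:26:34]>bundle Scaffold001:10-9333030 [12512970 alignments
# (4518214 distinct), 85971 junctions, 20 guides] begins processing...
# (assumed format; adjust the pattern if your stringtie version differs)
BUNDLE_RE = re.compile(
    r">bundle\s+(?P<seq>\S+):(?P<start>\d+)-(?P<end>\d+)\s+"
    r"\[(?P<alignments>\d+) alignments \((?P<distinct>\d+) distinct\), "
    r"(?P<junctions>\d+) junctions, (?P<guides>\d+) guides\] begins processing"
)


def last_started_bundle(log_lines):
    """Return a dict describing the last bundle that began processing.

    Returns None if no bundle line is found. With a single processing
    thread, the last bundle started before a crash is the likely culprit.
    """
    last = None
    for line in log_lines:
        match = BUNDLE_RE.search(line)
        if match:
            fields = match.groupdict()
            # Keep the sequence name as text, convert the counts to integers.
            last = {
                key: (value if key == "seq" else int(value))
                for key, value in fields.items()
            }
    return last
```

Passing the lines of a saved log (for example via `open(path)`) returns the scaffold name, coordinates and junction count of the suspect bundle, which is the region one would extract when preparing data for debugging. This only makes sense for a run without -p, since with several threads more than one bundle is in flight at once.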
You might be able to filter some of the spurious junctions by increasing the value of the -j parameter (see some discussion about lowering memory usage here: https://github.com/gpertea/stringtie/issues/164#issuecomment-363597528). If the problem persists and it's not really memory related (doesn't exhaust your RAM), try to prepare the bundle data for debugging, if you can share it with me.
We have around 128 GB of RAM, so maybe that is the problem. I've checked it and this is the output now:
jmgps@bq078:sra$ stringtie rvar_30_active/Aligned.sortedByCoord.out.bam -v -o rvar_30_active_clout/transcripts.gtf -G rvar.gff -A rvar_30_active_gene_abund.tab
Running StringTie 1.3.4d. Command line:
stringtie rvar_30_active/Aligned.sortedByCoord.out.bam -v -o rvar_30_active_clout/transcripts.gtf -G rvar.gff -A rvar_30_active_gene_abund.tab
[10/25 17:25:58] Loading reference annotation (guides)..
[10/25 17:25:59] 198 reference transcripts loaded.
Default stack size for threads: 8388608
[10/25 17:26:34]>bundle Scaffold001:10-9333030 [12512970 alignments (4518214 distinct), 85971 junctions, 20 guides] begins processing...
Segmentation fault
I will check both STAR documentation and the document you are linking, and I'll retry.
Thanks!
128GB should be enough for regular data, especially when using a single thread -- and it should be easy to check how much memory stringtie is using before it crashes -- even just visually using top, but better yet you could run that stringtie command through the "time" program with -f (formatting string) adjusted to include the %M output (maximum resident memory use).
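As an alternative to the "time" program, the same peak-memory check can be sketched in Python with the standard `resource` module. This is only an illustration under stated assumptions: the helper name `run_and_report_peak_rss` is made up here, it works on Unix-like systems only, and `ru_maxrss` is reported in kilobytes on Linux (bytes on macOS).

```python
import resource
import subprocess


def run_and_report_peak_rss(cmd):
    """Run cmd (a list of arguments) and return (exit_code, peak_rss).

    peak_rss is the maximum resident set size of the largest child
    process this Python process has waited for so far, so call this
    once per fresh interpreter for a clean reading.
    A negative exit_code means the child was killed by that signal
    number (for example -11 for a segmentation fault).
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss
```

Called with the stringtie command line as a list of arguments, this returns the exit status together with the peak resident memory, which can then be compared against the RAM available on the machine to decide whether the crash is memory related.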
I'm opening this issue despite having commented in the old issue about this topic, since I'm not entirely sure how commenting on closed issues works. I apologize if this is simply repetitive. I'll paste my original message and a link to the old issue: