Closed zhixue closed 2 years ago
Thank you for reporting this and providing the debug data -- it seems that there is a particular BAM record in the uLTRA output that stringtie has trouble parsing properly, I'll be fixing that shortly.
The problem seems to be related to record 880af412-ef82-474a-9e85-e6df5784e5ac having an alignment that ends with an intron (the CIGAR string ends with =9I4=1X2=1X5D230N), which does not quite make sense by itself, unless that is a peculiar way of suggesting that the read alignment ends exactly at an intron boundary? However, that alignment does not have a transcription strand assigned, which makes that justification rather unlikely.
I can modify StringTie to ignore that kind of unusual alignment (hanging intron with no terminal exon) but I suspect the problem might be deeper, it could be an alignment bug and perhaps it should be reported to the uLTRA aligner author.
Most SAM processing tools seem to silently ignore this issue, including IGV, so I guess I'll do the same (certainly preferable over the current crash, which is caused by an unexpected structural anomaly: a mismatch between the number of "exons" and the number of introns detected in that alignment).
Thanks for your rapid response!
I have re-downloaded the latest version of stringtie and run this sample successfully! I have also reported this case to the uLTRA aligner author.
Maybe the uLTRA output contains something unexpected in SAM format, because I have another sample that causes a "segmentation fault". In a similar way, I have narrowed the problem down to a subset of the alignment records on Chr1, but I cannot infer more.
The bam file is here (28K) Sample2_Chr1head10900_11000.bam.
The commands are as follows:
# stringtie
~/tool/stringtie_996f585/stringtie -p 1 -L -l S2c1 -o Sample2_Chr1head10900_11000.gtf Sample2_Chr1head10900_11000.bam
#### Segmentation fault ####
Hmm, this was the same issue of a hanging intron with no terminal exon, but this time capped by an insertion (the CIGAR of read 03debcb9-2135-431b-bbf0-ff10c64983d1 ends with 1X3=3I1=59N2I).
I'll add a more robust check there: if there is no M/X/= preceding the first intron (N) or following the last intron, such an intron should be discarded.
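For readers who want to screen or pre-filter their own alignments, the check described above can be sketched in Python. This is only an illustration of the rule and not StringTie's actual code; the helper names (`parse_cigar`, `drop_hanging_introns`, `to_cigar`) are hypothetical, and the example CIGARs are just the tails quoted in this thread. Note that dropping a leading intron in a real record would also require shifting the alignment start position, which this sketch does not do.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that align read bases to the reference (i.e. exonic sequence).
ALIGNED_OPS = {"M", "=", "X"}


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; reject malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if to_cigar(ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def to_cigar(ops):
    """Join (length, op) pairs back into a CIGAR string."""
    return "".join(f"{n}{op}" for n, op in ops)


def drop_hanging_introns(ops):
    """Discard every N that has no M/X/= before it or no M/X/= after it."""
    aligned = [i for i, (_, op) in enumerate(ops) if op in ALIGNED_OPS]
    if not aligned:
        # No aligned bases at all, so no intron can be flanked by exons.
        return [(n, op) for n, op in ops if op != "N"]
    first, last = aligned[0], aligned[-1]
    # Keep an N only if it lies strictly between the first and last aligned op.
    return [(n, op) for i, (n, op) in enumerate(ops)
            if op != "N" or first < i < last]


def clean_cigar(cigar):
    """Convenience wrapper: parse, drop hanging introns, re-serialize."""
    return to_cigar(drop_hanging_introns(parse_cigar(cigar)))
```

With this sketch, `clean_cigar("4=1X2=1X5D230N")` gives `4=1X2=1X5D` and `clean_cigar("1X3=3I1=59N2I")` gives `1X3=3I1=2I`, while a properly flanked intron such as `10M100N10M` is left unchanged.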
Addressed by 62551bb.
It works. Thank you~
Hi, thanks for the wonderful tool for long RNA reads analysis~
I am trying to run StringTie2 on rice ONT raw reads (fastq reads) by first running the uLTRA aligner (v0.0.4) and then providing the generated sorted BAM file to StringTie (v2.2.1). I have checked the BAM file in IGV and it looks fine.
Moreover, I have split the BAM file by chromosome for testing, and I have found that the 90,001st to 95,000th sorted reads cause a "segmentation fault".
The bam file is here (5.8M) chr3_head90000_95000.bam.
The commands are as follows: