gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

SAM error: found spliced alignment without XS attribute #7

Closed jingyayeah closed 9 years ago

jingyayeah commented 9 years ago

Hi,

I'm running cuffmerge on the transcript GTFs assembled with StringTie, but I get the following errors:

[14:42:00] Loading reference annotation.
[14:42:03] Inspecting reads and determining fragment length distribution.
SAM error on line 470: found spliced alignment without XS attribute
SAM error on line 471: found spliced alignment without XS attribute
SAM error on line 896: found spliced alignment without XS attribute
SAM error on line 897: found spliced alignment without XS attribute
SAM error on line 1218: found spliced alignment without XS attribute

I used the BAM file produced by HISAT, and some of its spliced read alignments lack the XS tag, for example:

HWI-7001455:320:HH7WNADXX:1:2110:1668:35148 99 3 9795235 255 21M77N105M = 9795423 314 GTTACTTTCTCTGTCCCCAAGGGTTTTCACTGAATTCTCAGGATTGCGAAGCTCCTCTGCTTCTCTTCCCTTCGGCAAGAAACTTTCTTCCGATGAGTTCGTTTCCATCGTCTCCTTCCAGACTTC ;988(.)9:=4@;.>))<:((-268=@=1=?63<=???)3:>?<?88?8=?;1==8;;????3=;>?>?####################################################### AS:i:-15 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:23G102 YS:i:-5 YT:Z:CP

I am not sure whether these SAM records are what causes the cuffmerge error.

lmdu commented 9 years ago

I use HISAT + StringTie + cuffmerge to assemble transcripts, and I get the same errors.

[10:13:25] Loading reference annotation.
[Tue Apr 7 10:13:27 2015] Assembling transcripts
You are using Cufflinks v2.2.1, which is the most recent release.
Command line:
cufflinks -o merged/ -F 0.05 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 merged/tmp/mergeSam_filefebrLL
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File merged/tmp/mergeSam_filefebrLL doesn't appear to be a valid BAM file, trying SAM...
[10:13:29] Inspecting reads and determining fragment length distribution.
SAM error on line 28759: found spliced alignment without XS attribute
SAM error on line 68416: found spliced alignment without XS attribute
SAM error on line 69011: found spliced alignment without XS attribute
SAM error on line 93561: found spliced alignment without XS attribute
SAM error on line 103959: found spliced alignment without XS attribute
SAM error on line 152041: found spliced alignment without XS attribute
SAM error on line 160060: found spliced alignment without XS attribute
SAM error on line 160061: found spliced alignment without XS attribute
SAM error on line 177497: found spliced alignment without XS attribute
SAM error on line 177498: found spliced alignment without XS attribute

infphilo commented 9 years ago

In HISAT, XS attributes are not reported for alignments involving non-GT/AG splice sites. You can direct HISAT not to output such alignments using "--pen-noncansplice 1000000".
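
To see how many records are affected before rerunning the aligner, one could scan the SAM text for spliced alignments (an N operation in the CIGAR string) that carry no XS:A tag, like the sample record quoted earlier in this thread (CIGAR 21M77N105M, no XS tag). The following is a minimal Python sketch, not part of HISAT, StringTie or Cufflinks; the function names are hypothetical, and it assumes tab-separated SAM lines such as those printed by `samtools view -h`.

```python
import re

# A CIGAR string containing at least one N (skipped region, i.e. an intron).
SPLICED_CIGAR = re.compile(r"\d+N")


def is_spliced_without_xs(sam_line):
    """Return True for a spliced alignment record that has no XS:A tag."""
    if sam_line.startswith("@"):  # header line, not an alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:  # malformed or truncated record
        return False
    cigar = fields[5]
    optional_tags = fields[11:]
    has_xs = any(tag.startswith("XS:A:") for tag in optional_tags)
    return bool(SPLICED_CIGAR.search(cigar)) and not has_xs


def find_missing_xs(sam_lines):
    """Yield (line_number, read_name) for each offending record (1-based)."""
    for number, line in enumerate(sam_lines, start=1):
        if is_spliced_without_xs(line):
            yield number, line.split("\t", 1)[0]
```

Line numbers here count header lines as well, so they may not match the numbers in the Cufflinks messages exactly.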

NixBio commented 9 years ago

Hi,

I used unstranded data from an RNA-Seq experiment, mapped with TopHat2.

Then I used StringTie with a reference.gtf to determine expressed and spliced isoforms. To merge the resulting .gtf files I used cuffmerge, which results in:
SAM error on line 877: found spliced alignment without XS attribute

I searched for the respective gene in the reference.gtf; there it is annotated on the minus strand:

Reference.gtf
1 ensembl exon 31870178 31870747 . - . transcript_id "Gene1_T01"; gene_id "Gene1"; gene_name "Gene1";
1 ensembl exon 31874837 31875517 . - . transcript_id "Gene1_T01"; gene_id "Gene1"; gene_name "Gene1";
1 ensembl CDS 31874845 31875387 . - 0 transcript_id "Gene1_T01"; gene_id "Gene1"; gene_name "Gene1";

In the StringTie.gtf this strand information is lost (the strand column is "."):

StringTie.gtf
1 StringTie transcript 31870178 31875517 1000 . . gene_id "AB_1.599"; transcript_id "AB_1.599.1"; reference_id "Gene1_T01"; ref_gene_id "Gene1"; ref_gene_name "Gene1"; cov "6.554756"; FPKM "2.389276";
1 StringTie exon 31870178 31870747 1000 . . gene_id "AB_1.599"; transcript_id "AB_1.599.1"; exon_number "1"; reference_id "Gene1_T01"; ref_gene_id "Gene1"; ref_gene_name "Gene1"; cov "12.215790";
1 StringTie exon 31874837 31875517 1000 . . gene_id "AB_1.599"; transcript_id "AB_1.599.1"; exon_number "2"; reference_id "Gene1_T01"; ref_gene_id "Gene1"; ref_gene_name "Gene1"; cov "1.816446";

cufflinks -o /Alignments/StringTie/Cuffmerge04052015_log1/ -F 0.05 -g /Annotations/genomes/Reference.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 4 /Alignments/StringTie/Cuffmerge04052015_log/tmp/gtf2sam_filebgmy70
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /Alignments/StringTie/Cuffmerge04052015_log/tmp/gtf2sam_filebgmy70 doesn't appear to be a valid BAM file, trying SAM...
[15:38:59] Loading reference annotation.
[15:39:04] Inspecting reads and determining fragment length distribution.
SAM error on line 877: found spliced alignment without XS attribute
SAM error on line 877: found spliced alignment without XS attribute
....

When I looked in the respective file, I found:

gtf2sam
AB_1.598.2 0 1 31865715 255 107M80N158M172N88M181N74M94N75M91N845M * 0 0 * * XS:A:+ ZF:f:0.959395
AB_1.599.1 0 1 31870178 255 570M4089N681M * 0 0 * * ZF:f:2.389276
AB_1.600.2 16 1 31977658 255 630M2740N99M98N118M95N77M76N144M * 0 0 * * XS:A:- ZF:f:3.110215

In the entry starting with AB_1.599.1, the XS:A strand tag is missing! Why does StringTie not report the strand information even though it is given in the reference?
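
One way to locate such transcripts before running cuffmerge is to check the assembled GTF for multi-exon transcripts whose strand column is ".", as AB_1.599.1 is in the StringTie.gtf excerpt in this comment. Below is a minimal Python sketch under stated assumptions: the function name is hypothetical, the input is standard tab-separated GTF, and each exon line carries a transcript_id attribute. It is an illustration only and not part of StringTie or cuffmerge.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def unstranded_multiexon_transcripts(gtf_lines):
    """Return sorted IDs of transcripts with 2+ exons and strand '.'."""
    exon_counts = defaultdict(int)
    strands = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are counted
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exon_counts[tid] += 1
        strands[tid] = fields[6]  # column 7 of a GTF line is the strand
    return sorted(
        tid for tid, n in exon_counts.items()
        if n >= 2 and strands[tid] == "."
    )
```

Any transcript it reports would become a spliced record without an XS tag once the GTF is converted to SAM, which is the situation the error message describes.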

stringtie: stringtie accepted_hits_sort_pairs.bam -o $basedir/Alignments/StringTie/$name.gtf -p 1 -G Reference.gtf -l AB_1 -C $basedir/Alignments/StringTie/AB_1_cov_refs.gtf

cuffmerge: -o Example_log -g Reference.gtf -p 4 -s Reference.fa GTFlist

Any ideas?

Thanks

gpertea commented 9 years ago

Ela seems to have identified the latter problem (alignments from TopHat2), and it should be fixed in the upcoming v1.0.4 release. As for HISAT spliced alignments with no strand provided, StringTie will ignore those for now (we cannot rely on users to remember to use an option like "--pen-noncansplice 1000000" when running HISAT).
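
For readers who want the same effect on their own SAM stream, the idea of ignoring spliced alignments that have no strand can be sketched as a small filter. This is an illustrative Python sketch only, not StringTie's actual implementation; the function name is hypothetical, and it assumes tab-separated SAM text with the strand carried in an XS:A tag.

```python
import re

# A CIGAR string containing at least one N (skipped region, i.e. an intron).
SPLICED_CIGAR = re.compile(r"\d+N")


def drop_spliced_without_xs(sam_lines):
    """Yield headers and all alignments except spliced ones lacking XS:A."""
    for line in sam_lines:
        if line.startswith("@"):  # keep header lines untouched
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        spliced = len(fields) > 5 and SPLICED_CIGAR.search(fields[5]) is not None
        has_xs = any(tag.startswith("XS:A:") for tag in fields[11:])
        if spliced and not has_xs:
            continue  # no strand information for this junction: skip it
        yield line
```

Applied to the lines of a SAM file, the generator passes through headers, unspliced reads and stranded spliced reads unchanged.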