gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Multiple transcripts with the same coordinates #354

Open: suyeonwy opened this issue 2 years ago


Hi,

I'm running StringTie for transcriptome assembly on BAM files generated by STAR. This is the command line I used:

```
stringtie sample1.sorted.bam -f 0.1 -c 2.5 -p 15 -G ref.chr.gtf -o sample1.gtf &> sample1.log
```

I noticed that some transcripts have the same start and end coordinates but slightly different exon structures. Here is an example (the two transcripts differ only in the start of exon 4):

```
9   StringTie   transcript  51918953    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; cov "9.697406"; FPKM "1.410810"; TPM "3.128901";
9   StringTie   exon    51918953    51919060    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "1"; cov "6.178105";
9   StringTie   exon    51927055    51927125    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "2"; cov "13.987437";
9   StringTie   exon    51928061    51928135    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "3"; cov "15.125772";
9   StringTie   exon    51931154    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "4"; cov "7.895041";
9   StringTie   transcript  51918953    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; cov "8.106643"; FPKM "1.179381"; TPM "2.615635";
9   StringTie   exon    51918953    51919060    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "1"; cov "5.100532";
9   StringTie   exon    51927055    51927125    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "2"; cov "11.547774";
9   StringTie   exon    51928061    51928135    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "3"; cov "12.487562";
9   StringTie   exon    51931136    51931337    1000    +   .   gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "4"; cov "6.877785";
```
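To make the comparison concrete, here is a minimal sketch that parses the exon lines above, groups transcripts sharing chromosome, strand, start and end, and compares their intron chains. It assumes well-formed GTF exon records, and the helper names (`parse_exons`, `introns`, `same_span_groups`) are hypothetical, not part of StringTie. For these two records the first two introns are identical and only the third differs: it ends at 51931153 in STRG.17234.1 and at 51931135 in STRG.17234.2. The two isoforms therefore use acceptor sites 18 bp apart for exon 4, so they describe different splice structures rather than exact duplicates.

```python
import re
from collections import defaultdict

# Exon records for the two transcripts from the issue (columns separated by whitespace).
GTF_TEXT = """\
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "1"; cov "6.178105";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "2"; cov "13.987437";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "3"; cov "15.125772";
9 StringTie exon 51931154 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.1"; exon_number "4"; cov "7.895041";
9 StringTie exon 51918953 51919060 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "1"; cov "5.100532";
9 StringTie exon 51927055 51927125 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "2"; cov "11.547774";
9 StringTie exon 51928061 51928135 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "3"; cov "12.487562";
9 StringTie exon 51931136 51931337 1000 + . gene_id "STRG.17234"; transcript_id "STRG.17234.2"; exon_number "4"; cov "6.877785";
"""

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exons(text):
    """Return {transcript_id: (chrom, strand, sorted [(start, end), ...])}."""
    transcripts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split off the 8 fixed columns; the attribute column stays in one piece.
        chrom, _source, feature, start, end, _score, strand, _frame, attrs = line.split(None, 8)
        if feature != "exon":
            continue  # transcript lines are ignored; spans come from the exons
        tid = dict(ATTR_RE.findall(attrs))["transcript_id"]
        transcripts.setdefault(tid, (chrom, strand, []))[2].append((int(start), int(end)))
    for _chrom, _strand, exons in transcripts.values():
        exons.sort()
    return transcripts


def introns(exons):
    """Intron chain as 1-based inclusive (start, end) gaps between consecutive exons."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def same_span_groups(transcripts):
    """Group transcript IDs sharing chromosome, strand, start and end; keep groups of 2+."""
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transcripts.items():
        groups[(chrom, strand, exons[0][0], exons[-1][1])].append(tid)
    return {span: tids for span, tids in groups.items() if len(tids) > 1}


if __name__ == "__main__":
    tx = parse_exons(GTF_TEXT)
    for span, tids in same_span_groups(tx).items():
        print("shared span:", span, tids)
        for tid in tids:
            print(" ", tid, "introns:", introns(tx[tid][2]))
```

Comparing intron chains rather than transcript spans is what separates these two records: the spans are equal, but the chains are not.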

I have some questions about these transcripts.

  1. Could you explain why StringTie does not merge these transcripts and report them as a single transcript?
  2. Some exons have identical coordinates in both transcripts but different coverage values (e.g., exon 1 of STRG.17234.1 and exon 1 of STRG.17234.2). Why does the same exon get a different coverage value in each transcript?
  3. Is there a recommended way to handle such transcripts in downstream analyses that use this GTF file?
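The coverage values in question 2 can be checked with simple arithmetic. The sketch below copies the per-exon `cov` values from the output above and tests two observations. First, assuming a transcript's `cov` is the length-weighted mean of its exons' `cov` (an assumption inferred from these numbers, not taken from StringTie's documentation), the recomputed values match the reported 9.697406 and 8.106643 to within rounding. Second, on the three exons the isoforms share, the ratio of their `cov` values is the same (about 1.2113) for every exon. Both observations are consistent with reads on shared exons being divided between the two isoforms in a fixed proportion rather than counted in full for each; this is an interpretation of the numbers, not a description of StringTie's internals. The helper names are hypothetical.

```python
# Per-exon (start, end, cov) values copied from the StringTie output in the issue.
EXONS = {
    "STRG.17234.1": [(51918953, 51919060, 6.178105), (51927055, 51927125, 13.987437),
                     (51928061, 51928135, 15.125772), (51931154, 51931337, 7.895041)],
    "STRG.17234.2": [(51918953, 51919060, 5.100532), (51927055, 51927125, 11.547774),
                     (51928061, 51928135, 12.487562), (51931136, 51931337, 6.877785)],
}
# Transcript-level cov reported on the "transcript" lines.
REPORTED_COV = {"STRG.17234.1": 9.697406, "STRG.17234.2": 8.106643}


def exon_length(start, end):
    """GTF coordinates are 1-based and inclusive."""
    return end - start + 1


def length_weighted_cov(exons):
    """Assumed relationship: transcript cov = exon cov averaged over exon length."""
    total = sum(exon_length(s, e) for s, e, _ in exons)
    return sum(exon_length(s, e) * cov for s, e, cov in exons) / total


def shared_exon_ratios(a, b):
    """Ratio of a's cov to b's cov on exons with identical coordinates."""
    cov_a = {(s, e): c for s, e, c in a}
    cov_b = {(s, e): c for s, e, c in b}
    return {key: cov_a[key] / cov_b[key] for key in sorted(cov_a.keys() & cov_b.keys())}


if __name__ == "__main__":
    for tid, exons in EXONS.items():
        print(tid, "recomputed:", round(length_weighted_cov(exons), 6),
              "reported:", REPORTED_COV[tid])
    ratios = shared_exon_ratios(EXONS["STRG.17234.1"], EXONS["STRG.17234.2"])
    for key, ratio in ratios.items():
        print("exon", key, "cov ratio 1/2:", round(ratio, 4))
```

Exon 4 is excluded from the ratio because its coordinates differ between the two isoforms, so it has no counterpart to compare against.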

Thank you!