Hi,
After running the merge step, I ran StringTie as follows:
/stringtie-1.3.3b/stringtie -eB -p 8 -G stringtie_merged.gtf -A abundance/S${i}.gene_abundance.tab -o ballgown/S${i}/S${i}.genome.gtf ${i}.mapped.sorted.bam
I get many transcripts (sometimes up to 45) for a single gene. Looking at their coordinates, many of these transcripts have the same genomic start, end, and strand but different transcript IDs. Examples below:
Q473725.1 StringTie transcript 3605 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.1";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.2";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.3";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.4";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.28";
KQ473725.1 StringTie transcript 18376 22971 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.29";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.30";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31";
KQ473725.1 StringTie exon 18376 18592 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "1";
KQ473725.1 StringTie exon 18666 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32";
KQ473725.1 StringTie exon 18376 18596 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "1";
KQ473725.1 StringTie exon 18666 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "5";
KQ473725.1 StringTie exon 21632 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33";
KQ473725.1 StringTie exon 18376 19333 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "1";
KQ473725.1 StringTie exon 19440 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "5";
KQ473725.1 StringTie exon 21632 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34";
KQ473725.1 StringTie exon 18376 18595 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "1";
KQ473725.1 StringTie exon 18665 19684 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "2";
KQ473725.1 StringTie exon 19760 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "3";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "4";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "5";
KQ473725.1 StringTie exon 21307 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35";
KQ473725.1 StringTie exon 18376 19683 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "1";
KQ473725.1 StringTie exon 19750 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36";
KQ473725.1 StringTie exon 18376 19333 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "1";
KQ473725.1 StringTie exon 19469 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "4";
KQ473725.1 StringTie exon 21307 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "5";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.3";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.4";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.5";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.6";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.7";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.8";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.9";
KQ473730.1 StringTie transcript 2163 6649 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.10";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.11";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.12";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.13";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.28";
KQ473731.1 StringTie transcript 19466 35864 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.29";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.30";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.31";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.1";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.2";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.3";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.4";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.5";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.6";
KQ473732.1 StringTie transcript 25811 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.7";
KQ473732.1 StringTie transcript 25811 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.8";
KQ473732.1 StringTie transcript 25816 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.9";
KQ473732.1 StringTie transcript 28135 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.10";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.11";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.12";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.13";
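To put a number on how many transcript records share exactly the same coordinates, a tally along these lines might help. This is only a rough sketch: `span_counts` and `GTF_TEXT` are names made up for illustration, the four records are the MSTRG.31 entries pasted above, and the parser splits on any whitespace because the pasted text uses spaces (a real GTF file is tab-separated).

```python
from collections import Counter

# Transcript records for gene MSTRG.31, copied from the examples above.
GTF_TEXT = """\
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.28";
KQ473731.1 StringTie transcript 19466 35864 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.29";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.30";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.31";
"""


def span_counts(gtf_text):
    """Count transcript records per (seqname, strand, start, end)."""
    counts = Counter()
    for line in gtf_text.splitlines():
        # Split into the 8 fixed GTF columns plus the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # skip blank lines, comments, and non-transcript features
        seqname, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        counts[(seqname, strand, start, end)] += 1
    return counts


counts = span_counts(GTF_TEXT)
for span, n in counts.most_common():
    print(span, n)
```

For these four records, three transcripts share the span 19466 to 31948 and one covers 19466 to 35864. A shared span alone does not say whether the internal exon structure is the same.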
I understand that transcripts are defined by their exons, but StringTie generates separate exons and transcripts even when they differ by a single base pair. Is this normal? Does it affect the counts used for differential expression analysis?
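One way to check whether two transcripts with the same start and end are really different is to compare their intron chains. The sketch below does this for MSTRG.5.31 and MSTRG.5.35, using the exon records pasted above. The helper names `exon_chains` and `introns` are made up for illustration, and the whitespace-based parsing is an assumption that fits the pasted text rather than a strict GTF parser.

```python
import re
from collections import defaultdict

# Exon records for two transcripts with the same span (18376-22903),
# copied from the examples above.
GTF_TEXT = """\
KQ473725.1 StringTie exon 18376 18592 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "1";
KQ473725.1 StringTie exon 18666 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "6";
KQ473725.1 StringTie exon 18376 19683 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "1";
KQ473725.1 StringTie exon 19750 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "6";
"""


def exon_chains(gtf_text):
    """Map each transcript_id to its sorted tuple of (start, end) exons."""
    chains = defaultdict(list)
    for line in gtf_text.splitlines():
        fields = line.split(None, 8)  # 8 fixed columns + attribute string
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            chains[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tid: tuple(sorted(exons)) for tid, exons in chains.items()}


def introns(chain):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return tuple((prev_end + 1, next_start - 1)
                 for (_, prev_end), (next_start, _) in zip(chain, chain[1:]))


chains = exon_chains(GTF_TEXT)
for tid, chain in chains.items():
    print(tid, "span:", (chain[0][0], chain[-1][1]), "introns:", introns(chain))
```

Here both transcripts span 18376 to 22903, but the first intron differs (18593 to 18665 versus 19684 to 19749), so they are distinct isoforms as far as the exon structure goes.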
Any suggestions please? Thanks