gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Different transcript IDs but with the same genomic coordinates #129

Open bioinfo17 opened 7 years ago

bioinfo17 commented 7 years ago

Hi,

After running the merge step, I ran:

/stringtie-1.3.3b/stringtie -eB -p 8 -G stringtie_merged.gtf -A abundance/S${i}.gene_abundance.tab -o ballgown/S${i}/S${i}.genome.gtf ${i}.mapped.sorted.bam
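For reference, the per-sample command above can be built for each sample index (the `${i}` in the shell loop). This is a minimal sketch: the helper name `stringtie_eb_command` is hypothetical, the paths are copied from the command above, and the flag comments summarize the StringTie manual. Note that with `-e`, StringTie only estimates abundances for the transcripts given with `-G`, so every transcript in the per-sample output comes from `stringtie_merged.gtf`.

```python
import shlex
from typing import List

# Paths copied from the command in this issue; adjust for your own setup.
STRINGTIE = "/stringtie-1.3.3b/stringtie"
MERGED_GTF = "stringtie_merged.gtf"


def stringtie_eb_command(sample: str, threads: int = 8) -> List[str]:
    """Hypothetical helper: build the per-sample re-estimation command as an argv list."""
    return [
        STRINGTIE,
        "-eB",  # -e: only estimate abundances of the -G transcripts; -B: write Ballgown tables
        "-p", str(threads),  # number of processing threads
        "-G", MERGED_GTF,  # reference annotation (here, the merged GTF)
        "-A", f"abundance/S{sample}.gene_abundance.tab",  # gene abundance output
        "-o", f"ballgown/S{sample}/S{sample}.genome.gtf",  # output GTF for this sample
        f"{sample}.mapped.sorted.bam",  # sorted alignments for this sample
    ]


if __name__ == "__main__":
    # Print the command line for two example sample indices.
    for i in ["1", "2"]:
        print(shlex.join(stringtie_eb_command(i)))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it without going through a shell.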

I get many transcripts (sometimes up to 45) for a single gene. Looking at their coordinates, I can see that many of them have the same genomic start, end and strand but different transcript IDs. Examples below:

KQ473725.1 StringTie transcript 3605 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.1";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.2";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.3";
KQ473725.1 StringTie transcript 3646 11969 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.4";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.28";
KQ473725.1 StringTie transcript 18376 22971 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.29";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.30";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36";
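To list every set of transcripts that share exactly the same span, one can group the `transcript` records of a GTF by (sequence, start, end, strand). A minimal sketch, assuming simple attribute columns with no `;` inside quoted values; the function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a simple GTF attribute column, e.g. 'gene_id "X"; transcript_id "Y";'.
    Assumes no ';' inside quoted values."""
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attrs[key] = value.strip().strip('"')
    return attrs


def shared_spans(gtf_lines: Iterable[str]) -> Dict[Span, List[str]]:
    """Map each (seqid, start, end, strand) used by 2+ transcripts to their transcript_ids."""
    groups: Dict[Span, List[str]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace, at most 8 times, so tab- or space-separated
        # lines both work and the attribute column stays in one piece.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        span = (fields[0], int(fields[3]), int(fields[4]), fields[6])
        groups[span].append(parse_attributes(fields[8])["transcript_id"])
    return {span: ids for span, ids in groups.items() if len(ids) > 1}
```

For example, `with open("stringtie_merged.gtf") as fh: print(shared_spans(fh))`. An identical span does not by itself mean an identical transcript: the exons inside the span can still differ.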

KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31";
KQ473725.1 StringTie exon 18376 18592 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "1";
KQ473725.1 StringTie exon 18666 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.31"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32";
KQ473725.1 StringTie exon 18376 18596 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "1";
KQ473725.1 StringTie exon 18666 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "5";
KQ473725.1 StringTie exon 21632 22904 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.32"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33";
KQ473725.1 StringTie exon 18376 19333 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "1";
KQ473725.1 StringTie exon 19440 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "5";
KQ473725.1 StringTie exon 21632 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.33"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34";
KQ473725.1 StringTie exon 18376 18595 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "1";
KQ473725.1 StringTie exon 18665 19684 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "2";
KQ473725.1 StringTie exon 19760 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "3";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "4";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "5";
KQ473725.1 StringTie exon 21307 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.34"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35";
KQ473725.1 StringTie exon 18376 19683 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "1";
KQ473725.1 StringTie exon 19750 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "4";
KQ473725.1 StringTie exon 21307 21559 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "5";
KQ473725.1 StringTie exon 21632 22903 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.35"; exon_number "6";
KQ473725.1 StringTie transcript 18376 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36";
KQ473725.1 StringTie exon 18376 19333 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "1";
KQ473725.1 StringTie exon 19469 20488 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "2";
KQ473725.1 StringTie exon 20554 21076 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "3";
KQ473725.1 StringTie exon 21157 21227 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "4";
KQ473725.1 StringTie exon 21307 22902 1000 + . gene_id "MSTRG.5"; transcript_id "MSTRG.5.36"; exon_number "5";
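Putting two of these exon lists side by side shows where the transcripts actually differ. In the records above, MSTRG.5.31 and MSTRG.5.32 both have six exons but differ at the end of exon 1 (18592 vs 18596) and at the end of exon 6 (22903 vs 22904). The first difference moves a splice site, so the implied intron chains differ as well. A minimal sketch with hypothetical helper names, using exon coordinates copied from the records above:

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive as in GTF


def intron_chain(exons: List[Exon]) -> List[Exon]:
    """Introns implied by consecutive exons, in genomic order."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def boundary_differences(a: List[Exon], b: List[Exon]) -> List[Tuple[int, Exon, Exon]]:
    """Return (position, exon_a, exon_b) for exons that differ, pairing exons by
    genomic order (equal to exon_number on the + strand). Only meaningful when
    both transcripts have the same number of exons."""
    if len(a) != len(b):
        raise ValueError("transcripts have different numbers of exons")
    return [(i, ea, eb)
            for i, (ea, eb) in enumerate(zip(sorted(a), sorted(b)), start=1)
            if ea != eb]


t31 = [(18376, 18592), (18666, 20488), (20554, 21076),
       (21157, 21227), (21307, 21559), (21632, 22903)]  # MSTRG.5.31
t32 = [(18376, 18596), (18666, 20488), (20554, 21076),
       (21157, 21227), (21307, 21559), (21632, 22904)]  # MSTRG.5.32

for pos, ea, eb in boundary_differences(t31, t32):
    print(f"exon {pos}: {ea} vs {eb}")  # exon 1 and exon 6 differ
print(intron_chain(t31)[0], intron_chain(t32)[0])  # first introns: (18593, 18665) (18597, 18665)
```

The remaining introns of the two transcripts are identical; only the first donor site differs.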

KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.3";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.4";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.5";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.6";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.7";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.8";
KQ473730.1 StringTie transcript 1 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.9";
KQ473730.1 StringTie transcript 2163 6649 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.10";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.11";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.12";
KQ473730.1 StringTie transcript 2174 8803 1000 - . gene_id "MSTRG.25"; transcript_id "MSTRG.25.13";

KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.28";
KQ473731.1 StringTie transcript 19466 35864 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.29";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.30";
KQ473731.1 StringTie transcript 19466 31948 1000 - . gene_id "MSTRG.31"; transcript_id "MSTRG.31.31";

KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.1";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.2";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.3";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.4";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.5";
KQ473732.1 StringTie transcript 25810 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.6";
KQ473732.1 StringTie transcript 25811 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.7";
KQ473732.1 StringTie transcript 25811 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.8";
KQ473732.1 StringTie transcript 25816 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.9";
KQ473732.1 StringTie transcript 28135 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.10";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.11";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.12";
KQ473732.1 StringTie transcript 32401 34688 1000 + . gene_id "MSTRG.50"; transcript_id "MSTRG.50.13";
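Several spans for MSTRG.50 differ by only a few bases: MSTRG.50.1 to MSTRG.50.9 start at 25810, 25811 or 25816 and all end at 34688. One way to flag such near-identical spans is single-linkage grouping with a base-pair tolerance. A minimal sketch; the function name and tolerance values are assumptions, and it compares transcript spans only, not exon structure:

```python
from typing import Dict, List, Tuple


def near_identical_spans(spans: Dict[str, Tuple[int, int]], tol: int) -> List[List[str]]:
    """Group transcript IDs whose start and end each differ by at most `tol` bp
    from some other member of the group (single linkage via union-find).
    Assumes all spans are on the same sequence and strand. Groups of one are dropped."""
    ids = list(spans)
    parent = {t: t for t in ids}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (sa, ea), (sb, eb) = spans[a], spans[b]
            if abs(sa - sb) <= tol and abs(ea - eb) <= tol:
                parent[find(b)] = find(a)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for t in ids:
        groups.setdefault(find(t), []).append(t)
    return [g for g in groups.values() if len(g) > 1]


# Transcript spans of MSTRG.50, copied from the records above.
mstrg50 = {f"MSTRG.50.{n}": (25810, 34688) for n in range(1, 7)}
mstrg50.update({"MSTRG.50.7": (25811, 34688), "MSTRG.50.8": (25811, 34688),
                "MSTRG.50.9": (25816, 34688), "MSTRG.50.10": (28135, 34688)})
mstrg50.update({f"MSTRG.50.{n}": (32401, 34688) for n in range(11, 14)})

print(near_identical_spans(mstrg50, tol=0))   # exact matches only
print(near_identical_spans(mstrg50, tol=10))  # allow up to 10 bp of end wobble
```

With `tol=0`, MSTRG.50.1 to 50.6 form one group, 50.7 and 50.8 another, and 50.11 to 50.13 a third; with `tol=10`, MSTRG.50.1 to 50.9 fall into a single group.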

I understand that transcripts are defined by their exons, but StringTie reports separate exons and transcripts even when they differ by just one base pair. Is this normal? Does it affect the counts used for differential expression analysis?
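One way to see how much the split between near-identical transcripts matters for a gene-level comparison is to sum the per-transcript counts within each gene_id: moving reads between, say, MSTRG.5.31 and MSTRG.5.32 changes the transcript-level values but not the MSTRG.5 total. A minimal sketch, assuming a hypothetical list of (transcript_id, gene_id, count) rows exported from whatever quantification step is used; the counts below are made up for illustration. This does not cover transcript-level testing, where the split itself is what gets compared.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (transcript_id, gene_id, count)


def gene_totals(rows: Iterable[Row]) -> Dict[str, float]:
    """Sum transcript-level counts into one total per gene_id."""
    totals: Dict[str, float] = defaultdict(float)
    for _transcript_id, gene_id, count in rows:
        totals[gene_id] += count
    return dict(totals)


# Made-up counts: the same 30 reads split two different ways between two isoforms.
split_a = [("MSTRG.5.31", "MSTRG.5", 20.0), ("MSTRG.5.32", "MSTRG.5", 10.0)]
split_b = [("MSTRG.5.31", "MSTRG.5", 5.0), ("MSTRG.5.32", "MSTRG.5", 25.0)]
print(gene_totals(split_a), gene_totals(split_b))  # same gene total for both splits
```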

Any suggestions, please? Thanks.

PEHGP commented 6 years ago

I have the same problem!