gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Duplication of an exon is considered an insertion #373

Open · Nedss opened 2 years ago

Nedss commented 2 years ago

Hello!

I have a question about how StringTie interprets my Nanopore data. I generated my BAM files with minimap2 and used them as input to StringTie, without a guide annotation, using the following command:

stringtie \
  -L \
  -o <my output file> \
  <my bam file>

In one sample we expect an exon duplication, and it is visible in IGV: [Screenshot 2022-07-29 at 17.04.17]

It appears at the following coordinates: [Screenshot 2022-07-29 at 17.04.42]

However, StringTie treats the duplication as an insertion and merges it into the exon annotated as exon 7:

17      StringTie       exon    41196305        41197819        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "1"; cov "57.179531";
17      StringTie       exon    41199660        41199720        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "2"; cov "67.615112";
17      StringTie       exon    41201138        41201211        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "3"; cov "68.242302";
17      StringTie       exon    41203080        41203134        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "4"; cov "67.747482";
17      StringTie       exon    41209069        41209152        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "5"; cov "69.329041";
17      StringTie       exon    41215350        41215390        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "6"; cov "68.615166";
17      StringTie       exon    41215891        41215968        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "7"; cov "68.967773";
17      StringTie       exon    41219625        41219712        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "8"; cov "67.044159";
17      StringTie       exon    41222945        41223255        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "9"; cov "77.162979";
17      StringTie       exon    41226348        41226538        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "10"; cov "74.733871";
17      StringTie       exon    41228505        41228631        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "11"; cov "71.012573";
17      StringTie       exon    41234421        41234592        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "12"; cov "75.215317";
17      StringTie       exon    41242961        41243049        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "13"; cov "74.153717";
17      StringTie       exon    41246761        41246877        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "14"; cov "46.779739";
17      StringTie       exon    41251697        41251897        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "15"; cov "82.633339";
17      StringTie       exon    41256139        41256278        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "16"; cov "47.550884";
17      StringTie       exon    41256885        41256973        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "17"; cov "47.054935";
17      StringTie       exon    41258473        41258550        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "18"; cov "44.780148";
17      StringTie       exon    41267743        41267796        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "19"; cov "45.616928";
17      StringTie       exon    41276034        41276132        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "20"; cov "44.381065";
17      StringTie       exon    41277288        41277540        1000    -       .       gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "21"; cov "16.723976";
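
To compare the reported exon 7 with the duplicated segment seen in IGV, it can help to turn the GTF records into exon and intron lengths. The sketch below is a minimal, hypothetical helper (not part of StringTie) that parses three of the records above. It assumes only the standard GTF layout: 8 fixed columns followed by a `key "value";` attribute column, with 1-based inclusive coordinates.

```python
import re

# Exons 6-8 of STRG.9542.13, copied from the StringTie output above
# (whitespace-separated as pasted; real GTF files are tab-separated).
GTF_TEXT = """\
17 StringTie exon 41215350 41215390 1000 - . gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "6"; cov "68.615166";
17 StringTie exon 41215891 41215968 1000 - . gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "7"; cov "68.967773";
17 StringTie exon 41219625 41219712 1000 - . gene_id "STRG.9542"; transcript_id "STRG.9542.13"; exon_number "8"; cov "67.044159";
"""

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exons(text):
    """Return one dict per exon record; works for tab- or space-separated input."""
    exons = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split off the 8 fixed columns; the 9th (attributes) may contain spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        start, end = int(fields[3]), int(fields[4])
        exons.append({
            "exon_number": int(attrs["exon_number"]),
            "start": start,
            "end": end,
            "length": end - start + 1,  # GTF coordinates are 1-based and inclusive
            "cov": float(attrs["cov"]),
        })
    return exons


def introns(exons):
    """Gaps between consecutive exons in genomic order, as (start, end) pairs."""
    ordered = sorted(exons, key=lambda e: e["start"])
    return [(a["end"] + 1, b["start"] - 1) for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    exons = parse_exons(GTF_TEXT)
    for e in exons:
        print(f"exon {e['exon_number']}: {e['start']}-{e['end']} ({e['length']} bp, cov {e['cov']})")
    for s, t in introns(exons):
        print(f"intron: {s}-{t} ({t - s + 1} bp)")
```

By this arithmetic, exon 7 spans 41215891 to 41215968, i.e. 78 bp of reference sequence, so a duplicated copy seen in the reads does not widen the exon in genomic coordinates.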

Is this normal behavior? I would expect the transcript annotation to show duplicated coordinates around exon 7.
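
One possible reason, offered as an assumption rather than a confirmed explanation of StringTie's behavior: a GTF transcript is a list of genomic intervals in coordinate order, so it cannot list the same exon twice in a row. A tandem copy of an exon in a read has no separate reference position, so a spliced aligner such as minimap2 will typically encode it as an insertion (`I` in the CIGAR), a soft clip, or a supplementary alignment. The hypothetical sketch below scans a CIGAR string for insertions that sit at an exon and are roughly that exon's length. The function names and the `tolerance`/`window` thresholds are illustrative, and the CIGAR string is a made-up example built from the exon 6-8 coordinates above.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference.
REF_CONSUMING = set("MDN=X")


def insertions(cigar, pos):
    """Return (last_ref_base_before_insertion, insertion_length) for each I op.

    `pos` is the 1-based leftmost mapping position (SAM column 4) and
    `cigar` is the CIGAR string (SAM column 6).
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    found = []
    ref_pos = pos  # next reference base that will be consumed
    for length_str, op in CIGAR_OP_RE.findall(cigar):
        length = int(length_str)
        if op == "I":
            found.append((ref_pos - 1, length))
        elif op in REF_CONSUMING:
            ref_pos += length
    return found


def exon_sized_insertions(cigar, pos, exons, tolerance=5, window=10):
    """Flag insertions located at an exon and about as long as that exon.

    `exons` maps an exon label to its 1-based inclusive (start, end).
    `tolerance` and `window` (in bp) are arbitrary illustrative thresholds.
    """
    hits = []
    for ins_pos, ins_len in insertions(cigar, pos):
        for label, (start, end) in exons.items():
            at_exon = start - window <= ins_pos <= end + window
            if at_exon and abs(ins_len - (end - start + 1)) <= tolerance:
                hits.append((label, ins_pos, ins_len))
    return hits


if __name__ == "__main__":
    # Exons 6-8 of STRG.9542.13 from the output above.
    exons = {6: (41215350, 41215390), 7: (41215891, 41215968), 8: (41219625, 41219712)}
    # Hypothetical alignment: exon 6, intron, exon 7, a 78 bp insertion
    # (a second copy of exon 7), intron, exon 8.
    cigar = "41M500N78M78I3656N88M"
    print(exon_sized_insertions(cigar, 41215350, exons))
```

For real reads, the position and CIGAR are columns 4 and 6 of `samtools view` output. This check only catches the insertion case; if minimap2 soft-clipped the extra copy or split the read into a supplementary alignment, it would report nothing.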

Thanks for your help!