gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Stringtie skips some reference gene ids #375

Closed m-waqas closed 2 years ago

m-waqas commented 2 years ago

I have mapped my data using STAR and am now trying to generate an assembly using StringTie. The reference annotation GTF file contains 38464 genes and 47387 transcripts. I tried to assemble just the known genes (38464) and transcripts (47387) using the following command:

stringtie -p 8 -e -B -G Genome_annotation/data.gtf -o Path_to_Assembly/GFP2/GFP2.gtf Path_to_Mapped_files/GFP2/GFP2Aligned.sortedByCoord.out.bam

StringTie assigns gene IDs to only around 23139 genes, instead of the 38464 genes and 47387 transcripts in the annotation. I then checked the gene type of the roughly 15000 genes that StringTie missed, and for most of them (14977) the gene type is PSEUDO. Is it possible to quantify the expression of all 38464 genes? In other words, how can I include pseudogenes in StringTie?
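For anyone who wants to reproduce the comparison described above, here is a minimal Python sketch of one way to list the reference gene IDs that do not appear in the assembled GTF. It is not part of StringTie. The function names and the toy records are made up for illustration, and it assumes that both files store the identifier in a `gene_id "..."` attribute in the ninth GTF column and use the same IDs.

```python
import re

# Matches the gene_id attribute in the 9th (attributes) column of a GTF line.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids(gtf_lines):
    """Collect the distinct gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_gene_ids(reference_lines, assembled_lines):
    """Return reference gene IDs that are absent from the assembled GTF."""
    return sorted(gene_ids(reference_lines) - gene_ids(assembled_lines))


# Toy records (hypothetical, for illustration only).
reference = [
    "# toy reference annotation\n",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "geneB"; transcript_id "txB1";\n',
]
assembled = [
    'chr1\tStringTie\ttranscript\t1\t100\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
]

print(missing_gene_ids(reference, assembled))
```

On real data, the two arguments could be open file handles for the reference annotation and the StringTie output. The resulting ID list can then be looked up in the annotation to check the gene type of each missing gene.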

Any help will be highly appreciated.