gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

StringTie skips pseudogenes from reference GTF #376

Open m-waqas opened 2 years ago

m-waqas commented 2 years ago

I have mapped my data using STAR and am now trying to generate an assembly using StringTie. The reference annotation GTF file contains 38464 genes and 47387 transcripts. I tried to assemble just the known genes (38464) and transcripts (47387) using the following command:

stringtie -p 8 -e -B -G Genome_annotation/data.gtf -o Path_to_Assembly/GFP2/GFP2.gtf Path_to_Mapped_files/GFP2/GFP2Aligned.sortedByCoord.out.bam
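To check gene and transcript totals like the ones quoted above, whether for the reference GTF or for the StringTie output, a minimal Python sketch can count distinct `gene_id` and `transcript_id` values. It assumes a standard GTF with `key "value";` attributes in column 9 and skips empty IDs (for example, NCBI-style gene lines that carry `transcript_id ""`). The function name `count_ids` is hypothetical; this is an illustrative helper, not part of StringTie.

```python
import re
import sys

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def count_ids(lines):
    """Return (distinct non-empty gene_id count, distinct non-empty transcript_id count)."""
    genes, transcripts = set(), set()
    for line in lines:
        # Skip comment and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if attrs.get("gene_id"):
            genes.add(attrs["gene_id"])
        if attrs.get("transcript_id"):
            transcripts.add(attrs["transcript_id"])
    return len(genes), len(transcripts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ids.py annotation.gtf
    with open(sys.argv[1]) as fh:
        n_genes, n_tx = count_ids(fh)
    print(f"{n_genes} genes, {n_tx} transcripts")
```

Running it once on the reference GTF and once on the `-o` output makes the gap between the two files explicit.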

StringTie assigned gene IDs to only about 23139 genes, instead of the 38464 genes and 47387 transcripts in the reference. Next, I checked the gene type of the roughly 15000 genes that StringTie missed, and most of them (14977) are annotated as PSEUDO. Is it possible to quantify the expression of all 38464 genes? Or how can I quantify pseudogenes as well?
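The check described above (which reference genes are absent from the StringTie output, and what type they are) can be sketched in Python. Several assumptions apply: the type attribute key defaults to `gene_biotype` (common in NCBI and Ensembl GTFs; GENCODE uses `gene_type`), so adjust `type_key` to your annotation; and the output is assumed to reuse reference IDs in `gene_id` when run with `-e`, with `ref_gene_id` also collected in case it is present. `iter_attrs` and `missing_gene_types` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def iter_attrs(lines):
    """Yield the attribute dict of every data line in a GTF file."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            yield dict(ATTR_RE.findall(cols[8]))


def missing_gene_types(ref_lines, out_lines, type_key="gene_biotype"):
    """Count the type_key values of reference genes that never appear in the output."""
    ref_types = {}
    for attrs in iter_attrs(ref_lines):
        gid = attrs.get("gene_id")
        if not gid:
            continue
        if type_key in attrs:
            ref_types[gid] = attrs[type_key]
        else:
            # Genes with no type attribute on any line are reported as "unknown".
            ref_types.setdefault(gid, "unknown")

    seen = set()
    for attrs in iter_attrs(out_lines):
        # Assumption: -e output keeps reference gene IDs; ref_gene_id is a fallback.
        for key in ("gene_id", "ref_gene_id"):
            if attrs.get(key):
                seen.add(attrs[key])

    return Counter(t for gid, t in ref_types.items() if gid not in seen)
```

Passing open handles for the reference GTF and the StringTie output GTF returns a tally such as how many missing genes are pseudogenes versus other types.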

Any help will be highly appreciated.