gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

One gene outputs multiple TPM values #382

Open HengkuanLi opened 1 year ago

HengkuanLi commented 1 year ago

When I quantify with StringTie, a single gene gets multiple lines in the output. Looking at the GTF file, I found that these lines correspond to different transcripts of the gene. How do I solve this?

ENSSSCG00000035639 TERB1 6 - 27450632 27503893 45.755924 12.578593 26.445555
ENSSSCG00000035639 TERB1 6 - 27511466 27540118 30.928654 13.596822 28.586306
ENSSSCG00000003981 ZFP69B 6 - 170660884 170675879 5.751518 1.388807 2.926265
ENSSSCG00000003981 ZFP69B 6 - 170687418 170701255 35.762066 8.635393 18.195074

Wangchangsh commented 1 year ago

I have the same problem.

LOC_Os10g31460 - Chr10 - 16494293 16495238 0.850951 0.184613 0.368143
LOC_Os10g31460 - Chr10 - 16486322 16492007 14.433779 6.385049 12.732656

zpliu1126 commented 7 months ago

me too

mpertea commented 7 months ago

This happens when your annotation file contains genes with non-overlapping transcripts. We consider this to be an annotation error. In this case we suggest completing your downstream analysis by treating these two gene locations as distinct, i.e. you could label

ENSSSCG00000035639 TERB1 6 - 27450632 27503893 45.755924 12.578593 26.445555

as ENSSSCG00000035639_1 and

ENSSSCG00000035639 TERB1 6 - 27511466 27540118 30.928654 13.596822 28.586306

as ENSSSCG00000035639_2. Alternatively (not preferred), you could just remove the location with the smaller TPM.
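
For anyone who wants to script the relabeling suggested above, here is a minimal Python sketch. It is not part of StringTie; the function name `relabel_duplicate_genes` is hypothetical, and it assumes the gene ID is the first field of each row. It splits on whitespace because that is how the rows were pasted in this thread; for a real gene abundance file you would likely need to split on tabs and skip any header line. Suffixes are assigned in order of appearance, which is an assumption (you may prefer to sort by start coordinate first).

```python
from collections import Counter, defaultdict


def relabel_duplicate_genes(rows, id_col=0):
    """Append _1, _2, ... to gene IDs that occur on more than one row.

    `rows` is a list of lists of strings. Rows whose gene ID is unique
    are returned unchanged. This is an illustrative helper, not a
    StringTie function.
    """
    counts = Counter(row[id_col] for row in rows)
    seen = defaultdict(int)  # how many times each duplicated ID has been seen so far
    relabeled = []
    for row in rows:
        gene_id = row[id_col]
        new_row = list(row)  # copy so the input is not modified
        if counts[gene_id] > 1:
            seen[gene_id] += 1
            new_row[id_col] = f"{gene_id}_{seen[gene_id]}"
        relabeled.append(new_row)
    return relabeled


# Rows copied from this thread (whitespace-separated as pasted here).
lines = [
    "ENSSSCG00000035639 TERB1 6 - 27450632 27503893 45.755924 12.578593 26.445555",
    "ENSSSCG00000035639 TERB1 6 - 27511466 27540118 30.928654 13.596822 28.586306",
    "ENSSSCG00000003981 ZFP69B 6 - 170660884 170675879 5.751518 1.388807 2.926265",
    "ENSSSCG00000003981 ZFP69B 6 - 170687418 170701255 35.762066 8.635393 18.195074",
]
rows = [line.split() for line in lines]
relabeled = relabel_duplicate_genes(rows)
for row in relabeled:
    print("\t".join(row))
```

Each location then carries its own ID (for example ENSSSCG00000035639_1 and ENSSSCG00000035639_2), so downstream tools that expect one row per gene ID should no longer see duplicates.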