gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

Should I filter the transcript with class code "s"? #89

Open JD12138 opened 5 months ago

JD12138 commented 5 months ago

Hi, I am trying to detect novel transcripts from HiFi reads using StringTie2 (guided mode). I then compared the result to the GENCODE (v44) annotation, but almost half of the novel transcripts are marked as "s". Should I discard these kinds of transcripts? And which class codes, other than "u", can be considered novel transcripts? I aligned the reads to GRCh38 with minimap2 (newest version, recommended parameters). Thanks!

santataRU commented 1 month ago

Is there a reason to discard novel transcripts? In my case, I need to build cell-type-specific transcriptome GTF annotations for alternative splicing analysis, so I would not discard any novel transcripts. From my understanding, there isn't a single code (e.g., "s" or "u") that exclusively indicates novel transcripts—code "j" could also represent novel transcripts.

Interpreting the results: in the .tmap file, each transcript is assigned a class code that indicates its relationship to the reference. Some of the class codes you might encounter include:

=: Exact match with a reference transcript.
j: Novel isoform with at least one splice junction shared with a reference transcript.
u: Intergenic transcript (no overlap with any reference transcript).
x: Exonic overlap with reference transcript on the opposite strand.
i: Fully contained within a reference intron (potential novel transcript).
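If it helps, here is a minimal Python sketch for tallying class codes in a .tmap file and pulling out the query transcript IDs for a chosen set of codes. It assumes the .tmap file is tab-delimited with a header line containing `class_code` and `qry_id` columns, which is how I understand the gffcompare output; please check the header of your own file. The function names and the inline sample rows are made up for illustration, and the set of codes passed in the example is not a recommendation about which codes count as novel.

```python
import csv
import io
from collections import Counter


def read_tmap(handle):
    """Return the rows of a tab-delimited .tmap file as a list of dicts.

    Assumes the first line is a header that names the columns.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def count_class_codes(rows):
    """Count how many query transcripts carry each class code."""
    return Counter(row["class_code"] for row in rows)


def select_by_class_code(rows, keep):
    """Return query transcript IDs whose class code is in `keep`."""
    return [row["qry_id"] for row in rows if row["class_code"] in keep]


# Hypothetical sample with only the first few columns; a real .tmap file
# carries more columns (expression, coverage, length, and so on).
SAMPLE = (
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\n"
    "GENE1\tTX1\t=\tSTRG.1\tSTRG.1.1\n"
    "GENE2\tTX2\tj\tSTRG.2\tSTRG.2.1\n"
    "-\t-\tu\tSTRG.3\tSTRG.3.1\n"
    "GENE4\tTX4\ts\tSTRG.4\tSTRG.4.1\n"
    "GENE2\tTX2\tj\tSTRG.2\tSTRG.2.2\n"
)

rows = read_tmap(io.StringIO(SAMPLE))
counts = count_class_codes(rows)

# Illustrative choice only: which codes to keep depends on your analysis.
selected_ids = select_by_class_code(rows, keep={"j", "u"})

for code, n in sorted(counts.items()):
    print(f"{code}\t{n}")
print(selected_ids)
```

For a real run, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")` where `path` points to your own .tmap file. Looking at the per-code counts first makes it easier to decide which codes to keep or drop before filtering the GTF.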