gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

how to find novel genes and transcripts in gffcompare? #24

Closed: naveenkumarv40 closed this issue 6 years ago

naveenkumarv40 commented 6 years ago

I am also new to RNA-seq analysis, and I followed the same protocol for my dataset. I am having trouble identifying novel transcripts and genes from the gffcompare ".stats" file below.

I ran the following command:

gffcompare -r chrX_data/genes/chrX.gtf -G -o merged stringtie_merged.gtf

and got the following summary:

Summary for dataset: stringtie_merged.gtf
     Query mRNAs : 253181 in 70635 loci (216728 multi-exon transcripts)
                   (23264 multi-transcript loci, ~3.6 transcripts per locus)
 Reference mRNAs : 216257 in 60158 loci (189357 multi-exon)
Super-loci w/ reference transcripts: 51791

                   | Sensitivity | Precision |
        Base level |       100.0 |      93.0 |
        Exon level |        99.9 |      93.6 |
      Intron level |        99.4 |      94.0 |
Intron chain level |        99.7 |      87.1 |
  Transcript level |        99.8 |      85.2 |
       Locus level |       100.0 |      84.4 |

Matching intron chains: 188838
  Matching transcripts: 215729
         Matching loci: 60158

  Missed exons: 0/623537 (0.0%)
   Novel exons: 23741/674273 (3.5%)
Missed introns: 2160/383827 (0.6%)
 Novel introns: 4747/405880 (1.2%)
   Missed loci: 0/60158 (0.0%)
    Novel loci: 10239/70635 (14.5%)

Total union super-loci across all input datasets: 70632
253181 out of 253181 consensus transcripts written in merged.annotated.gtf (0 discarded as redundant).
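The Missed/Novel lines in this summary report a count over a total. Judging from the denominators, Missed counts are taken out of the reference totals (e.g. 60158 reference loci) and Novel counts out of the query totals (e.g. 70635 query loci), so "Novel loci: 10239/70635" says that about 14.5% of the merged loci have no matching reference locus. These lines say how many, not which ones. A minimal Python sketch, assuming the line layout shown above (the function name `parse_missed_novel` is hypothetical, not part of gffcompare), that parses those lines and recomputes each percentage:

```python
import re

# Matches lines such as "Novel loci: 10239/70635 ( 14.5%)"; the layout is
# assumed from the summary above (spaces or tabs are allowed between fields).
LINE_RE = re.compile(
    r"(Missed|Novel) (exons|introns|loci):\s*(\d+)/(\d+)\s*\(\s*([\d.]+)%\)"
)


def parse_missed_novel(stats_text):
    """Map (kind, feature) -> (count, total, reported %, recomputed %)."""
    results = {}
    for m in LINE_RE.finditer(stats_text):
        count, total = int(m.group(3)), int(m.group(4))
        # Recompute the percentage from the raw counts, rounded like the report.
        recomputed = round(100.0 * count / total, 1) if total else 0.0
        results[(m.group(1), m.group(2))] = (count, total, float(m.group(5)), recomputed)
    return results


summary = """
  Missed exons: 0/623537 (0.0%)
   Novel exons: 23741/674273 (3.5%)
   Missed loci: 0/60158 (0.0%)
    Novel loci: 10239/70635 (14.5%)
"""

for (kind, feature), (count, total, reported, recomputed) in parse_missed_novel(summary).items():
    print(f"{kind} {feature}: {count}/{total} reported {reported}% recomputed {recomputed}%")
```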

From this result, where can I find the novel genes and transcripts?

gpertea commented 6 years ago

GffCompare produces multiple output files -- the one you show is just the summary statistics, but you can dig deeper in the other files for the actual transcript IDs of novel genes/transcripts. Please check the documentation available at http://ccb.jhu.edu/software/stringtie/gffcompare.shtml for the other output files and the classification of transcripts according to their relationship (or lack thereof) to the reference annotation.
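The transcript IDs are in the annotated GTF named in the summary (merged.annotated.gtf), where gffcompare adds a class_code attribute to each transcript record. Class code "u" marks an intergenic transcript with no reference match (a candidate novel gene), and "j" marks a transcript that shares at least one splice junction with a reference transcript (a candidate novel isoform); which codes to treat as "novel" is a judgment call, not a fixed rule. A minimal Python sketch under those assumptions; the function `novel_transcripts` and the sample records are hypothetical:

```python
import re

# GTF attribute pairs look like: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Class codes treated as "novel" in this sketch (an assumption; adjust as needed):
#   "u" = intergenic, no reference match -> candidate novel gene
#   "j" = shares at least one splice junction with a reference -> candidate novel isoform
NOVEL_CODES = {"u": "novel_gene", "j": "novel_isoform"}


def novel_transcripts(lines):
    """Yield (transcript_id, gene_id, class_code) for novel transcript records."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # class_code is read from transcript rows only
        attrs = dict(ATTR_RE.findall(fields[8]))
        code = attrs.get("class_code")
        if code in NOVEL_CODES:
            yield attrs.get("transcript_id"), attrs.get("gene_id"), code


# Hypothetical records in the layout of an annotated GTF (tab-separated, 9 columns).
sample = [
    "\t".join(["chrX", "StringTie", "transcript", "100", "900", ".", "+", ".",
               'transcript_id "MSTRG.1.1"; gene_id "MSTRG.1"; class_code "u";']),
    "\t".join(["chrX", "StringTie", "exon", "100", "300", ".", "+", ".",
               'transcript_id "MSTRG.1.1"; gene_id "MSTRG.1";']),
    "\t".join(["chrX", "StringTie", "transcript", "2000", "5000", ".", "-", ".",
               'transcript_id "MSTRG.2.1"; gene_id "MSTRG.2"; class_code "=";']),
    "\t".join(["chrX", "StringTie", "transcript", "2000", "5200", ".", "-", ".",
               'transcript_id "MSTRG.2.2"; gene_id "MSTRG.2"; class_code "j";']),
]

for tid, gid, code in novel_transcripts(sample):
    print(tid, gid, code, NOVEL_CODES[code], sep="\t")
```

To run it on the real file, pass an open file handle instead of the sample list, e.g. `with open("merged.annotated.gtf") as fh: rows = list(novel_transcripts(fh))`. The distinct gene_id values among the "u" rows then give a rough list of candidate novel genes.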

theheking commented 5 years ago

I get similar statistics when I run gffcompare. Is such a high sensitivity normal, or am I using the wrong flags? I am also using the -R flag, which made no difference to the sensitivity values.
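The transcript- and intron-chain-level figures in the summary earlier in this thread can be reproduced from the counts printed in the same output: sensitivity is matching items divided by reference items, and precision is matching items divided by query items. This formula is inferred from the fact that it reproduces the reported numbers; gffcompare's own locus-level calculation may differ. One likely reason for near-100% sensitivity, assuming stringtie_merged.gtf was produced with `stringtie --merge -G` as in the StringTie protocol: the merge then includes the reference transcripts, so nearly every reference transcript is matched by construction. For the same reason -R, which only drops reference transcripts that no query transcript overlaps from the sensitivity calculation, has little left to change. A minimal sketch of the arithmetic (the helper `sn_pr` is hypothetical):

```python
def sn_pr(matching, reference_total, query_total):
    """Sensitivity and precision as percentages, rounded to one decimal."""
    sensitivity = round(100.0 * matching / reference_total, 1)
    precision = round(100.0 * matching / query_total, 1)
    return sensitivity, precision


# Counts copied from the gffcompare summary posted in this thread.
transcript_level = sn_pr(215729, 216257, 253181)    # matching transcripts vs reference/query mRNAs
intron_chain_level = sn_pr(188838, 189357, 216728)  # matching intron chains vs multi-exon counts

print("Transcript level:", transcript_level)
print("Intron chain level:", intron_chain_level)
```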