gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

Exon-level sensitivity is not 100%, but there are zero missed exons. #61

Open junghoon-shin opened 3 years ago

junghoon-shin commented 3 years ago

Here is my .stats file.

# gffcompare v0.11.7 | Command line was:
#gffcompare-0.11.7.Linux_x86_64/gffcompare -r gencode.v19.annotation.gtf -o gffcompare_stringtie_merged stringtie_merged.gtf
#

#= Summary for dataset: stringtie_merged.gtf
#     Query mRNAs :  612833 in   89388 loci  (566229 multi-exon transcripts)
#            (29760 multi-transcript loci, ~6.9 transcripts per locus)
# Reference mRNAs :  194187 in   54800 loci  (169307 multi-exon)
# Super-loci w/ reference transcripts:    46595
#-----------------| Sensitivity | Precision  |
        Base level:   100.0     |    27.6    | 
        Exon level:    84.2     |    46.6    |  
      Intron level:    99.8     |    51.1    |
Intron chain level:    99.9     |    29.9    |
  Transcript level:    98.2     |    31.1    | 
       Locus level:    95.6     |    50.9    |  

     Matching intron chains:  169199
       Matching transcripts:  190667
              Matching loci:   52414

          Missed exons:       0/559962  (  0.0%)
           Novel exons:  275572/1115730 ( 24.7%)
        Missed introns:     639/343915  (  0.2%)
         Novel introns:  137224/671588  ( 20.4%)
           Missed loci:       0/54800   (  0.0%)
            Novel loci:   42793/89388   ( 47.9%)

 Total union super-loci across all input datasets: 89388
612833 out of 612833 consensus transcripts written in gffcompare_stringtie_merged.annotated.gtf (0 discarded as redundant)

I see several inconsistencies here.

  1. The number of missed exons is zero, but the exon-level sensitivity is only 84.2%, not 100%.
  2. Among the 1115730 query exons, only 24.7% are novel, but the exon-level precision is 46.6%, not 75.3% (100% - 24.7%).
  3. Among the 671588 query introns, only 20.4% are novel, but the intron-level precision is 51.1%, not 79.6% (100% - 20.4%).
  4. The number of missed loci is zero, but the locus-level sensitivity is 95.6%, not 100%.
  5. Among the 89388 query loci, only 47.9% are novel, but the locus-level precision is 50.9%, not 52.1% (100% - 47.9%).
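
To make the arithmetic behind these five points explicit, here is a minimal Python sketch. It only recomputes the complements I expected (100% minus the missed or novel percentage) from the counts in the .stats file and compares them with the reported values. The names `reported`, `counts`, `complement_pct` and `gaps` are my own, and the "expected" figures reflect my assumption about how the numbers relate, not how gffcompare actually derives them.

```python
# Figures copied by hand from the .stats output above.
reported = {
    "exon":   {"sensitivity": 84.2, "precision": 46.6},
    "intron": {"sensitivity": 99.8, "precision": 51.1},
    "locus":  {"sensitivity": 95.6, "precision": 50.9},
}
# (count, total) pairs from the "Missed ..." and "Novel ..." lines.
counts = {
    "exon":   {"missed": (0, 559962),   "novel": (275572, 1115730)},
    "intron": {"missed": (639, 343915), "novel": (137224, 671588)},
    "locus":  {"missed": (0, 54800),    "novel": (42793, 89388)},
}


def complement_pct(part, total):
    """Return 100% minus part/total, rounded to one decimal place."""
    return round(100.0 * (1 - part / total), 1)


def gaps():
    """Compare the naive complements with the reported percentages."""
    rows = {}
    for level, c in counts.items():
        # Assumption: sensitivity = 100% - missed%, precision = 100% - novel%.
        exp_sn = complement_pct(*c["missed"])
        exp_pr = complement_pct(*c["novel"])
        rows[level] = {
            "expected_sensitivity": exp_sn,
            "expected_precision": exp_pr,
            "sensitivity_gap": round(exp_sn - reported[level]["sensitivity"], 1),
            "precision_gap": round(exp_pr - reported[level]["precision"], 1),
        }
    return rows


if __name__ == "__main__":
    for level, row in gaps().items():
        print(level, row)
```

Under that assumption, only intron-level sensitivity agrees with the report; every other pair differs, which is what the list above describes.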

Am I misunderstanding something here?