gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

Definition of "Locus Level" #68

Closed. gallardo-seq closed this issue 2 years ago

gallardo-seq commented 2 years ago

Hi Geo, thanks for developing this tool. We're currently using Gffcompare for our recent full-length sequencing preprint and it's working like a charm. I do have a lingering question regarding the definition of "Locus Level" as relevant to Sensitivity and Precision calculations. I'm quite clear as to how the Base, Exon, Intron, etc. Levels are calculated, but the "Locus Level" seems a bit elusive to me based on the documentation. My basic understanding is that the Locus Level analysis accounts for how robust the results are if the reference annotation is slightly different. Is this the case? Any clarity on this would be much appreciated. Thanks!

gpertea commented 2 years ago

I am afraid it's not that meaningful, and you are probably reading too much into it. It's a rather simple metric that fails to capture the complexity of multi-transcript loci (genes or gene clusters). Locus level accuracy, as described in the gffcompare manual (https://ccb.jhu.edu/software/stringtie/gffcompare.shtml#levels), simply counts as a "true positive" any locus where at least 1 (one!) transcript has a "match" with a transcript in the corresponding reference locus.

Since some of these loci can span multiple adjacent genes (for various reasons, including transcriptional "noise", spurious alignments, etc.), this metric is rather low resolution. Frankly, I think it was more of an afterthought to throw it in there, and I'm not sure it is actually useful for anyone.
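
To make the counting rule above concrete, here is a minimal Python sketch. It is not gffcompare's actual code: the function name, the dictionary layout, and the transcript IDs are all hypothetical, and it assumes the transcript-level matches have already been determined elsewhere. It also assumes sensitivity is taken over reference loci and precision over query loci.

```python
def locus_level_accuracy(query_loci, ref_loci, matches):
    """Illustrative only; NOT gffcompare's implementation.

    query_loci / ref_loci: dict mapping a locus ID to a list of transcript IDs.
    matches: set of (query_transcript, ref_transcript) pairs that were
             already judged to be structural matches.
    Returns (sensitivity, precision) under the "at least one matching
    transcript" rule.
    """
    matched_query = {q for q, _ in matches}
    matched_ref = {r for _, r in matches}

    # A locus is a "true positive" as soon as ONE of its transcripts matches.
    tp_query = sum(1 for txs in query_loci.values() if matched_query & set(txs))
    tp_ref = sum(1 for txs in ref_loci.values() if matched_ref & set(txs))

    sensitivity = tp_ref / len(ref_loci) if ref_loci else 0.0
    precision = tp_query / len(query_loci) if query_loci else 0.0
    return sensitivity, precision


# Hypothetical data: reference locus R1 has three transcripts, only one matched.
ref_loci = {"R1": ["r1a", "r1b", "r1c"], "R2": ["r2a"]}
query_loci = {"Q1": ["q1a", "q1b"], "Q2": ["q2a"], "Q3": ["q3a"]}
matches = {("q1a", "r1a")}

sensitivity, precision = locus_level_accuracy(query_loci, ref_loci, matches)
print(sensitivity, precision)
```

In this toy input, R1 counts as fully recovered even though two of its three transcripts have no match, which is why the metric says little about multi-transcript loci.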

gallardo-seq commented 2 years ago

Thanks, Geo, for your quick and thorough answer. I guess I was reading too much into it, but your explanation makes sense. I will use this to recalibrate my assessment of some of our results. Thanks again!