photocyte / PPYR_OGS

Official Geneset (OGS) for Photinus pyralis, at least until I figure out a long term alternative
MIT License

Scoring system for gene model quality. #6

Open photocyte opened 6 years ago

photocyte commented 6 years ago

Description in the title. An ideal gene-model scoring system would be simple and easily understandable, akin to the 1-5 scale of the UniProt "Annotation Score". Once a scoring system exists, it would be a good way to identify the gene models that need follow-up for manual fixing.

Could also annotate whether a gene model has been manually reviewed or not.

Some things we can assess (entries represent True/False characteristics for a gene model):

1) 3' UTR present
1a) 3' UTR correct (assessable by the presence/alignment of a poly-A tail in PASA/GMAP/BLAT alignments of de novo assembled transcripts)
2) 5' UTR present
2a) 5' UTR correct (hard to assess; IsoSeq of cDNA produced with the TeloPrime kit would be best)
3) Correct CDS C-terminus (maybe best manually assessed, but if not, trust de novo transcript DCGM the most)
4) Correct CDS N-terminus (in terms of real empirical data, bottom-up proteomics can find this; otherwise, assess Orthogroup characteristics, e.g. with something like OMgene?)
5) Correct number of exons (assessable from Orthogroup characteristics)
6) Manually reviewed (True/False)
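
To make the idea concrete, here is a rough sketch of how the True/False characteristics above could be folded into a 1-5 score. Everything in it is an assumption for illustration: the field names, the equal weighting of the checks, the linear mapping onto 1-5, and the follow-up threshold are placeholders, not a settled design. "Manually reviewed" is kept as a separate flag rather than counted in the score, which is also just one possible choice.

```python
from dataclasses import dataclass, fields


@dataclass
class GeneModelChecks:
    """True/False characteristics for one gene model (hypothetical field names)."""
    utr3_present: bool = False
    utr3_correct: bool = False
    utr5_present: bool = False
    utr5_correct: bool = False
    cds_c_terminus_correct: bool = False
    cds_n_terminus_correct: bool = False
    exon_count_correct: bool = False
    manually_reviewed: bool = False  # reported separately, not scored


def quality_score(checks: GeneModelChecks) -> int:
    """Map the number of passed checks onto a 1-5 scale.

    Assumption: every check carries equal weight and the mapping is linear
    (floor division), so 0 passes -> 1 and all passes -> 5.
    """
    scored = [f.name for f in fields(checks) if f.name != "manually_reviewed"]
    passed = sum(bool(getattr(checks, name)) for name in scored)
    return 1 + (4 * passed) // len(scored)


def needs_follow_up(checks: GeneModelChecks, threshold: int = 3) -> bool:
    """Flag models that score at or below the threshold and have not been reviewed.

    The default threshold of 3 is an arbitrary placeholder.
    """
    return quality_score(checks) <= threshold and not checks.manually_reviewed


# Example with a made-up gene model: both 3' UTR checks pass, the 5' UTR is
# present but unverified, and the exon count agrees with the orthogroup.
example = GeneModelChecks(
    utr3_present=True,
    utr3_correct=True,
    utr5_present=True,
    exon_count_correct=True,
)
print(quality_score(example), needs_follow_up(example))
```

A weighted version (e.g. counting CDS correctness more heavily than UTR presence) would only need a per-field weight in place of the plain count.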