PomBase curation
/note= #33

Closed pombase-admin closed 6 years ago

pombase-admin commented 13 years ago

Need to follow up on handling of notes

On 09/02/2011 19:12, Kim Rutherford wrote:
> On Wednesday 9 February 2011 at 18:20:36, Val Wood wrote:
>
>> There are ~9575 notes on various features. Many non-CDS features have a
>> single note in free text (because controlled_curation is only allowed in
>> GeneDB for CDS)
> The Sanger loader stores /notes in the featureprop table with a
> property type of "comment" (as far as I can tell). There are only 2539
> comment properties in the database, so I have a bit of investigation to
> do.
>
> The breakdown of comments by type is:
>
>  count | name
> -------+---------------
>    174 | tRNA
>     13 | rRNA
>    391 | ncRNA
>    361 | repeat_region
>      6 | snoRNA
>    372 | polypeptide
>   1222 | region
>
> The Sanger loader has cleverly not bothered to put any on the gene
> features.
>
>
>> These are mainly "a bit controlled" and can be refined quite easily
>> into some sort of vocabulary.
>> You mentioned that 366 notes are on CDS...I would like to get rid of
>> these if possible so if you can send me a list at some point (no hurry
>> as I won't have time for a few weeks)
> Do you need a list of the notes, or a list of the CDSs that have notes?
>
> Kim.
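For anyone wanting to reproduce a breakdown like the one Kim quotes, here is a minimal sketch of the kind of query involved. It assumes a heavily simplified Chado-style layout (`feature`, `featureprop` and `cvterm` tables with only the columns shown) loaded into an in-memory SQLite database; the demo rows and the function names `build_demo_db` and `comment_counts_by_feature_type` are made up for illustration and are not taken from the PomBase or Sanger loaders.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a simplified Chado-like schema.

    Assumption: only the columns needed for the query are modelled, and the
    rows below are invented demo data, not real PomBase content.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type_id INTEGER);
        CREATE TABLE featureprop (
            featureprop_id INTEGER PRIMARY KEY,
            feature_id INTEGER,
            type_id INTEGER,
            value TEXT
        );
        INSERT INTO cvterm VALUES (1, 'comment'), (2, 'tRNA'), (3, 'region');
        INSERT INTO feature VALUES (10, 2), (11, 3), (12, 3);
        INSERT INTO featureprop VALUES
            (100, 10, 1, 'demo note a'),
            (101, 11, 1, 'demo note b'),
            (102, 12, 1, 'demo note c');
        """
    )
    return conn


def comment_counts_by_feature_type(conn):
    """Count 'comment' properties grouped by the type of the owning feature."""
    query = """
        SELECT COUNT(*) AS n, ft.name
        FROM featureprop fp
        JOIN cvterm pt ON pt.cvterm_id = fp.type_id   -- property type
        JOIN feature f ON f.feature_id = fp.feature_id
        JOIN cvterm ft ON ft.cvterm_id = f.type_id    -- feature type
        WHERE pt.name = 'comment'
        GROUP BY ft.name
        ORDER BY n DESC, ft.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for count, name in comment_counts_by_feature_type(build_demo_db()):
        print(f"{count:6d} | {name}")
```

Against a real Chado database the same join should apply in spirit, but the connection, schema details and property type name would need checking first.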

Val There are a bunch of notes on 5'UTR, 3'UTR, LTR etc.

Original comment by: ValWood

pombase-admin commented 9 years ago

number of notes down to 795

Original comment by: ValWood

ValWood commented 8 years ago

Down to 784 .....so it isn't going up....

ValWood commented 8 years ago

Most of the notes are on non-CDS features, and some of these might actually be displayed/used, because they also act as the "description" of the feature.

The CDS ones are definitely ignored.

ValWood commented 8 years ago

772. I can see a lot which can be deleted, as they don't add anything; many are covered by the SO annotation. Will do this before the final chado load.

The remaining useful ones can migrate to controlled curation (warning or misc)

@Antonialock are these any use since you updated the tRNA annotations? If not we can delete all of these: e.g. /note="tRNA Ser anticodon CGA, Cove score 77.58"

/note="NuMT, 94% identity to mitochondrial chromosome" type can go, covered by SO ID

anything which mentions a frameshift should be in 'frameshift' format warnings e.g. /note="PMID:16823372 predict a frameshift at base 1684,

/note="warning, gene name change" should move to standard format warning (in controlled curation)

can be deleted, is now annotated: chromosome1.contig:FT /note="candidate ortholog S. cerevisiae YJL062W-A"

Nice Friday afternoon tasks. I will work through these and then I'll add more.
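The triage rules listed in this comment could be sketched roughly as below. This is only an illustration: the regular expressions, the action labels and the function name `triage_note` are assumptions, not an existing PomBase script, and the real clean-up was done by hand. Note that the tRNA rule marks everything as deletable, so any tRNA notes worth keeping would still need a separate check.

```python
import re

# Ordered (pattern, action) pairs; the first match wins.
# Assumption: patterns are guessed from the example notes in this thread.
RULES = [
    (re.compile(r"frameshift", re.IGNORECASE), "migrate: frameshift warning"),
    (re.compile(r"^warning\b", re.IGNORECASE), "migrate: standard warning"),
    (re.compile(r"^tRNA .*Cove score", re.IGNORECASE), "delete: old tRNA note"),
    (re.compile(r"^NuMT\b", re.IGNORECASE), "delete: covered by SO"),
    (re.compile(r"^candidate ortholog", re.IGNORECASE), "delete: now annotated"),
]


def triage_note(note):
    """Return a suggested action for the text of a /note qualifier."""
    text = note.strip()
    for pattern, action in RULES:
        if pattern.search(text):
            return action
    # Anything unrecognised is left for a curator to look at.
    return "review manually"


if __name__ == "__main__":
    examples = [
        "tRNA Ser anticodon CGA, Cove score 77.58",
        "warning, gene name change",
        "candidate ortholog S. cerevisiae YJL062W-A",
    ]
    for note in examples:
        print(f"{triage_note(note):32s} {note}")
```

Keeping the rules in an ordered list means new note types can be added as they turn up, without touching the function.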

ValWood commented 8 years ago

Moving to low priority because of the chado final load requirement. I could just wipe them but there's some useful legacy information in this list that needs to be recorded.

ValWood commented 8 years ago

...708. Migrated lots to warning, synonyms, name description, dbxref. I thought I had done more than that....

ValWood commented 8 years ago

...606, done for the day

Antonialock commented 8 years ago

Yeah, that tRNA one can just go, I don't recognize it.

ValWood commented 8 years ago

Yep, I have been deleting these, they are really old (very first pass annotation!). The codon is covered by the GO annotation, and the score is pretty meaningless. The ones I have kept are where the reported codon differs from the annotated codon. At the end I'll open a ticket for these to be checked.
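The "keep only where the reported codon differs" check could be sketched like this, assuming the old notes all follow the format of the example quoted earlier in the thread. The regular expression, the function name `should_keep` and the idea of passing the annotated value in as a plain string are assumptions for illustration; where the annotated value actually comes from is not specified here.

```python
import re

# Assumption: notes look like 'tRNA Ser anticodon CGA, Cove score 77.58'.
TRNA_NOTE = re.compile(
    r"tRNA\s+(?P<aa>\w+)\s+anticodon\s+(?P<anticodon>[ACGTU]{3}),"
    r"\s*Cove score\s+(?P<score>[\d.]+)",
    re.IGNORECASE,
)


def normalise(triplet):
    """Upper-case a triplet and treat U and T as equivalent."""
    return triplet.upper().replace("U", "T")


def should_keep(note, annotated_anticodon):
    """Keep a tRNA note only if it disagrees with the current annotation.

    Notes that do not match the expected format are kept so that a curator
    can review them by hand.
    """
    match = TRNA_NOTE.search(note)
    if match is None:
        return True
    return normalise(match.group("anticodon")) != normalise(annotated_anticodon)
```

For example, `should_keep('tRNA Ser anticodon CGA, Cove score 77.58', 'CGA')` evaluates to `False`, so that note would be a deletion candidate.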

ValWood commented 8 years ago

Down to 514. That's enough of that for today....

ValWood commented 8 years ago

Down to 459

need to do chromosome 2 and then I think that is ALL of the CDS notes. After that, most are required for feature descriptions but need to be part of some standard vocabularies...should mainly map to SO etc. At this point I'll make a new ticket...

ValWood commented 7 years ago

down to 425......