intermine / pombemine


Don't load "comments" from UniProt #55

Open ValWood opened 2 years ago

ValWood commented 2 years ago

The comments are the free text annotation from UniProt. So they are really just "unstructured" annotations.


However these annotations are not (currently) attached to entities, so you can't really do anything with this data.

Although the comments are "typed", they are still problematic because, as free text, they are not standardized, and they do not include the associated metadata and provenance (sometimes the source is within the comment, but not often).

(176) Phenotypes -> covered by FYPO (PomBase: 60,000)
(32750) Functions & (500) Pathway -> covered by GO MF & BP (PomBase: 25,000)
(1780) Subunit -> covered by GO complex annotations (PomBase: 5000)
(464) Subcellular location -> covered by PomBase GO component annotations (PomBase: 10,000)
(220) PTM -> covered by PRO annotation (PomBase: 602210 PTMs; not yet imported, see https://github.com/intermine/pombemine/issues/12)
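For anyone wanting to check per-type counts like the ones above, here is a minimal sketch that tallies `comment` elements by their `type` attribute in a UniProt-style XML export. The sample entries, accessions, and text are made up for illustration, and the helper names are hypothetical. Tags are matched by local name so the code does not depend on a particular XML namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_comment_types(xml_text: str) -> Counter:
    """Count <comment> elements by their 'type' attribute."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        if local_name(elem.tag) == "comment":
            counts[elem.get("type", "unknown")] += 1
    return counts


# Made-up sample data, not real UniProt entries.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <comment type="function"><text>Made-up text.</text></comment>
    <comment type="subunit"><text>Made-up text.</text></comment>
  </entry>
  <entry>
    <accession>X00002</accession>
    <comment type="function"><text>Made-up text.</text></comment>
  </entry>
</uniprot>"""

if __name__ == "__main__":
    for comment_type, n in count_comment_types(SAMPLE).most_common():
        print(f"({n}) {comment_type}")
```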

These annotations aren't really "mineable" because the text is a mixture of concepts, i.e. phenotypes/penetrance. The phenotypes are not associated with specific alleles (even if the connection was imported, it is only to the protein, not the allele). We have standardised assignments to PRO ontology terms and captured modified residues and modifying entities in a standardized way, etc.

On balance, I don't think it is very useful to include these annotations in a data mining tool, because the inconsistencies and incompleteness could make them more misleading than useful.

ACTION: don't import data in "comments" at least for the time being

Discuss with @manulera @kimrutherford for another opinion.

danielabutano commented 2 years ago

@ValWood the Comment entities are linked to UniProtEntry. Let me know if you want to remove them; it will be a short fix on the InterMine side.

ValWood commented 2 years ago

Yes please. It is unlikely to add anything that is not present in structured annotation; it's really just noise.

kimrutherford commented 2 years ago

> discuss with @manulera @kimrutherford for another opinion

I think comments shouldn't be loaded. In general I think PombeMine should only contain fields that might be queried.

danielabutano commented 2 years ago

Ok, I will remove Comment entities
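As an illustration only, removing comments could also be done as a pre-processing step on the XML before loading. This is a minimal sketch under that assumption, not how the InterMine UniProt loader works (the actual fix is on the InterMine side). The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_comments(xml_text: str) -> str:
    """Return the XML with every <comment> element removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Match on local name, ignoring any '{namespace}' prefix.
            if child.tag.rsplit("}", 1)[-1] == "comment":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, not a real UniProt entry.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <comment type="function"><text>Made-up text.</text></comment>
    <comment type="PTM"><text>Made-up text.</text></comment>
  </entry>
</uniprot>"""

if __name__ == "__main__":
    print(strip_comments(SAMPLE))
```

Everything other than the `comment` elements (here, the accession) is left in place.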

ValWood commented 2 years ago

Related: when you do this, please suppress "component" at the same time: https://github.com/intermine/pombemine/issues/16#issuecomment-1147405508

ValWood commented 2 years ago

Comment has disappeared from the Model browser. "Component" is still present. It would be good if this goes too, as it is a very heterogeneous and incomplete set of things that are more fully annotated with GO (and other data types). Not critical, but it is misleading, so if it is easy not to load this it would be useful.