Retrofit term definitions using CP terms to use GO CC + PATO

obophenotype / cell-ontology

An ontology of cell types

https://obophenotype.github.io/cell-ontology/

Creative Commons Attribution 4.0 International


Retrofit term definitions using CP terms to use GO CC + PATO #857

Open dosumis opened 3 years ago

dosumis commented 3 years ago

See https://github.com/obophenotype/cell-ontology/issues/572#issuecomment-762830332

This is dependent on all relevant terms being added to PATO.

cmungall commented 3 years ago

Agreed, but I have a simpler proposal:

These CP terms are used in axioms that are either rococo axioms that serve no purpose, or overstated logical definitions that do not match the text definition:

These clearly violate S11 in the SRS guidelines https://douroucouli.wordpress.com/2019/07/08/ontotip-write-simple-concise-clear-operational-textual-definitions/

We should simply turn these equivalence axioms into subclass axioms, and toss out any that are useless - e.g. cell phenotypes. These can be turned into textual comments.
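A minimal sketch of the proposed conversion, assuming a toy in-memory model of OWL class expressions (the classes `Named`, `Some`, `And`, `EquivalentTo`, `SubClassOf` and the helper `weaken` are hypothetical, not a real OWL API). It splits an intersection-style equivalence into one subclass axiom per conjunct and turns any conjunct flagged as useless into comment text. Every resulting subclass axiom is entailed by the original equivalence, so the change only drops the "sufficient conditions" half of the definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# Hypothetical, simplified stand-ins for OWL class expressions (not a real OWL API).
@dataclass(frozen=True)
class Named:
    iri: str

@dataclass(frozen=True)
class Some:
    # Existential restriction: "prop some filler"
    prop: str
    filler: "ClassExpr"

@dataclass(frozen=True)
class And:
    # Intersection: "op1 and op2 and ..."
    operands: tuple

ClassExpr = Union[Named, Some, And]

@dataclass(frozen=True)
class EquivalentTo:
    cls: Named
    expr: ClassExpr

@dataclass(frozen=True)
class SubClassOf:
    cls: Named
    expr: ClassExpr

def weaken(
    axiom: EquivalentTo,
    is_useless: Callable[[ClassExpr], bool] = lambda expr: False,
) -> Tuple[List[SubClassOf], List[str]]:
    """Split 'A EquivalentTo (B1 and ... and Bn)' into 'A SubClassOf Bi' axioms.

    Each SubClassOf is entailed by the original equivalence; what is lost is
    the claim that the conjuncts are sufficient for membership in A.
    Conjuncts flagged by is_useless become comment text instead of axioms.
    """
    operands = axiom.expr.operands if isinstance(axiom.expr, And) else (axiom.expr,)
    kept, comments = [], []
    for op in operands:
        if is_useless(op):
            comments.append(f"Removed from logical definition: {op}")
        else:
            kept.append(SubClassOf(axiom.cls, op))
    return kept, comments

# Usage with hypothetical labels (not real CL/PATO IDs).
ax = EquivalentTo(
    Named("example_granulocyte"),
    And((
        Named("granulocyte"),
        Some("has_part", Named("lobed_nucleus")),
        Named("example_cell_phenotype"),
    )),
)
kept, comments = weaken(ax, is_useless=lambda e: e == Named("example_cell_phenotype"))
```

Here `kept` holds the two remaining subclass axioms, and `comments` holds one line describing the dropped phenotype conjunct, which could be copied into a textual comment on the term.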

Note: whenever we do address this problem, bear in mind that we have pseudo-CPs with UUIDs; see the linked ticket.

addiehl commented 3 years ago

The granulocyte hierarchy is problematic in a number of ways and needs some attention. Nuclear shapes are important markers of the differentiation status of the various granulocytes. Combined with staining, they are part of the histological definition of both the general types (neutrophil, eosinophil, basophil) and their differentiation stages (at least for human granulocytes, with some similarities in mouse as well, PMID:25926395, and even zebrafish, PMID:23463724). That is why the shapes are part of the definitions.

Fixing the granulocyte hierarchy will take a chunk of curation time. Better definitions, and better handling of markers and capabilities, are certainly important, and we need human subclasses that reference the defining markers used in scRNA-seq and modern flow cytometry/CyTOF.

shawntanzk commented 7 months ago

Hihi, opening this up again as CP terms are coming back to haunt me lol. Is there any movement on how to handle this? Is the plan still to remove all ex-CP terms and retrofit everything that uses them with CC + PATO, or have we decided to let CP terms sneakily stay in CL lol? Thanks :)

addiehl commented 7 months ago

When I look at CL-edit.owl in Protege, it looks like all the CP: terms are obsoleted, so I am confused by your comment.

dosumis commented 7 months ago

We changed them to the CL namespace. In a minority of cases so far, we have switched to using the nested pattern with PATO. We could work to complete this; however, I now think it is not ideal, as these axioms are invisible to knowledge graphs (KGs) and to UberGraph. We are working on a KG linked to the annotation of a very large and growing corpus of single-cell transcriptomes (>10^8) annotated with CL. There are interesting possibilities for using it to look for transcriptomic correlates of cell properties (e.g. presence of a lobed nucleus, one of the original recognised markers of basophils). We can't do this if these properties are invisible to the KGs.
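To illustrate why the nested pattern can disappear from a graph view, here is a minimal sketch. It assumes a simplified projection rule, loosely in the spirit of tools that materialize OWL axioms as edges between named classes: a named superclass or a `prop some NamedClass` restriction becomes an edge, and anything nested deeper is dropped. This is not UberGraph's actual implementation, and the labels (`has_characteristic`, `lobed_nucleus`, `example_granulocyte`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical, simplified class expressions (not a real OWL library).
@dataclass(frozen=True)
class Named:
    iri: str

@dataclass(frozen=True)
class Some:
    prop: str
    filler: "Expr"

@dataclass(frozen=True)
class And:
    operands: tuple

Expr = Union[Named, Some, And]
Edge = Tuple[str, str, str]

def project_edges(subject: str, superclass: Expr) -> List[Edge]:
    """Project one 'subject SubClassOf superclass' axiom onto graph edges.

    Assumed projection rule: only a named class, or 'prop some NamedClass',
    becomes an edge; conjunctions are split; anything nested deeper is dropped.
    """
    if isinstance(superclass, Named):
        return [(subject, "subClassOf", superclass.iri)]
    if isinstance(superclass, Some) and isinstance(superclass.filler, Named):
        return [(subject, superclass.prop, superclass.filler.iri)]
    if isinstance(superclass, And):
        edges: List[Edge] = []
        for op in superclass.operands:
            edges.extend(project_edges(subject, op))
        return edges
    # e.g. has_part some (nucleus and has_characteristic some lobed)
    return []

# Pre-composed property term (like the ex-CP terms now in the CL namespace).
precomposed = Some("has_part", Named("lobed_nucleus"))
# Nested GO CC + PATO pattern: the quality sits inside an anonymous filler.
nested = Some(
    "has_part",
    And((Named("nucleus"), Some("has_characteristic", Named("lobed")))),
)

print(project_edges("example_granulocyte", precomposed))
# [('example_granulocyte', 'has_part', 'lobed_nucleus')]
print(project_edges("example_granulocyte", nested))
# []
```

Under this assumed rule, the pre-composed term yields an edge a KG query can join against, while the nested pattern yields nothing, which is the visibility trade-off described above.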

@shawntanzk is the problem that your pipelines assume that everything with a CL ID must be a cell? Can't you use the graph to work that out?

shawntanzk commented 7 months ago

> @shawntanzk is the problem that your pipelines assume that everything with a CL ID must be a cell? Can't you use the graph to work that out?

Yeah, basically this. It's super easily fixed on our end, since we can just take the children of cell. I can't reveal too much yet (not because it's anything REALLY secretive, but we don't want to work out with the legal people what we can and cannot say at the moment, especially since we will release all this in a preprint soon), but basically we are deciding how to handle this in our in-house models, which assume everything is a cell or something. We could just do what CL does and attach these terms to the cellular component branch of GO, but it's not a simple cellular component, because it involves a phenotype too (though I get that there's no actual issue with just doing that). ANYWAY, I mainly wanted to know how CL was dealing with it so I can plan accordingly and "future-proof" my stuff :D

dosumis commented 7 months ago

I think it's always safer to rely on semantics over namespace, or at the very least to combine them. Isn't that a major reason for most of what we do?
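A minimal sketch contrasting the two approaches discussed above, assuming the ontology is available as a list of (child, parent) subClassOf pairs. `CL:0000000` is the CL root class "cell"; all other IDs below are hypothetical placeholders, and `descendants`, `by_namespace`, and `cell_types` are illustrative helpers, not part of any CL tooling. The namespace-only filter lets through a CL-prefixed property term, while the combined filter (CL namespace and a descendant of cell) keeps only cell types.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

CELL = "CL:0000000"  # the root 'cell' class in CL

def descendants(root: str, subclass_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """All classes below root, following (child, parent) subClassOf edges downwards."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)
    seen: Set[str] = set()
    queue = deque([root])
    while queue:  # breadth-first search; 'seen' also guards against cycles
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def by_namespace(terms: Iterable[str]) -> Set[str]:
    """Namespace-only filter: assumes every CL: ID is a cell type."""
    return {t for t in terms if t.startswith("CL:")}

def cell_types(terms: Iterable[str], subclass_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Combined filter: CL namespace AND a descendant of 'cell' in the graph."""
    cells = descendants(CELL, subclass_edges)
    return {t for t in by_namespace(terms) if t in cells}

# Hypothetical toy data; these IDs are placeholders, not real CL/GO terms.
edges = [
    ("CL:EXAMPLE_CELL", CELL),
    ("CL:EXAMPLE_SUBTYPE", "CL:EXAMPLE_CELL"),
    ("CL:EXAMPLE_PROPERTY", "GO:EXAMPLE_COMPONENT"),  # ex-CP-style term
]
terms = ["CL:EXAMPLE_CELL", "CL:EXAMPLE_SUBTYPE", "CL:EXAMPLE_PROPERTY"]

print(sorted(by_namespace(terms)))
# ['CL:EXAMPLE_CELL', 'CL:EXAMPLE_PROPERTY', 'CL:EXAMPLE_SUBTYPE']
print(sorted(cell_types(terms, edges)))
# ['CL:EXAMPLE_CELL', 'CL:EXAMPLE_SUBTYPE']
```

Combining both checks, rather than dropping the namespace test entirely, also guards against a non-CL class that happens to sit under cell in an imported or merged graph.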