Open cmungall opened 4 years ago
Remember, classes do not have to have logical defs /N+S conditions. It is perfectly acceptable to have only N conditions if the true definition is too hard to specify in OWL.
Amen
It's worth noting that most of this branch is still manually classified, so there isn't (currently) an expectation that most logical definitions will support auto-classification of the branch. This still leaves the problem of very large lists of inherited marker assertions. I've always worried that these are very hard to check.
- if the text definition is narrower than the logical definition, then either:
- convert the logical def (N+S) to subclassOf [quick and always correct]
- if time and priorities permit, add clauses to the logical def until it matches [slower]. This is more important the higher up the hierarchy you go.
I like this. It will be interesting to look at logical diffs while this is happening. Given the specific genus typically used, I expect loss of useful inference to be rare, but still worth tracking.
- if the text definition is broader than the logical definition
- consider splitting into logical def + GCIs [prioritize for non-leaves]
Can you provide more detail. Is the idea to use GCIs as a way to record species specific markers?
Expressions like lacks_plasma_membrane_part some 'CD19 molecule' will produce a classification opposite to the intent. If these are meant to be expanded according to the annotation on lacks_plasma_membrane_part, they should be in annotation properties, or the filler classes should be punned as individuals, or they should be stored in a template source rather than OWL.
if the text definition is broader than the logical definition, consider splitting into logical def + GCIs [prioritize for non-leaves]
Can you provide more detail. Is the idea to use GCIs as a way to record species specific markers?
Not necessarily for this case. But apologies, I should have said hidden GCIs
Example:
textdef: "An X cell is a Foo cell that has function F. X cells express M1 and M2"
Original OWL def: X = Foo and has-function F and has-part M1 and has-part M2
Proposed OWL axioms:
Class: X
Equiv: Foo and has-function some F
SubClassOf: has-part some M1, has-part some M2
Hidden GCIs are powerful so use advisedly
Speculation: It may be that this pattern is used less frequently in the future if marker-type definitions are typically moved down to the species level. But there are definitely cases now where this stratification would improve things.
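To make the X/Foo example above concrete, here is a minimal sketch in Python. It is a toy structural check, not an OWL reasoner: it ignores the hierarchy over fillers and treats a class as classified under X whenever its asserted conditions cover every clause of X's equivalence axiom. The class Y and the helper name are hypothetical; Foo, F, M1 and M2 are the placeholders from the example.

```python
# Toy classifier: a class is inferred to be a subclass of X when its asserted
# necessary conditions cover every clause of X's equivalence axiom.
# This is a simplification for illustration, not real OWL reasoning.

def is_inferred_subclass(candidate_conditions, equivalence_clauses):
    """True if every equivalence clause is among the candidate's conditions."""
    return equivalence_clauses <= candidate_conditions

# Hypothetical class Y: asserted to be a Foo cell with function F.
# Nobody has recorded its markers.
y_conditions = frozenset({("is_a", "Foo"), ("has-function", "F")})

# Original definition: markers are part of the equivalence axiom.
original_equiv = frozenset({
    ("is_a", "Foo"), ("has-function", "F"),
    ("has-part", "M1"), ("has-part", "M2"),
})

# Proposed definition: tighter equivalence, markers as necessary conditions.
proposed_equiv = frozenset({("is_a", "Foo"), ("has-function", "F")})
proposed_subclass_of = frozenset({("has-part", "M1"), ("has-part", "M2")})

classified_by_original = is_inferred_subclass(y_conditions, original_equiv)
classified_by_proposed = is_inferred_subclass(y_conditions, proposed_equiv)

# Once Y is classified under X it inherits X's necessary conditions. This is
# the "hidden GCI": Foo and has-function some F SubClassOf has-part some M1.
inherited = proposed_subclass_of if classified_by_proposed else frozenset()
y_after = y_conditions | inherited

print(classified_by_original, classified_by_proposed, sorted(y_after))
```

With the original definition Y is missed because its markers were never asserted. With the split definition Y is classified under X and picks up both marker statements, which is why the pattern is powerful and also why it needs care: every Foo cell with function F is now asserted to express M1 and M2.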
Agree on the lacks relations. Things are not quite as bad as they look though since Alex and James worked on converting to real negation and using this to find inconsistencies.
What was the strategy for finding inconsistencies? It's not uncommon to have cryptic unwanted inferences without raising an inconsistency, especially if the ontology is under-axiomatized, e.g. with disjoints, as is the case for CL: https://douroucouli.wordpress.com/2018/08/03/debugging-ontologies-using-owl-reasoning-part-1-basics-and-disjoint-classes-axioms/
If all negated statements are restricted to the same level in PRO AND we are working at the species generic level then this should minimize issues. But the moment we bring in isoforms or species specific subclasses we get cryptic incorrect inferences up the wazoo.
what was the strategy for finding inconsistencies?
I believe they managed to modularise, add explicit negation and use HermiT. @jamesaoverton & @addiehl should be able to fill you in on the details. Would need a robust approach to modularization in future to ensure scaling though.
But the moment we bring in isoforms or species specific subclasses we get cryptic incorrect inferences up the wazoo.
Yep. We once had inference of acellular from the axioms "lacks some 'lobed nucleus'". Looking at the classification of PRO & GO terms used for molecular definitions used in lacks statements, there's already lots of potential for problems like this, e.g. 'natural helper lymphocyte' can be found by the DL query lacks_plasma_membrane_part some 'T cell receptor complex' because it has the axiom clause lacks_plasma_membrane_part some 'alpha-beta T cell receptor complex' and 'alpha-beta T cell receptor complex' subClassOf 'T cell receptor complex'. Nothing in the definition precludes this cell type from expressing a different isoform.
Given this, perhaps moving towards Jim's suggestion of punning the filler classes as individuals would be the safest & least disruptive approach (see https://arxiv.org/abs/1410.3862)
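The 'natural helper lymphocyte' case follows from a general property of existential restrictions: if C is a subclass of D, then "R some C" entails "R some D". A minimal sketch of that propagation, using only the subclass link quoted above (the dictionary and helper names are hypothetical, and this is a toy model rather than a reasoner):

```python
# Toy model of how "R some C" propagates up the filler hierarchy.
# The single subclass link is the one quoted in the comment above.

SUBCLASS_OF = {
    "alpha-beta T cell receptor complex": "T cell receptor complex",
}

def ancestors_or_self(cls):
    """Yield cls and every named superclass reachable via SUBCLASS_OF."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def entailed_existentials(prop, filler):
    """'prop some filler' entails 'prop some S' for each superclass S of filler."""
    return {(prop, sup) for sup in ancestors_or_self(filler)}

# Axiom clause asserted on 'natural helper lymphocyte'.
asserted = ("lacks_plasma_membrane_part", "alpha-beta T cell receptor complex")
entailed = entailed_existentials(*asserted)

# The DL query from the comment above.
query = ("lacks_plasma_membrane_part", "T cell receptor complex")
matches_query = query in entailed

print(matches_query)
```

For a positive relation such as has-part this upward propagation is what we want. For a "lacks" relation it runs in the wrong direction: lacking one kind of T cell receptor complex gets read as lacking T cell receptor complexes in general.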
I found my notes from 2019-05-30. Our goal was to run a quick experiment looking for contradictions between the various membrane part relations.
Starting with the full Cell Ontology:
We found a list of 14 possible errors (some of them duplicates) that Alex reviewed. A few of those were genuine problems and I believe that Alex submitted fixes.
It would be better not to remove all the other object properties (step 1), but we wanted to use HermiT to handle the negation, and HermiT was failing for me on the full Cell Ontology.
all "lacks_plasma_membrane_part some X" axioms replaced by "'has plasma membrane part' only (not X)"
Shouldn't that be:
not (has_plasma_membrane_part some X) ?
I believe these are logically the same, but some incomplete reasoners will have better support for one vs. the other. For example ELK can find some contradictions for the second form.
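As a sanity check on the claim that the two forms are logically the same, here is a small brute-force sketch. It enumerates every interpretation of one property R and one class X over a three-element domain and compares the extensions of "R only (not X)" and "not (R some X)". The domain size and function names are arbitrary choices for illustration, and a finite check like this is an illustration rather than a proof; the general equivalence is the standard duality between universal and existential quantification.

```python
from itertools import product

DOMAIN = (0, 1, 2)
PAIRS = [(a, b) for a in DOMAIN for b in DOMAIN]

def only_not(r, x):
    """Extension of: R only (not X). Every R-successor is outside X."""
    return {a for a in DOMAIN if all(b not in x for (s, b) in r if s == a)}

def not_some(r, x):
    """Extension of: not (R some X). Complement of 'has an R-successor in X'."""
    some = {a for a in DOMAIN if any(b in x for (s, b) in r if s == a)}
    return set(DOMAIN) - some

checked = 0
mismatches = 0
# Every subset of DOMAIN x DOMAIN is a candidate interpretation of R,
# and every subset of DOMAIN is a candidate interpretation of X.
for r_bits in product((False, True), repeat=len(PAIRS)):
    r = {p for p, keep in zip(PAIRS, r_bits) if keep}
    for x_bits in product((False, True), repeat=len(DOMAIN)):
        x = {d for d, keep in zip(DOMAIN, x_bits) if keep}
        checked += 1
        if only_not(r, x) != not_some(r, x):
            mismatches += 1

print(checked, mismatches)
```

The semantics agree in every case checked, so any difference in behaviour between the two forms comes down to which constructs a given reasoner supports.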
I think it is important that the ontology not publish (or ever contain) the "lacks" existential restrictions. The case for real negation here is much clearer than for the "absence" phenotypes that we struggle with. They are obviously wrong (i.e. we don't need to run a reasoner to find that out), and the CL release pipeline is not the only tool running an OWL reasoner on CL—these will feed into any reasoners used by downstream users.
Agreed, but there's always a big demand for these from immunologists, so we need a way to retain them in some less dangerous form - especially where axioms are used by other resources as a (non-logical?) reference. Do you think punning plus value restriction would be sufficient? I think this is probably the least disruptive approach. @addiehl can provide background on how these are being used outside of an OWL context.
I like the punning—I think it makes sense because you are kind of directly talking about the class. Not sure how this comes out in OBO though, if that is an issue.
We're already outside of OBO with nested class expressions, so I don't think that matters.
Immunologists use the expression or lack of expression of certain key markers to identify or exclude major cell lineages:
CD3 epsilon = marker of the T cell lineage
CD19 = marker of the B cell lineage
CD14 = marker of monocytes/macrophages/neutrophils/dendritic cells (myeloid cells)
Sometimes immunologists identify cell types by what a cell both expresses and does not express.
We have included these lacks statements about lineage markers on immune cell types in the CL to facilitate analyses of flow cytometry and CyTOF results.
With punning we'd turn these from lacks some CD19 to lacks value CD19 as a way of retaining the axioms but avoiding dangerous inference over the PRO hierarchy.
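For anyone unsure what punning buys us here, a rough sketch of the difference. With a class filler ("some"), the restriction is carried up the PRO subclass hierarchy. With a punned individual as filler ("value"), the filler is just a name: OWL 2 DL treats the individual and the class sharing an IRI as unrelated, so, assuming we add no type assertions to the punned individuals, the subclass links are never consulted. The isoform class below is made up for illustration and the helper names are hypothetical; this is a toy model, not a reasoner.

```python
# Hypothetical slice of the PRO hierarchy; the isoform class name is made up.
SUBCLASS_OF = {"CD19 isoform X (hypothetical)": "CD19 molecule"}

def superclasses_or_self(cls):
    """Return cls followed by its named superclasses."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def entailed_by_some(prop, filler_class):
    """'prop some C' also holds for every superclass of C."""
    return {(prop, "some", c) for c in superclasses_or_self(filler_class)}

def entailed_by_value(prop, filler_individual):
    """'prop value c' names one individual; class links are not consulted."""
    return {(prop, "value", filler_individual)}

LACKS = "lacks_plasma_membrane_part"
ISOFORM = "CD19 isoform X (hypothetical)"

from_some = entailed_by_some(LACKS, ISOFORM)
from_value = entailed_by_value(LACKS, ISOFORM)

leaks_to_parent_some = (LACKS, "some", "CD19 molecule") in from_some
leaks_to_parent_value = any(f == "CD19 molecule" for _, _, f in from_value)

print(leaks_to_parent_some, leaks_to_parent_value)
```

The axiom is still in the ontology and still queryable by exact filler, which should cover the flow cytometry use case, but a statement about an isoform no longer turns into a statement about the parent protein class.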
That sounds fine to me, though I don't fully understand the technical implementation of punning.
[@dosumis this ticket makes points that you have been making for a decade, apologies if this duplicates an existing ticket, I didn't see one and thought to start one fresh]
In general CL follows sensible Rector Normalization but the immune cell hierarchy includes two kinds of problems:
This is not just fussing. This has serious impacts on maintainability and the ability to use reasoning to automate ontology classification. There are also potentially many cryptic errors lurking with these definitions. The impacts for inference are worse for non-leaf classes so I suggest prioritizing these.
An example is 'natural killer cell':
This is likely overspecified.
It also bears little resemblance to the textual definition: A lymphocyte that can spontaneously kill a variety of target cells without prior antigenic activation via germline encoded activation receptors and also regulate immune responses via cytokine release and direct contact with other cells.
The solution in this case may involve refactoring the logical axioms to a tighter equivalence axiom plus GCIs, potentially alternate equivalence axioms, or perhaps no equivalence axioms and only subClassOf (Necessary conditions only).
A more insidious example is 'group 1 innate immune cell' which has logical def:
and the text definition An innate lymphoid cell that is capable of producing the type 1 cytokine IFN-gamma, but not Th2 or Th17 cell-associated cytokines.
The exclusion criteria in the text def suggest the logical definition is under-specified and the ontology is over-axiomatized. This could cause mis-classification.
Remember, classes do not have to have logical defs /N+S conditions. It is perfectly acceptable to have only N conditions if the true definition is too hard to specify in OWL.
Another example is 'group 3 innate lymphoid cell', equivalent to
The textual definition is:
An innate lymphoid cell that constitutively expresses RORgt and is capable of expressing IL17A and/or IL-22.
It's not clear that constitutively expressing a protein is the same as having at least one instance as a part, but the text def includes two additional criteria not reflected in the logical definition or any logical axiom. I think in this case from my reading of https://www.nature.com/articles/nri3365 this is reasonably safe but nevertheless IL17A and IL-22 production should be stated as GCIs.
There are many such cases.
The strategy here should be to examine all immune cell classes and their logical axioms side by side, and implement the following process: