phyloref / phyloref-ontology

Phyloreferencing Ontology and OWL DL reasoning with phyloreferences
Creative Commons Zero v1.0 Universal

An object property for indicating taxa included in a taxonomic unit #13

Closed gaurav closed 5 years ago

gaurav commented 6 years ago

Proposed term: includes_taxon
Definition: Indicates a taxon included in a Taxonomic Unit (#10).
Domain: Taxonomic Unit (#10)
Range: dwc:Taxon
See also: PHYX context

Competency questions:

This is currently implemented in our tools as a has_scientific_name term that indicates scientific names that themselves indicate taxa. However, refocusing this term to directly include a taxon provides us with a clearer link to Darwin Core and is structured similarly to includes_specimens (#12).

hlapp commented 6 years ago

What is a taxon that is not a TU? I think that question has to be answered first.

gaurav commented 6 years ago

It does indeed! We're interested in two things when it comes to TUs:

  1. A node on a phylogeny represents some biological entity, whether a specimen or a taxon.
  2. A phyloreference defines a clade which must include or exclude a specimen, a taxon or another phyloreference.

So, a taxon is always a TU, but a TU may also include individual specimens and -- ideally -- should be stretchable enough to include lineages.

An interesting solution to this problem is used by the Population and Community Ontology (PCO) and the Biological Collections Ontology (BCO): the PCO has a class for an organismal entity, which has subclasses for a single organism or virus as well as for a collection of organisms. The BCO allows an organismal entity to be related to a dwc:Taxon via an Identification, but also appears to assert that a dwc:Taxon is itself a collection of organisms.

So, one possible solution for us would be to use organismal entity instead of TUs. We would need to invent a term for the CDAO ontology to assert that a cdao:Node represents a particular organismal entity, but then we could immediately define the entity in terms of particular specimens or assert that it has been identified as a particular taxon without needing to clarify the definition of TUs in CDAO (#10). I think CDAO would benefit from having such a term, and we would benefit from being able to use the BFO and PCO in building the taxon- and specimen-related parts of our model.

ncellinese commented 6 years ago

I would like to talk about this when we meet in person in New Zealand. There is something I am not sure I understand or appreciate here.

Nico

On Aug 21, 2018, at 9:19 PM, Gaurav Vaidya notifications@github.com wrote:

It does indeed! We're interested in two things when it comes to TUs:

A node on a phylogeny represents some biological entity, whether a specimen or a taxon. A phyloreference defines a clade which must include or exclude a specimen, a taxon or another phyloreference. So, a taxon is always a TU, but a TU may also include individual specimens and -- ideally -- should be stretchable enough to include lineages.

An interesting solution to this problem is used by the Population and Community Ontology (PCO) and the Biological Collections Ontology (BCO): the PCO has a class for an organismal entity http://purl.obolibrary.org/obo/PCO_0000031, which has subclasses for a single organism or virus http://www.ontobee.org/ontology/PCO?iri=http://purl.obolibrary.org/obo/CARO_0001010 as well as for a collection of organisms http://purl.obolibrary.org/obo/PCO_0000000. The BCO allows an organismal entity to be related to a dwc:Taxon http://www.ontobee.org/ontology/BCO?iri=http://rs.tdwg.org/dwc/terms/Taxon via an Identification http://purl.obolibrary.org/obo/BCO_0000084, but also appears to assert that a dwc:Taxon is itself a collection of organisms.

So, one possible solution for us would be to use organismal entity http://purl.obolibrary.org/obo/PCO_0000031 instead of TUs. We would need to invent a term for the CDAO ontology to assert that a cdao:Node represents a particular organismal entity, but then we could immediately define the entity in terms of particular specimens or assert that it has been identified as a particular taxon without needing to clarify the definition of TUs in CDAO (#10 https://github.com/phyloref/phyloref-ontology/issues/10). I think CDAO would benefit from having such a term, and we would benefit from being able to use the BFO and PCO in building the taxon- and specimen-related parts of our model.

hlapp commented 6 years ago

I think we're going to have to look at a data model we're driving towards in the form of a graph. @gaurav I'm not convinced by your argument that what we need can't be achieved in a simpler way, but if we look at this in a graphical way maybe it becomes clearer.

hlapp commented 5 years ago

I would argue that given our curation results there is ample evidence supporting the conclusion that there is no discernible difference between a specifier and a referent to a taxon concept. Hence, for these questions raised earlier:

We are interested in two things:

  1. A node on a phylogeny represents some biological entity, whether a specimen or a taxon.

In the CDAO model, a node on a phylogeny represents a cdao:TU. Arguably, the class of cdao:TU may be broader than a taxonomic concept, but at least it subsumes taxon concepts, such as tc:TaxonConcept. A taxon concept can be qualified by a variety of properties, including a specimen, a scientific taxon name, etc. The TCAN ontology (see #28) formalizes this.

  2. A phyloreference defines a clade which must include or exclude a specimen, a taxon or another phyloreference.

I think we can be much more specific: A phyloreference defines a clade that must include or exclude certain taxon concepts. I think that's well supported by our curation results so far.

Note also that #29 introduces phyloref:includes_TU. In light of that and the above, I think this issue should be closed.

ncellinese commented 5 years ago

A node on a phylogeny represents some biological entity, whether a specimen or a taxon.

Technically, a sequence, too, which is different from a specimen or taxon.

Nico

hlapp commented 5 years ago

Technically, a sequence, too, which is different from a specimen or taxon.

It still represents an OTU (cdao:TU), and arguably a taxon concept, even if possibly a very narrow one.

ncellinese commented 5 years ago

I thought you wanted to be exhaustive in listing anything that a node can represent.

hlapp commented 5 years ago

No. This is about what specifiers given in clade definitions can represent, and how we map that to what a node in a tree represents. My argument is that based on what we have seen from the curation results, both are cdao:TUs.

ncellinese commented 5 years ago

A node on a phylogeny represents some biological entity, whether a specimen or a taxon. <— This sentence says nothing about what a specifier is. I was commenting on the meaning of this.

hlapp commented 5 years ago

Ah. I didn't state this, I simply quoted it from an earlier comment of @gaurav's. It's sometimes hard to see in email, which is one of the reasons I usually go to GitHub for commenting rather than email.

mjy commented 5 years ago

Coming in here somewhat blind, but two thoughts jumped to mind.

  1. Why do you need to worry about the concept of a TU? Isn't your core concept a "node TU"? I.e. you don't want to define everything you could say about a TU, but rather the nature of the TU as it merges with the concept of a phylo node? Perhaps constrain your new concept discussion to what is special about the concept of a TU as it relates to a phylo node. In this vein it strikes me that the node concept must include something relating to the nature of the (old) TU concept which includes some core evolutionary concept (e.g. DNA, or anatomy in the context of homology). Those concepts (and/or others) must be present or the phylogeny (set of nodes/edges) is not a phylogeny. I.e. I can make a tree of herbivores and carnivores and omnivores, but I as a scientist don't want this to be interpreted as a phylogeny. Why? Because I have not asserted any data of a particular type (genetic, homology hypotheses).

  2. So, do you need this/these properties? What is gained? You already know the class of the thing that is part of(?) the node thanks to some other classification (e.g. dwc:Taxon).

    • With this pattern you're going to have to enumerate every possible thing that can be part of a TU.
    • I haven't looked hard at the ontology, so it might already exist, but can you have a general-purpose property that includes stuff into a "node TU"? If need be, you could restrict the property by domain and range (but this is also enumerating all possible things).
    • I.e. rather than assert that the thing in the TU is an X, assert that X is also a (node) TU. Given 1), you can then infer that some "phylogenetic" information/data is recorded/asserted about X.

hlapp commented 5 years ago

Thanks for chiming in @mjy! Note that nodes on a particular tree are instances, always. You could argue that these are instances of taxon concepts. However, that's not the model CDAO uses: in CDAO, nodes on a tree are not a kind of TU, but represent a TU. While there might be instances of cdao:TU that aren't taxon concepts, all of the ones we are dealing with are.

We will invariably have to match TUs, between those used by authors of clade definitions and those found on some tree. This can't take advantage of same individuals (or classes for that matter), as unfortunately there is no canonical catalog of taxon concepts with GUIDs that everyone could just reference. Instead, everyone uses various properties (scientific name, perhaps authors, according to publication, type specimen, code, etc) to qualify the taxon concept they are referring to. We will therefore have to use those qualifying properties to assert which ones match (are "congruent" with) which.

One way to do this is to use individuals, which we catalog and for each of which we somehow "know" (say, using some kind of database) what all their qualifying properties are. We would then assert appropriate relations between those individuals that through computation or other means we somehow determine as "matching". Using this for resolving phyloreferencing through reasoning would require a lot of Abox reasoning that EL reasoners don't support.

The other way to do this is to assert all qualifying properties as property restrictions on classes. Because it is neither necessary nor useful to name all these classes, with this we can avoid building an explicit catalog, and instead can just use class expressions, both for phyloreferences (which reflect what a clade definition's author said) and TUs found on a tree. This requires Tbox reasoning, and is what #29 proposes.

However, with the latter approach "matches" need to be asserted between classes, along the lines of <TU1> rdfs:subClassOf (matches_TU some <TU2>), with both <TU1> and <TU2> being class expressions using property restrictions. Hence, those TU class expressions need to be fully qualified or the reasoner will make incorrect inferences.
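
To make the "fully qualified" point concrete, here is a minimal Python sketch. It is a toy model, not how an OWL reasoner works: a TU class expression is treated as a plain conjunction of property-value restrictions, so an expression with fewer restrictions subsumes every expression that adds more. The property names (`has_name`, `according_to`), the placeholder values and the helper functions are all hypothetical and exist only for illustration.

```python
from typing import FrozenSet, Tuple

# A restriction is a (property, value) pair, read as "property value <value>".
Restriction = Tuple[str, str]
# A TU class expression is modelled as the conjunction of its restrictions.
TUExpression = FrozenSet[Restriction]


def tu(**restrictions: str) -> TUExpression:
    """Build a TU class expression from keyword arguments (hypothetical helper)."""
    return frozenset(restrictions.items())


def subsumes(general: TUExpression, specific: TUExpression) -> bool:
    """True if every member of `specific` is necessarily a member of `general`.

    For pure conjunctions this holds when `general` demands no restriction
    that `specific` lacks, i.e. when it is a subset.
    """
    return general <= specific


def render(expr: TUExpression) -> str:
    """Render the expression in a Manchester-like notation for display."""
    return " and ".join(f'({prop} value "{val}")' for prop, val in sorted(expr))


# Placeholder data: one name used in two different publications.
under_qualified = tu(has_name="Aus bus")
concept_a = tu(has_name="Aus bus", according_to="Publication A")
concept_b = tu(has_name="Aus bus", according_to="Publication B")

# The under-qualified expression subsumes both concepts, so any matches_TU
# axiom asserted on it would be inherited by both of them.
print(subsumes(under_qualified, concept_a), subsumes(under_qualified, concept_b))
# The fully qualified expressions do not subsume each other.
print(subsumes(concept_a, concept_b))
```

In this toy model, a `matches_TU` axiom placed on `under_qualified` would carry over to both `concept_a` and `concept_b`, even if the author only meant one of them.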

I'm not sure this actually answers your question, but I didn't fully understand what you were asking or suggesting 😄

gaurav commented 5 years ago

Because we're changing over to using property restrictions instead of TU individuals in our model (see https://github.com/phyloref/phyloref-ontology/issues/9#issuecomment-461604456 for a fuller write-up on how this looks), I think the question of including taxa in a taxonomic unit is now moot: a TU restriction will match another TU restriction only if the properties being restricted match, either directly (identical scientific names) or through other assertions (if we assert two scientific names as being synonymous, say). In this view, I'm not sure we will ever need to express the idea of multiple taxa being combined into a single TU, which is what this issue envisioned. Combining multiple TUs also makes it harder to clearly express how those TUs are related to each other, while relationships between scientific names and specimens should be easier to express thanks to ontologies like Nomen. Therefore, I think this line of development should be abandoned and this issue should be closed.
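
As a rough sketch of that matching rule (an illustration only, not our actual implementation): two TU restrictions match when they restrict the same properties and each pair of values is either identical or has been asserted to be synonymous. The `SynonymIndex` class, the `tu_matches` function, the `has_name` property and the placeholder names are all hypothetical, and treating synonymy as transitive is an assumption.

```python
from typing import Dict, Iterable, Tuple


class SynonymIndex:
    """Union-find over values (e.g. scientific names) asserted to be synonymous."""

    def __init__(self, pairs: Iterable[Tuple[str, str]] = ()) -> None:
        self._parent: Dict[str, str] = {}
        for a, b in pairs:
            self.assert_synonyms(a, b)

    def _find(self, name: str) -> str:
        # Unknown names start out as their own representative.
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # Path halving keeps later lookups short.
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def assert_synonyms(self, a: str, b: str) -> None:
        """Record that `a` and `b` refer to the same thing."""
        self._parent[self._find(a)] = self._find(b)

    def same(self, a: str, b: str) -> bool:
        """True if the values are identical or linked by synonymy assertions."""
        return a == b or self._find(a) == self._find(b)


def tu_matches(tu1: Dict[str, str], tu2: Dict[str, str],
               synonyms: SynonymIndex) -> bool:
    """Match two TU restrictions, each given as a property -> value mapping."""
    if tu1.keys() != tu2.keys():
        return False
    return all(synonyms.same(tu1[prop], tu2[prop]) for prop in tu1)


# Placeholder names; the synonymy assertions are made up for the example.
index = SynonymIndex([("Aus bus", "Aus cus"), ("Aus cus", "Xus cus")])
print(tu_matches({"has_name": "Aus bus"}, {"has_name": "Xus cus"}, index))
print(tu_matches({"has_name": "Aus bus"}, {"has_name": "Aus dus"}, index))
```

Nothing here combines several taxa into one TU: each restriction stays separate, and relationships between names live in the synonymy assertions.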

hlapp commented 5 years ago

Good summary, @gaurav. I agree.