phyloref / phylo2owl

Tool to convert phylogenies to OWL ontologies
MIT License

Phyloreferences that can be interpreted as either node-based or branch-based #29

Closed by gaurav 7 years ago

gaurav commented 7 years ago

When we have a tree that looks like:

(((A, B), C), D)

And a phyloreference that has 'A' and 'B' as internal specifiers and 'D' as an external specifier, there are two possible clades we can match. If the definition says "the least inclusive ancestor of A and B that is not an ancestor of D" and all of its descendants, then this is essentially a node-based phyloreference and we would match:

(A, B)

However, if the definition reads, "the most inclusive ancestor of A and B that is not an ancestor of D" and all of its descendants, then this is essentially a branch-based phyloreference and we would match:

((A, B), C)
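The two readings can be compared on this example tree with a minimal Python sketch. It is not how phylo2owl resolves phyloreferences (that happens through OWL reasoning). The nested-tuple tree encoding and the function names `leaves`, `clades`, `candidates`, `resolve_least_inclusive` and `resolve_most_inclusive` are all assumptions made for illustration.

```python
# Example tree from above, encoded as nested tuples: (((A, B), C), D)
TREE = ((("A", "B"), "C"), "D")


def leaves(node):
    """Return the set of leaf labels found under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Yield every node (clade) in the tree, starting at the root."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def candidates(tree, internal, external):
    """Clades that contain every internal specifier and no external one."""
    return [
        clade
        for clade in clades(tree)
        if set(internal) <= leaves(clade) and not (set(external) & leaves(clade))
    ]


def resolve_least_inclusive(tree, internal, external):
    """Node-based reading: the smallest candidate clade (None if none)."""
    return min(candidates(tree, internal, external),
               key=lambda clade: len(leaves(clade)), default=None)


def resolve_most_inclusive(tree, internal, external):
    """Branch-based reading: the largest candidate clade (None if none)."""
    return max(candidates(tree, internal, external),
               key=lambda clade: len(leaves(clade)), default=None)


node_based = resolve_least_inclusive(TREE, ["A", "B"], ["D"])
branch_based = resolve_most_inclusive(TREE, ["A", "B"], ["D"])
print(node_based)    # ('A', 'B')
print(branch_based)  # (('A', 'B'), 'C')
```

Both readings start from the same candidate set. They differ only in whether the smallest or the largest candidate is chosen, and that choice is the ambiguity this issue is about.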

A real-world example of this is the definition of the clade Mitthyridium in Fisher et al., 2007, which has two internal specifiers and one external specifier. The tree in the paper interprets this as the least inclusive clade, but the definition explicitly calls this a stem-based definition; it reads:

Mitthyridium nomen cladi conversum, Mitthyridium fasciculatum (Hook. & Grev.) H. Rob., Phytologia 32: 432. (1975)

Stem-based definition:

  • internal specifier: Type: Syrrhopodon fasciculatum Hook. & Grev., Edinb. J. Sci. 3: 225. (1825)
  • internal specifier: Type: Codonoblepharum undulatum Dozy & Molk., Ann. Sci. Nat., Bot., III, 2: 301. (1844)
  • external specifier: Type: Syrrhopodon croceus Mitt., J. Proc. Linn. Soc., Bot. Suppl. 1: 41. (1859)

Important synapomorphies: cladocarpy, many cancellinar columns, very wide sterome, creeping habit

Included terminal clades: undulatum, jungquilianum, fasciculatum, constrictum, obtusifolium

This is a subtlety we don't currently model in our simple "internalSpecifiers/externalSpecifiers" model of phyloreferences, in which internal specifiers are treated as has_Descendant constraints and external specifiers as excludes_lineage_to constraints. How can we incorporate this?

Once we get node-based phyloreferences fully working (see #26, #28), we could use that to differentiate the two, with the first definition written as a constrained node-based definition ("mrca(A, B) and has_Descendant not value D") and the second as a constrained branch-based definition ("has_Descendant value A and has_Descendant value B and excludes_lineage_to value D").

gaurav commented 7 years ago

On discussing this, we decided that we didn't need an extra property for this: an external specifier always implies a branch-based definition, so there is no ambiguity about how to interpret the internal specifiers in this case. Closing.

Note that one way of distinguishing this would be to replace the external specifier with an external qualifier -- we would then have two internal specifiers forming a node-based definition, and if the external qualifier was not outside the clade, the phyloreference would fail to resolve.
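As a rough illustration of the external-qualifier reading described above, the following Python sketch resolves the two internal specifiers as a node-based definition and then fails to resolve if the qualifier falls inside the resulting clade. The nested-tuple tree encoding and the names `leaves`, `mrca` and `resolve_with_qualifier` are assumptions for this sketch, not part of phylo2owl.

```python
# Example tree from the top of this issue: (((A, B), C), D)
TREE = ((("A", "B"), "C"), "D")


def leaves(node):
    """Return the set of leaf labels found under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca(node, specifiers):
    """Return the smallest clade under `node` containing all specifiers, or None."""
    if not set(specifiers) <= leaves(node):
        return None
    if not isinstance(node, str):
        for child in node:
            found = mrca(child, specifiers)
            if found is not None:
                return found  # a child already contains every specifier
    return node  # no single child does, so this node is the MRCA


def resolve_with_qualifier(tree, internal, external_qualifier):
    """Node-based resolution that fails if the qualifier lies inside the clade."""
    clade = mrca(tree, internal)
    if clade is None or external_qualifier in leaves(clade):
        return None  # the phyloreference fails to resolve
    return clade


print(resolve_with_qualifier(TREE, ["A", "B"], "D"))  # ('A', 'B')
print(resolve_with_qualifier(TREE, ["A", "C"], "B"))  # None
```

In the second call, B falls inside mrca(A, C), so the sketch returns `None` rather than picking a different clade. That is the "fail to resolve" behaviour an external qualifier would have.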

ncellinese commented 7 years ago

> Note that one way of distinguishing this would be to replace the external specifier with an external qualifier -- we would then have two internal specifiers forming a node-based definition, and if the external qualifier was not outside the clade, the phyloreference would fail to resolve.

But it is not your prerogative to assign qualifiers vs specifiers. It is up to the author of the definition, which is never us.


gaurav commented 7 years ago

I just double-checked this, and the figure in the paper matches the wrong clade: it matches a node-based definition for mrca(Codonoblepharum undulatum, Syrrhopodon fasciculatum), not the branch-based definition it ought to match.