phyloref / phylo2owl

Tool to convert phylogenies to OWL ontologies
MIT License

phyloref:excludes_lineage_to cannot function without a sibling clade #25

Closed: gaurav closed this issue 7 years ago

gaurav commented 7 years ago

Our phyloreferencing machinery uses excludes_lineage_to, which identifies a sibling to some ancestor of the target and that sibling's descendants, but not ancestors of the target themselves. This allows us to construct phyloreferences such as:

has_Descendant value Campanula_erinus and excludes_lineage_to value Campanula_drabifolia
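To make the intended semantics concrete, here is a minimal Python sketch of how such an expression could resolve on a toy tree. It is not phylo2owl's code or the OWL definition: it assumes `has_Descendant` is non-reflexive (proper descendants only) and reads `excludes_lineage_to` directly off the description above. The node names `n1` and `root` and the taxon `Taxon_A` are hypothetical.

```python
# Toy tree as a child -> parent mapping, i.e. ((Campanula_erinus, Taxon_A), Campanula_drabifolia).
# n1, root and Taxon_A are hypothetical names used only for illustration.
PARENT = {
    "Campanula_erinus": "n1",
    "Taxon_A": "n1",
    "n1": "root",
    "Campanula_drabifolia": "root",
}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(node):
    """Proper ancestors of node, nearest first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def siblings(node):
    """Other children of node's parent (empty for the root)."""
    if node not in PARENT:
        return set()
    return {n for n, p in PARENT.items() if p == PARENT[node] and n != node}


def has_descendant(target):
    """Nodes with target as a proper descendant (assumed non-reflexive)."""
    return set(ancestors(target))


def excludes_lineage_to(target):
    """Siblings of target or of its ancestors, plus those siblings' descendants."""
    side_branches = set()
    for node in [target] + ancestors(target):
        side_branches |= siblings(node)
    return {n for n in NODES if n in side_branches or side_branches & set(ancestors(n))}


matches = has_descendant("Campanula_erinus") & excludes_lineage_to("Campanula_drabifolia")
print(matches)  # {'n1'}
```

In this toy tree the expression resolves to `n1`, the deepest node that contains C. erinus but lies off the lineage leading to C. drabifolia.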

However, consider tree S21, in which C. erinus and C. drabifolia are found to be sister taxa: this can be represented in Newick as (Campanula_erinus, Campanula_drabifolia), or visually as:

[Figure: *C. erinus* and *C. drabifolia* as sister taxa]

In this case, there is no sibling node for our phyloreference to match, since every possible node is an ancestor of the target itself. I got around this by going one node up on the excludes_lineage_to:

has_Child some (excludes_lineage_to value Campanula_drabifolia) and has_Child value Campanula_erinus_AC107

This appears to work correctly for all three phylogenies. So should we recommend that excludes_lineage_to always be used with a has_Child, or is there a cleverer solution to this problem?
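The sister-taxa case can be sketched in Python as follows. This is a toy model rather than the reasoner's actual behaviour. It assumes that `has_Descendant` is non-reflexive and that a direct sibling of the target counts as satisfying `excludes_lineage_to`, which is what the workaround appears to rely on. The root's name is hypothetical, and the sketch uses the Newick label `Campanula_erinus` rather than `Campanula_erinus_AC107`.

```python
# Sister-taxa tree (Campanula_erinus, Campanula_drabifolia) as child -> parent.
# "root" is a hypothetical name for the unnamed internal node.
PARENT = {"Campanula_erinus": "root", "Campanula_drabifolia": "root"}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(node):
    """Proper ancestors of node, nearest first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def excludes_lineage_to(target):
    """Siblings of target or of its ancestors, plus those siblings' descendants."""
    lineage = [target] + ancestors(target)
    side = {n for n, p in PARENT.items() if p in ancestors(target) and n not in lineage}
    return {n for n in NODES if n in side or side & set(ancestors(n))}


def has_child_some(nodes):
    """Nodes with at least one child in the given set."""
    return {PARENT[n] for n in nodes if n in PARENT}


# Original form: has_Descendant value erinus and excludes_lineage_to value drabifolia
original = set(ancestors("Campanula_erinus")) & excludes_lineage_to("Campanula_drabifolia")

# Workaround: has_Child some (excludes_lineage_to value drabifolia) and has_Child value erinus
workaround = has_child_some(excludes_lineage_to("Campanula_drabifolia")) & has_child_some(
    {"Campanula_erinus"}
)

print(original)    # set()
print(workaround)  # {'root'}
```

Under these assumptions the original form matches nothing, while the `has_Child` form matches the shared parent node, which also contains C. drabifolia.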

gaurav commented 7 years ago

@hlapp @ncellinese This is the problem with excludes_lineage_to I mentioned on our last call. Have a look and tell me what you think!

hlapp commented 7 years ago

I'm not seeing the problem. In the tree you give as an example, there obviously is no node for which Campanula erinus is a descendant but Campanula drabifolia is not, because their immediate parent is the same node, which is therefore their common ancestor. So the phyloreference ought not to resolve, and the fact that it doesn't seems to me the correct result. In fact, I'm not sure how your variation can change this, because there is really no node that can satisfy the semantics.

What am I missing?

gaurav commented 7 years ago

It's true that there's no semantic problem here, but I think the expression:

has_Descendant value Campanula_erinus and excludes_lineage_to value Campanula_drabifolia

ought to reference Campanula_erinus rather than Nothing: there might be only one explicit individual at the leaf, but there are implicit ancestors between it and the node where the lineages diverged, and I think those should be matched with a branch-based phyloreference.

This could be pretty easy to implement, too: we could add an extra node above every leaf node, which would then match this expression.
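A rough Python sketch of what that transformation might look like on a toy child-to-parent map, assuming a non-reflexive `has_Descendant` and reading `excludes_lineage_to` as "siblings of the target or of its ancestors, plus their descendants". The function names, the `_stem` suffix and the `root` label are all hypothetical; nothing here is taken from phylo2owl.

```python
def ancestors(parent, node):
    """Proper ancestors of node in the child -> parent map, nearest first."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def resolve(parent, included, excluded):
    """Nodes matching: has_Descendant value included and excludes_lineage_to value excluded."""
    nodes = set(parent) | set(parent.values())
    excluded_ancestors = ancestors(parent, excluded)
    lineage = [excluded] + excluded_ancestors
    # Side branches: children of the excluded taxon's ancestors that are off its lineage.
    side = {n for n, p in parent.items() if p in excluded_ancestors and n not in lineage}
    off_lineage = {n for n in nodes if n in side or side & set(ancestors(parent, n))}
    return set(ancestors(parent, included)) & off_lineage


def add_stem_nodes(parent):
    """Return a copy of the map with one extra node inserted above every leaf."""
    internal = set(parent.values())
    result = {}
    for child, par in parent.items():
        if child in internal:
            result[child] = par
        else:
            stem = child + "_stem"  # hypothetical naming scheme
            result[child] = stem
            result[stem] = par
    return result


sisters = {"Campanula_erinus": "root", "Campanula_drabifolia": "root"}
before = resolve(sisters, "Campanula_erinus", "Campanula_drabifolia")
after = resolve(add_stem_nodes(sisters), "Campanula_erinus", "Campanula_drabifolia")
print(before)  # set()
print(after)   # {'Campanula_erinus_stem'}
```

Note that with the extra nodes the expression resolves to the inserted stem node rather than to the `Campanula_erinus` leaf itself.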

hlapp commented 7 years ago

Actually, no. A clade is not the descendants of its common ancestor or, failing that, some descendant. Those are two different things. (You could, if you wanted to, combine them with a UNION. But that'd be pretty ugly.)

What you are trying to express is something that isn't in the tree, so the result should be nothing. The tree is a real instance; you can't imagine something into it that you postulate ought to be there. Either a common ancestor is there, or it's not there.

If there is no node in the tree that has_Descendant value Campanula_erinus and excludes_lineage_to value Campanula_drabifolia, then there is not some implicit "fallback" semantics that says it's then the object of has_Descendant. Neither in OWL, nor in phylogenetic taxonomy.

gaurav commented 7 years ago

I tried to think of a concrete example of this (do I has_Descendant value me and excludes_lineage_to value my_sister), and I guess you're right, there isn't anyone who matches that definition. This does cause problems with portable node-based definitions as we've currently coded those, but I'll open a new issue for that. Closing this one now.