cboettig closed this issue 6 years ago
It would be good to have annotations on the nodes that were 'anchored', e.g. what the basis of the date was (fossil?), whether it's a lower or upper limit or a range.
Extending the to-do list based on semantic objectives proposed in Kseniia's project description, with some comments from me on implementation:
- to convey links from trees to associated publications; I think I will just map R's citation class object into prism metadata, as done in TreeBASE. Any preference for prism over Shotton's SPAR ontologies for this??
- to convey links from terminal nodes (less importantly, internal nodes) to taxonomic identifiers (and other forms of alternative labeling); Native to NeXML already. We should just add a function that will use Scott's taxize package to get TSN identifiers for species names (e.g. when extracted from an ape::phylo$tip.label) and add the identifier to the otu node metadata.
- to convey reconciliation results (duplication, speciation, lateral transfer); Um, not so sure. Can someone point me to examples of NeXML files that have such annotations?
- to convey compound branch features such as lengths with uncertainties (a la DateLife), or multiple types of support values (bootstrap + posteriors). It seems like this would be most useful if we provided functions that could also operate on this data. For now, this data would be read in to R and could be displayed, but as it is not part of the ape or phylo4 classes, no function could do anything with it. Ideally I imagine providing a function that could "draw a tree" from the distribution implied by the branch uncertainty, providing an easy way for R programmers to integrate over this uncertainty using only existing tools.

Also still need to figure out the best way to write annotations to branches. Currently this requires knowledge of the S4 structure.
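For the citation idea above, a minimal sketch of what "mapping R's citation object into prism metadata" could look like, using only base R. The function name `citation_to_prism`, the choice of terms (`dc:title`, `dc:creator`, `prism:publicationName`, and so on) and the placeholder reference are assumptions for illustration, not an agreed mapping and not the RNeXML API.

```r
# Sketch: flatten one bibentry (what citation() returns) into
# property/value pairs. The term mapping below is an assumption.
citation_to_prism <- function(bib) {
  stopifnot(inherits(bib, "bibentry"), length(bib) == 1L)
  fields <- unclass(bib)[[1L]]  # plain list of BibTeX-style fields

  # Assumed mapping from BibTeX field names to PRISM / Dublin Core terms.
  term_for <- c(title   = "dc:title",
                journal = "prism:publicationName",
                year    = "prism:publicationDate",
                volume  = "prism:volume",
                doi     = "prism:doi")

  present <- intersect(names(term_for), names(fields))
  meta <- vapply(fields[present],
                 function(x) paste(as.character(x), collapse = " "),
                 character(1))
  names(meta) <- unname(term_for[present])

  # Authors are stored as a 'person' object; emit one dc:creator each.
  authors <- format(fields$author, include = c("given", "family"))
  c(meta, stats::setNames(authors, rep("dc:creator", length(authors))))
}

# Placeholder entry (not a real reference), so the example runs offline.
bib <- utils::bibentry(
  bibtype = "Article",
  title   = "A placeholder title",
  author  = c(utils::person("Ada", "Example"), utils::person("Ben", "Sample")),
  journal = "Journal of Placeholders",
  year    = "2013",
  volume  = "1"
)
meta <- citation_to_prism(bib)
print(meta)
```

Each name/value pair would then become one meta element on the tree or document; how that element is written into the NeXML object is left open here.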
Re: citation metadata, you might want to consider the BIBO vocabulary.
Hi Carl,
- to convey links from trees to associated publications; I think I will just map R's citation class object into prism metadata, as done in TreeBASE. Any preference for prism over Shotton's SPAR ontologies for this??
I don't care.
- to convey links from terminal nodes (less importantly, internal nodes) to taxonomic identifiers (and other forms of alternative labeling); Native to NeXML already. We should just add a function that will use Scott's taxize package to get TSN identifiers for species names (e.g. when extracted from an ape::phylo$tip.label) and add the identifier to the otu node metadata.
Sounds great. I was a little worried at first when you said "terminal nodes" but you clarify later that you mean metadata attached to the otu element, which is probably the better place for taxonomic identifiers.
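A rough sketch of the lookup step discussed here, pairing each tip label with an identifier that could later be attached to the corresponding otu element. The function `otu_identifiers` and the stand-in `fake_lookup` (with made-up IDs) are hypothetical. The lookup is passed in as an argument so the example runs without network access, and attaching the result to the NeXML object is not shown.

```r
# Sketch: pair each tip label with a taxonomic identifier.
# `lookup` is any function mapping a character vector of species names
# to a vector of identifiers (NA where nothing is found).
otu_identifiers <- function(tip_labels, lookup) {
  # ape tip labels commonly use "Genus_species"; name services expect spaces.
  species <- gsub("_", " ", tip_labels, fixed = TRUE)
  ids <- as.character(lookup(species))
  stopifnot(length(ids) == length(species))
  data.frame(otu = tip_labels, species = species, tsn = ids,
             stringsAsFactors = FALSE)
}

# Offline stand-in for a real name service; these IDs are placeholders.
fake_lookup <- function(x) {
  known <- c("Homo sapiens" = "id-1", "Pan troglodytes" = "id-2")
  known[x]
}

ids <- otu_identifiers(c("Homo_sapiens", "Pan_troglodytes", "Unknown_taxon"),
                       fake_lookup)
print(ids)

# With taxize installed and a network connection, the lookup could be, e.g.:
# otu_identifiers(tree$tip.label, function(x) taxize::get_tsn(x, ask = FALSE))
```

Keeping unmatched names as NA rather than dropping them keeps the result aligned with the otu list, so the caller can decide whether to skip or flag them.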
- to convey reconciliation results (duplication, speciation, lateral transfer); Um, not so sure. Can someone point me to examples of NeXML files that have such annotations?
Gene duplication and speciation events are usually mapped onto trees using phyloxml or nhx (i.e. Chris Zmasek has developed this). In Bio::Phylo I've added the option of reading and writing phyloxml and translating it to nexml. The way I dealt with the events annotations was to make the terms as they are used in phyloxml into semantic annotations whose namespace is "http://www.phyloxml.org/1.10/terms#". I don't know if it's urgent to replicate this functionality in R, though.
- to convey compound branch features such as lengths with uncertainties (a la DateLife), or multiple types of support values (bootstrap + posteriors). It seems like this would be most useful if we provided functions that could also operate on this data. For now, this data would be read in to R and could be displayed, but as it is not part of the ape or phylo4 classes, no function could do anything with it. Ideally I imagine providing a function that could "draw a tree" from the distribution implied by the branch uncertainty, providing an easy way for R programmers to integrate over this uncertainty using only existing tools.
My guess is that this might be the feature that R users get the most out of. They're going to want to do numerical things, so if NeXML can offer them branch lengths (with intervals) and support values that they can easily rip through across a large tree or a set of trees, I think that would be great.
Secondly, by "draw a tree" I suppose you mean to simulate one (or one million) within the interval that is specified in the annotation (prettier still if that annotation also specifies what the underlying distribution is, I guess).
With my Monday morning eyes I first thought you were talking about visualization, which would also be excellent. Is the current industry standard to somehow convince FigTree to show node bars which you then poke at in Illustrator? Anyway, visualization of NeXML annotations would be great too, though kind of a separate story altogether.
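The "draw a tree" idea in the simulation sense could be sketched roughly as below. It assumes each branch carries a lower and upper bound and, since no distribution is specified in the thread, samples uniformly and independently per branch. The function name `sample_trees` and the hand-built example tree are hypothetical. The tree is an ape-style "phylo" list constructed by hand so the example needs no packages.

```r
# Sketch: draw replicate trees whose branch lengths are sampled within
# per-edge [lower, upper] bounds. Uniform, independent sampling is an
# assumption; an annotation naming the distribution could replace runif().
sample_trees <- function(tree, lower, upper, n = 100) {
  n_edges <- nrow(tree$edge)
  stopifnot(length(lower) == n_edges, length(upper) == n_edges,
            all(lower <= upper))
  lapply(seq_len(n), function(i) {
    tree$edge.length <- runif(n_edges, min = lower, max = upper)
    tree  # modified copy; the input tree is left untouched
  })
}

# Minimal ape-style "phylo" object for the topology ((A,B),C).
# Tips are nodes 1-3, the root is node 4, the inner node is node 5.
tree <- structure(list(
  edge        = matrix(c(4L, 5L,
                         5L, 1L,
                         5L, 2L,
                         4L, 3L), ncol = 2, byrow = TRUE),
  edge.length = c(1, 2, 2, 3),
  tip.label   = c("A", "B", "C"),
  Nnode       = 2L
), class = "phylo")

lower <- c(0.8, 1.5, 1.5, 2.5)
upper <- c(1.2, 2.5, 2.5, 3.5)

set.seed(1)
draws <- sample_trees(tree, lower, upper, n = 1000)

# Any existing function can now be applied across the draws, for example
# total tree length, to integrate over the branch-length uncertainty.
tree_len <- vapply(draws, function(tr) sum(tr$edge.length), numeric(1))
print(summary(tree_len))
```

Note that sampling branches independently will generally break ultrametricity, so for dated trees it would likely make more sense to sample node ages within their intervals and derive branch lengths from those.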
Also still need to figure out the best way to write annotations to branches. Currently requires knowledge of the S4 structure.
I have no good tips here. Other than for the Java API, I haven't implemented edge objects with annotations attached to them.
Rutger
Many R-based tools need ultrametric / time-calibrated phylogenies, and R also provides several tools for performing the time-calibration. A good use case for metadata reading and writing might be to work out what metadata we would add if we: read in an uncalibrated phylogeny, use a given function (and potentially a given parameter choice) in a given software package to perform the time-calibration, and then write out the time-calibrated tree. For instance, we might annotate: