ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Generate MIAPA checklist-compliant nexml #46

Open cboettig opened 10 years ago

cboettig commented 10 years ago

RNeXML should optionally be able to include all the basic metadata listed on the MIAPA checklist, ideally guiding users who are unfamiliar with the process and offering reasonable automated suggestions where possible (e.g. suggesting external identifiers based on OTU labels, #24). A function could also be provided to check (and perhaps summarize or return) MIAPA compliance.

I've reproduced the checklist below with notes added on how we're doing in RNeXML.

For each item, I've either made a note on if/how we handle it in NeXML, or a question when I'm unsure how to handle it. For instance, I can sometimes find a corresponding block in the example files in the miapa repo, but they are in OWL and the translation to NeXML's meta/RDFa isn't clear to me. An example nexml file that satisfies all these requirements would be super helpful to me.

Topology

All terminal nodes should be appropriately labelled and referenced in one of the following ways. Internal nodes need not be.

Character matrix:

I note that this description refers entirely to the character matrix as the data from which the tree was derived. It appears that the MIAPA standard doesn't cover comparative trait data. Further, it may not always be desirable to include a copy of the character matrix in the data file; perhaps stating where the alignment can be found in a separate file would suffice?

MIAPA shows an example of how to state that the tree wasDerivedFrom the alignment; I'm not sure what the corresponding RDFa in the NeXML would look like.

    <owl:NamedIndividual rdf:about="&Peters2011hymenoptera;tree0000001">
        <rdf:type rdf:resource="&obo;CDAO_0000012"/>
        <rdf:type rdf:resource="&obo;CDAO_0000073"/>
        <prov:wasGeneratedBy rdf:resource="&annot;InferenceOfPetersTree"/>
        <prov:wasDerivedFrom rdf:resource="&annot;PetersAlignment"/>
    </owl:NamedIndividual>
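
One possible translation into NeXML `meta` annotations is sketched below. This is a guess at the mapping, not something taken from the MIAPA repo and not validated against the NeXML schema. The `http://example.org/annot#` URIs are hypothetical placeholders standing in for the `&annot;` entities in the OWL example, and the element ids are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:prov="http://www.w3.org/ns/prov#"
           version="0.9">
  <otus id="otus1">
    <otu id="otu1" label="Taxon A"/>
  </otus>
  <trees id="trees1" otus="otus1">
    <!-- about="#tree1" makes the tree itself the subject of the nested meta statements -->
    <tree id="tree1" xsi:type="nex:FloatTree" about="#tree1">
      <!-- Hypothetical placeholder URIs; replace with the real inference and alignment identifiers -->
      <meta id="m1" xsi:type="nex:ResourceMeta"
            rel="prov:wasGeneratedBy"
            href="http://example.org/annot#InferenceOfPetersTree"/>
      <meta id="m2" xsi:type="nex:ResourceMeta"
            rel="prov:wasDerivedFrom"
            href="http://example.org/annot#PetersAlignment"/>
      <!-- Single placeholder node; a real tree would list all nodes and edges here -->
      <node id="n1" otu="otu1" root="true"/>
    </tree>
  </trees>
</nex:nexml>
```

The idea is that each `prov:` property on the OWL individual becomes a `ResourceMeta` element whose `rel` carries the predicate and whose `href` carries the object. If the alignment is included in the same file, `href` could presumably point at the `characters` block instead of an external URI.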

MIAPA defines that the alignment wasGeneratedBy some software.

    <owl:NamedIndividual rdf:about="&annot;PetersMUSCLEAlignmentActivity">
        <rdf:type rdf:resource="&edamontology;operation_2928"/>
        <rdf:type rdf:resource="&obo;MIAPA_0000003"/>
        <prov:wasAssociatedWith rdf:resource="&annot;Muscle"/>
        <prov:used rdf:resource="&obo;MIAPA_0000013"/>
    </owl:NamedIndividual>
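
A similarly tentative sketch for the alignment side: the activity is not itself an element in the NeXML file, so one option is to nest `meta` elements, with the outer `ResourceMeta` (no `href`) standing for the alignment activity. Whether this nesting is the preferred NeXML idiom is an assumption here, and the `http://example.org/annot#Muscle` URI and all ids are hypothetical placeholders.

```xml
<characters xmlns="http://www.nexml.org/2009"
            xmlns:nex="http://www.nexml.org/2009"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:prov="http://www.w3.org/ns/prov#"
            id="characters1" otus="otus1"
            xsi:type="nex:DnaSeqs" about="#characters1">
  <!-- Outer meta: the alignment was generated by some (unnamed) activity -->
  <meta id="m3" xsi:type="nex:ResourceMeta" rel="prov:wasGeneratedBy">
    <!-- Inner meta: that activity was associated with the alignment software (placeholder URI) -->
    <meta id="m4" xsi:type="nex:ResourceMeta"
          rel="prov:wasAssociatedWith"
          href="http://example.org/annot#Muscle"/>
  </meta>
  <!-- format and matrix elements omitted from this fragment -->
</characters>
```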

This is not part of the draft MIAPA standard, merely my own suggestions/brainstorm list, based on the metadata required for an EML description of character traits.

    <owl:NamedIndividual rdf:about="&annot;RaXML_7.2.8">
        <rdf:type rdf:resource="&obo;MIAPA_0000016"/>
        <rdfs:label>RAxML_7.2.8</rdfs:label>
        <swo2:SWO_0000740 rdf:resource="&annot;UseMaximumLikelihood"/>
        <swo:SWO_0004000 rdf:resource="&obo;MIAPA_0000017"/>
    </owl:NamedIndividual>
    <owl:NamedIndividual rdf:about="&annot;UseMaximumLikelihood">
        <rdf:type rdf:resource="&obo;MIAPA_0000015"/>
        <rdfs:label>Maximum Likelihood algorithm</rdfs:label>
        <dc:description>The inference algorithm uses maximum likelihood as an optimality criterion. </dc:description>
    </owl:NamedIndividual>

cboettig commented 10 years ago

Also see