phenoscape / rphenoscape

R package to make phenotypic traits from the Phenoscape Knowledgebase available from within R.
https://rphenoscape.phenoscape.org/

Optionally include identifiers for taxa and characters (entities) in return value #2

Open hlapp opened 9 years ago

hlapp commented 9 years ago

Taxon names can be ambiguous due to synonymies and homonymies. To facilitate integrating returned trait data matrices with other trait data, having identifiers for taxa in addition to their names can help greatly.

Some characters (entities) in a matrix may be known to be much more similar semantically (or conceptually) than others, but to assess this with metrics the entities need to be tied into an ontology. To enable this, the identifiers for characters (for pk_ontotrace, this would be the identifiers of their entities) are required.

The identifiers could all be queried one by one from the Phenoscape API, but for larger matrices this may be time-consuming. Because the identifiers are (or ought to be, see phenoscape/phenoscape-kb-services#20) already returned in NeXML from the Phenoscape API, having to query for them again seems unnecessary.

The initial plan for implementing this is to optionally return a list instead of a data.frame. The list would include the matrix, a table of taxon identifiers, and a table of entity identifiers. @sckott and @cboettig - are there better ways of doing this?
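To make the proposed return shape concrete, here is a minimal sketch in base R. It is not rphenoscape code: the helper `as_trait_result()`, its `include_ids` argument, the list element names, and all labels and identifiers in the toy data are hypothetical placeholders chosen for illustration.

```r
# Hypothetical helper (not part of rphenoscape): returns either the plain
# matrix, or a list bundling the matrix with identifier lookup tables.
as_trait_result <- function(mat, taxa, entities, include_ids = FALSE) {
  if (!include_ids) {
    return(mat)  # default: just the data.frame, as before
  }
  list(matrix = mat, taxa = taxa, entities = entities)
}

# Toy data; labels and identifiers are made-up placeholders.
trait_matrix <- data.frame(
  taxa = c("Taxon A", "Taxon B"),
  `entity one` = c("1", "0"),
  `entity two` = c("0", "1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
taxon_ids <- data.frame(
  label = c("Taxon A", "Taxon B"),
  id = c("EX:0000001", "EX:0000002"),
  stringsAsFactors = FALSE
)
entity_ids <- data.frame(
  label = c("entity one", "entity two"),
  id = c("EX:1000001", "EX:1000002"),
  stringsAsFactors = FALSE
)

# Opting in yields the list; the default still yields a data.frame.
res <- as_trait_result(trait_matrix, taxon_ids, entity_ids, include_ids = TRUE)
plain <- as_trait_result(trait_matrix, taxon_ids, entity_ids)
```

Keeping the identifier tables keyed by the same labels used in the matrix would let callers join on `label` without changing the matrix layout. Existing code that expects a data.frame would be unaffected unless it opts in.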

Implementing this depends on the metadata extraction in RNeXML getting fixed (see ropensci/RNeXML#129), and on character identifier annotations being added to the output NeXML in OntoTrace (see phenoscape/phenoscape-kb-services#20).

sckott commented 9 years ago

That seems reasonable to me. To be clear, what are these taxa identifiers? I assume they are IDs used internally within Phenoscape?

hlapp commented 9 years ago

> To be clear, what are these taxa identifiers? I assume they are IDs used internally within Phenoscape?

As for what we get back in the NeXML, these are VTO identifiers. Ideally there'd also be NCBI identifiers, I suppose (though they are only available for a small subset of VTO).

hlapp commented 8 years ago

I think this was addressed at least for taxa in fe7d29e968e3f3215600259d7a0a97c0857b0206.

hlapp commented 2 years ago

This should at least be evaluated for current status prior to the TraitFest event. It is not clear how much of this is still relevant.