Cannot get otus reference from characters block

hlapp commented 8 years ago

NeXML as a format can contain multiple characters and otus blocks, and identifiers are used to tie them together. As per the brief conference call, there is currently no way to obtain the otus reference from a <characters/> block that would allow one to properly subset the data.frame returned by get_taxa().

cboettig commented 8 years ago

Do we need the ids of the otus element or just of the individual otu values? The latter is now done via b7635997e5b615961070d5589423844ccb49c85a

If I understood correctly from the call, we just need the latter since the ids need to be unique anyhow. Any user who wants the otus block ids as well can easily get them from the get_taxa() table, along with the labels, etc.

I think this addresses the issue, so will close, but please re-open and elaborate if necessary.

hlapp commented 8 years ago

Is it difficult to return the otus value? It's much easier to subset a data.frame by one value than it is to do so by possibly hundreds or thousands. I.e., yes, I can match the data.frame returned by get_taxa() against a potentially very long list of otu IDs, or I can subset the data.frame by a single value. Why not allow the latter, unless it's somehow difficult to return the necessary ID value from the get_characters() result?
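
As a rough illustration with made-up data, the two ways of subsetting look like this in base R. The table below only stands in for what get_taxa() returns; its column names (id, label, otus) and all id values are assumptions, not actual RNeXML output.

```r
# Toy stand-in for the get_taxa() table; column names and values are assumed.
taxa <- data.frame(
  id    = c("ou1", "ou2", "ou3", "ou4"),
  label = c("species A", "species B", "species C", "species D"),
  otus  = c("os1", "os1", "os2", "os2"),
  stringsAsFactors = FALSE
)

# Option 1: match against the (possibly very long) list of otu ids
# that appear in one characters block.
otu_ids_in_block <- c("ou3", "ou4")
by_otu_ids <- taxa[taxa$id %in% otu_ids_in_block, ]

# Option 2: subset by the single otus id the characters block refers to.
by_otus_id <- taxa[taxa$otus == "os2", ]

# Both routes select the same rows; the second needs only one value.
same_rows <- identical(by_otu_ids, by_otus_id)
```

With only a handful of otu ids the difference is cosmetic; the second form matters once a block references hundreds or thousands of them.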

cboettig commented 8 years ago

No, certainly we could add it in automatically if that's preferred. But note that the join is just 1 line and performance-wise quite fast:

e.g.

library("dplyr")
get_characters(nex, otu=TRUE) %>% 
  left_join(get_taxa(nex), by = c('otu' = 'id')) %>% 
  filter(otus == "os2")

isn't that much more onerous than:

library("dplyr")
get_characters(nex, otu=TRUE) %>% 
  filter(otus == "os2")

and the join pattern is more general, which is why I'd like to encourage users to learn it. One person wants to filter on otus, but someone else may want to filter on some aspect of the otu metadata, or on other metadata. My thought was to avoid providing too many key/id columns by default, since they are apt to confuse beginning users; advanced users who work with all the id hierarchies can probably manage the extra syntax.
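
A minimal sketch of that generality, using toy data frames in place of the real get_characters() and get_taxa() results. The column names otu, id, otus and label follow the snippets above; the trait column and all values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for get_characters(nex, otu = TRUE); values are hypothetical.
chars <- data.frame(
  otu   = c("ou1", "ou2", "ou3"),
  trait = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Toy stand-in for get_taxa(nex); values are hypothetical.
taxa <- data.frame(
  id    = c("ou1", "ou2", "ou3"),
  label = c("species A", "species B", "species C"),
  otus  = c("os1", "os2", "os2"),
  stringsAsFactors = FALSE
)

# One join attaches every taxa column to the character rows.
joined <- left_join(chars, taxa, by = c("otu" = "id"))

# Filter on the otus block id ...
in_block <- filter(joined, otus == "os2")

# ... or on any other taxa column, here the label.
by_label <- filter(joined, label == "species B")
```

The same joined table serves both filters, so no extra key column has to be returned by default.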

But I don't mean to be stubborn on this; we could of course add it in. I just wanted to explain my reasoning first.

ropensci / RNeXML

Cannot get otus reference from characters block #136