Closed: hlapp closed this issue 8 years ago.
Do we need the ids of the `otus` element, or just of the individual `otu` values? The latter is now done via commit b7635997e5b615961070d5589423844ccb49c85a.
If I understood correctly from the call, we just need the latter, since the ids need to be unique anyhow. Any user who also wants the `otus` block ids can easily get them from the `get_taxa()` table, along with the labels, etc.
I think this addresses the issue, so I will close it, but please re-open and elaborate if necessary.
Is it difficult to return the `otus` value? It's much easier to subset a `data.frame` by one value than by possibly hundreds or thousands. That is, yes, I can match the `data.frame` returned by `get_taxa()` against a potentially very long list of `otu` IDs, or I can subset the `data.frame` by a single value. Why not allow the latter, unless it's somehow difficult to return the necessary ID value from the `get_characters()` result?
No, we could certainly add it automatically if that's preferred. But note that the join is just one line and quite fast performance-wise, e.g.:
```r
library("dplyr")
get_characters(nex, otu = TRUE) %>%
  left_join(get_taxa(nex), by = c("otu" = "id")) %>%
  filter(otus == "os2")
```
isn't that much more onerous than:
```r
library("dplyr")
get_characters(nex, otu = TRUE) %>%
  filter(otus == "os2")
```
and it is more general, in the sense that I'd like to encourage users to learn the `join()` pattern because it applies to many more cases. One person wants to filter on `otus`, but what if you want to filter on some aspect of the `otu` metadata, or on other metadata? My thought was to avoid providing too many `key`/`id` columns by default, since they are apt to confuse beginning users; advanced users working with the full id hierarchy can probably manage the extra syntax.
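To make the generality argument concrete without needing a NeXML file, here is a minimal sketch using hypothetical toy tables that stand in for `get_taxa(nex)` and `get_characters(nex, otu = TRUE)`. The column names (`id`, `label`, `otus`, `otu`) mirror the dplyr snippets above, but the data and the trait column `x` are made up for illustration. It uses base R `merge()` (a left join via `all.x = TRUE`) so it runs without dplyr; the `left_join()` pipeline above does the same thing.

```r
# Hypothetical stand-in for get_taxa(nex): one row per otu, tagged with its otus block.
taxa <- data.frame(
  id    = c("t1", "t2", "t3", "t4"),
  label = c("species A", "species B", "species C", "species D"),
  otus  = c("os1", "os1", "os2", "os2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for get_characters(nex, otu = TRUE): a trait column plus otu ids.
chars <- data.frame(
  otu = c("t1", "t2", "t3", "t4"),
  x   = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Left join the character table to the taxa table on otu == id.
joined <- merge(chars, taxa, by.x = "otu", by.y = "id", all.x = TRUE)

# Subset by a single otus block id, as in the dplyr pipeline.
os2 <- joined[joined$otus == "os2", ]

# The same joined table can just as easily be filtered on other taxon metadata.
sp_c <- joined[joined$label == "species C", ]
```

Once the join is done, filtering by the `otus` block is just one more column condition, no different from filtering on a label or any other metadata column.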
But I don't mean to be stubborn about this; we could of course add it. I just wanted to explain my reasoning first.
NeXML as a format can contain multiple `characters` and `otus` blocks, and identifiers are used to tie them together. As discussed in the brief conference call, there is currently no way to obtain the `otus` reference from a `<characters/>` block that would allow one to properly subset the `data.frame` returned by `get_taxa()`.