Closed johnbradley closed 3 years ago
For now we plan on leaving these functions in rphenoscape.
Here is what I am thinking for renaming these functions:
- pk_get_ontotrace(nex) renamed to get_ontotrace_char_matrix(nex)
- pk_get_ontotrace_meta(nex) renamed to get_ontotrace_meta(nex)
- pk_get_study(nexmls) renamed to get_study_char_matrix(nexmls)
- pk_get_study_meta(nexmls) renamed to get_study_meta(nexmls)
Thoughts @hlapp ?
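If the old pk_* names were kept around for a transition period (an assumption; nothing above says they will be), a rename like this could be sketched with base R's .Deprecated(). The body of get_ontotrace_meta below is a stub for illustration only, not the rphenoscape implementation.

```r
# Stub standing in for the renamed function; the real body lives in rphenoscape.
get_ontotrace_meta <- function(nex) {
  list(source = nex)
}

# Old name kept as a thin wrapper that warns and forwards to the new name.
pk_get_ontotrace_meta <- function(nex) {
  .Deprecated("get_ontotrace_meta")
  get_ontotrace_meta(nex)
}
```

Calling pk_get_ontotrace_meta() then still works but emits a deprecation warning that points at the new name.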
Looking at the possibility of reducing the ontotrace and study RNeXML::get_characters functions into one function.
From a high level the differences are:
- pk_get_ontotrace() passes otus_id = TRUE to get_characters, which adds an additional otus column to the resulting data frame
- pk_get_study_by_one changes values from numbers into labels
We create a nexml class for ontotrace NEXML data:
> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
For example, with pk_get_ontotrace() the values are 1, NA, or "1 and 0":
> mat <- head(pk_get_ontotrace(nex))
> mat[,c(1,4,5)]
  taxa                    anterior dentation of pectoral fin spine  anterior distal serration of pectoral fin spine
1 Ameiurus brunneus       1                                         1
2 Ameiurus catus          1                                         1
3 Ameiurus melas          NA                                        1 and 0
4 Ameiurus natalis        NA                                        1
5 Ameiurus nebulosus      1                                         1
6 Ameiurus platycephalus  1                                         1
For example, with pk_get_study_by_one() the values are "present", NA, or "":
> mat <- head(pk_get_study_by_one(nex))
> mat[,c(1,3,4)]
  taxa                    anterior dentation of pectoral fin spine  anterior distal serration of pectoral fin spine
1 Ameiurus brunneus       present                                   present
2 Ameiurus catus          present                                   present
3 Ameiurus melas          NA
4 Ameiurus natalis        NA                                        present
5 Ameiurus nebulosus      present                                   present
6 Ameiurus platycephalus  present                                   present
To me it seems like some meaning has been lost here with "1 and 0" becoming "".
We create a nexml class for study NEXML data:
> (slist <- pk_get_study_list(taxon = "Ictalurus australis", entity = "fin"))
> (nex_list <- pk_get_study_xml(slist$id))
> nex2 <- nex_list[[1]]
For example, with pk_get_ontotrace() we end up with only one number per column, but a larger range of values:
> mat <- head(pk_get_ontotrace(nex2))
> mat[,c(1,4,5)]
  taxa                    Anal-fin rays, species mean count  Anterior dentations of pectoral spine
1 Ameiurus brunneus       1                                  3
2 Ameiurus catus          2                                  2
3 Ameiurus melas          2                                  1
4 Ameiurus natalis        3                                  1
5 Ameiurus nebulosus      2                                  2
6 Ameiurus platycephalus  2                                  3
For example, with pk_get_study_by_one() we end up with labels that make it easier to understand the results:
> mat <- head(pk_get_study_by_one(nex2))
Map symbols to labels...
> mat[,c(1,4,5)]
  taxa                    Anterior dentations of pectoral spine  Anterior distal serrae of pectoral spine
1 Ameiurus brunneus       large                                  <3 moderately sharp serrae
2 Ameiurus catus          moderate                               3-6 moderately sharp serrae
3 Ameiurus melas          small                                  absent or scarcely developed
4 Ameiurus natalis        small                                  3-6 moderately sharp serrae
5 Ameiurus nebulosus      moderate                               <3 moderately sharp serrae
6 Ameiurus platycephalus  large                                  <3 moderately sharp serrae
How about creating a function that supports both cases like so?
get_char_matrix <- function(nex, otus_id = TRUE, translate_symbols=FALSE) {...}
The otus_id parameter is just passed to RNeXML::get_characters.
When translate_symbols is TRUE we apply the logic that translates numbers to labels.
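A minimal sketch of the translate_symbols idea, for discussion only. It works on a plain data frame and a hand-written lookup list rather than on a nexml object, so the function name get_char_matrix_sketch, the dentations column, and the state_labels argument are all hypothetical. The symbol-to-label pairs (1 = small, 2 = moderate, 3 = large) are read off the two study outputs above.

```r
# Hypothetical stand-in for the data frame RNeXML::get_characters() would return.
chars <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus", "Ameiurus melas"),
  dentations = c("3", "2", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical symbol-to-label lookup: one named vector per character column.
state_labels <- list(
  dentations = c("1" = "small", "2" = "moderate", "3" = "large")
)

get_char_matrix_sketch <- function(chars, state_labels, translate_symbols = FALSE) {
  # Default: return the symbols untouched.
  if (!translate_symbols) {
    return(chars)
  }
  for (col in names(state_labels)) {
    lookup <- state_labels[[col]]
    # Index the lookup by symbol; symbols without a label become NA.
    chars[[col]] <- unname(lookup[as.character(chars[[col]])])
  }
  chars
}

get_char_matrix_sketch(chars, state_labels, translate_symbols = TRUE)
```

With translate_symbols = FALSE the data frame comes back unchanged, which is the form most analysis code would consume.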
I think that's a good way to think about it. 0, 1, 2, etc. are symbols that would be used in a traditional character state matrix format that most analysis programs (and R packages for comparative analysis) will expect (for categorical character states). (Note that technically, there is no such thing as a "range" of such numbers; there are no implied numeric semantics to 1 vs 0 or 3 other than that they signal distinct character states, much like nucleotide bases in genetic data.)
Instead of translate_symbols, I'd suggest something like states_as_symbols, which should probably default to TRUE (because most analysis functions will expect symbols, not labels).
To me it seems like some meaning has been lost here with "1 and 0" becoming "".
The problem there is that the state for this taxon is polymorphic. I suspect the code can't handle that when using labels (it would have to say "present and absent", for example). This sounds more like a bug.
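A rough sketch of how a polymorphic state could be carried over to labels, assuming the symbols arrive as a string like "1 and 0" as in the ontotrace output above. The helper name translate_state and the labels vector are hypothetical; how rphenoscape actually stores polymorphic states internally is not shown in this thread.

```r
# Hypothetical lookup for a presence/absence character.
labels <- c("0" = "absent", "1" = "present")

translate_state <- function(state, labels, sep = " and ") {
  # Missing states stay missing.
  if (is.na(state)) {
    return(NA_character_)
  }
  # Split a polymorphic state into its symbols and translate each one.
  symbols <- strsplit(state, sep, fixed = TRUE)[[1]]
  paste(unname(labels[symbols]), collapse = sep)
}

states <- c("1", NA, "1 and 0")
vapply(states, translate_state, character(1), labels = labels, USE.NAMES = FALSE)
```

The point is only that "1 and 0" maps to "present and absent" rather than to an empty string.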
Instead of translate_symbols, I'd suggest something like states_as_symbols, which should probably default to TRUE (because most analysis functions will expect symbols, not labels).
Of course, states_as_labels with default FALSE would be equally suitable. Or perhaps even better, because if one sets states_as_symbols to FALSE (i.e., away from the default), it's not obvious how states would be presented instead. Whereas with states_as_labels, if one sets it to TRUE (i.e., away from the default), the name of the parameter suggests clearly what would happen.
Fixed by #228
Determine if the NeXML data extracting functions would be better as additions to RNeXML.
There are several data extracting functions that process NeXML objects:
RNeXML::get_characters
Determine if the functions processing a list of NeXML objects (pk_get_study*) can be dropped in favor of exporting the internal functions they call.