ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Package uses IDs instead of labels when labels are non-unique #140

Closed: hlapp closed this issue 5 years ago

hlapp commented 8 years ago

After an hour of hunting around and debugging, it seems that the get_characters() method uses the IDs of the <otu/> and <char/> elements as the row and column labels of the matrix, respectively, whenever the corresponding labels are not unique. Is that correct? Here is an example NeXML file from the Phenoscape API.

In principle this of course makes sense: R wants labels to be unique, or otherwise any kind of subsetting will yield confusing or undesired results. But I'm wondering whether this is documented anywhere prominent. It caught me by surprise, and when I first saw IDs all over the place I thought something must have gone wrong. (And something might have gone wrong on the data side; see phenoscape/phenoscape-kb-services#35 and phenoscape/phenoscape-kb-services#36. But that's a different story.)

cboettig commented 8 years ago

Yup, that's correct (https://github.com/ropensci/RNeXML/blob/master/R/get_characters.R#L115). Sorry, I thought it was in the docs for get_characters, but it isn't. I'd better fix the docs.
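For readers who run into the same surprise, the fallback confirmed above can be sketched in base R. This is a minimal illustration under stated assumptions, not RNeXML's actual code: the helper `pick_labels()` and the toy labels and IDs are hypothetical, and the real rule in get_characters.R (linked above) may test for uniqueness differently.

```r
# Hypothetical helper (NOT part of RNeXML): choose row/column names for a
# character matrix, falling back to element IDs when labels are not unique.
pick_labels <- function(labels, ids) {
  # Keep human-readable labels only if none are missing and all are distinct;
  # anyDuplicated() returns 0 when there are no duplicates.
  if (!anyNA(labels) && !anyDuplicated(labels)) labels else ids
}

# Toy data (hypothetical): two OTUs share a label, character labels are distinct.
otu_labels  <- c("taxon A", "taxon A", "taxon B")
otu_ids     <- c("otu1", "otu2", "otu3")
char_labels <- c("trait 1", "trait 2")
char_ids    <- c("c1", "c2")

m <- matrix(c("0", "1", "1", "0", "0", "1"), nrow = 3,
            dimnames = list(pick_labels(otu_labels, otu_ids),
                            pick_labels(char_labels, char_ids)))
print(m)
```

Here the duplicated OTU labels push the rows onto IDs, while the distinct character labels survive as column names, which mirrors the mixed output described in the issue.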

cboettig commented 5 years ago

Looks like this behavior is reasonable and documented, so closing now.