ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML
Other
13 stars 9 forks source link

Coercing column type in get_characters() is broken for polymorphic and uncertain states #186

Closed hlapp closed 5 years ago

hlapp commented 5 years ago

The assignment of class for each column in get_characters() based on the data type assigned in the NeXML schema is broken. The result is that polymorphic and uncertain states in a StandardCells (discrete or continuous traits) matrix will be lost (they become NA as a result of the coercion).

The reason this has remained largely obscured is because the current implementation of the coercion has a bug that accidentally limited the coercion to only one of the trait columns. In the tests that have been run with data including polymorphic states that column happened not to be one with polymorphic states. (The problem surfaced when tested with a matrix that contained only one trait column, with a polymorphic state.)

Arguably, the coercion of data type for a column from the character default to a numeric type is useful, and arguably easier done by this library than left to the user. So in the interest of user-friendliness, the type coercion shouldn't just be removed altogether. However, the algorithm needs to change to coerce type to a numeric type only if there are no polymorphic or uncertain states in a column.