makeSimmap could return state labels rather than state integers

SimonGreenhill commented 2 years ago

makeSimmap returns mapped edges labelled according to trait state ID (an integer) rather than the original state value. This makes direct interoperability with phytools::make.simmap and tools harder.

I'm not sure how high interoperability is on your priority list, but given you use phytools to plot simmap results in the examples then it would be nice to have consistency.

Here's some code that shows the problem:

library(corHMM)   # follow example in corHMM docs
library(phytools)

data(primates)
phy <- primates[[1]]
phy <- multi2di(phy)

## make character with character labels
data <- data.frame(
    Genus_sp=phy$tip.label,
    Trait=sample(x=c("A", "B"), size=Ntip(phy), replace=TRUE)
)

##run corhmm 
MK <- corHMM(phy, data, 1)
model <- MK$solution
simmap.corhmm <- makeSimmap(tree=phy, data=data, model=model, rate.cat=1, nSim=1, nCores=1)

phytools::plotSimmap(simmap.corhmm[[1]])

# Note column labels are '1', '2' not the "A", "B" found as our trait states.
head(simmap.corhmm[[1]]$mapped.edge)  

## compare to phytools
simmap.phytools <- phytools::make.simmap(tree=phy, setNames(data$Trait, data$Genus_sp), model="ER")
# Note column labels are "A", "B" not '1' or '2'
head(simmap.phytools$mapped.edge) 

# ... this has downstream consequences e.g. if we want to recolor the trait states in plotSimmap, phytools expects a labelled vector mapping state => color:
mycolors <- setNames(c("steelblue", "tomato"), c('A', 'B'))

# works:
plotSimmap(simmap.phytools, colors=mycolors)
add.simmap.legend(colors=mycolors, prompt=FALSE, x=0, y=-0.5, vertical=FALSE)

# broken as labels in mycolors don't match the trait states
plotSimmap(simmap.corhmm[[1]], colors=mycolors)
add.simmap.legend(colors=mycolors, prompt=FALSE, x=0, y=-0.5, vertical=FALSE)

# to fix: 
mycolors <- setNames(c("steelblue", "tomato"), c(1, 2))
plotSimmap(simmap.corhmm[[1]], colors=mycolors)
add.simmap.legend(colors=mycolors, prompt=FALSE, x=0, y=-0.5, vertical=FALSE) 
# but then it's not clear how 1 & 2 map to our original trait states.

jboyko commented 2 years ago

Hey Simon,

Thanks for the heads up and example. I can definitely make that change to the output.

The main reason for naming them as an integer (rather than the original state names) is to cut down on matching functions internally. I.e., rather than constantly matching the input name to its position in the Q matrix, I can just use the ID integer value for any indexing. But, the output could (and should) be changed to match the input.

I'll get to this within a week or so.

Best, James

jboyko commented 2 years ago

You'll need to install this newest github version, but the output names should now match the input names.

> model <- MK$solution
> simmap.corhmm <- makeSimmap(tree=phy, data=data, model=model, rate.cat=1, nSim=1, nCores=1)
> # Note column labels are '1', '2' not the "A", "B" found as our trait states.
> head(simmap.corhmm[[1]]$mapped.edge)  
             A        B
61,62 5.709144 4.151568
62,63 5.672618 4.851147
63,64 3.420088 3.826214
64,65 0.000000 6.678406
65,66 0.000000 0.000000
66,3  8.431907 7.258906

Best, James

SimonGreenhill commented 2 years ago

Awesome, thanks @jboyko!

thej022214 / corHMM

makeSimmap could return state labels rather than state integers #44