thej022214 / corHMM

Fits a generalized form of the covarion model that allows different transition rate classes on different portions of a phylogeny by treating rate classes as “hidden” states in a Markov process.
11 stars 13 forks source link

`print.corhmm` returns error with missing or polymorphic data #46

Closed rubysaltbush closed 2 years ago

rubysaltbush commented 2 years ago

Hello! Thank you for this excellent package.

I got an unexpected error message when trying to print the results of my corHMM analysis on data with two traits, which contains both missing data in both columns and polymorphic data in one column:

Error in names(UserStates) <- sort(unique(x$data.legend[, 2])) : 
  'names' attribute [6] must be the same length as the vector [4]

I used the primates data in corHMM to make a reproducible example:

# reproducible example of problem with missing/polymorphic data and corHMM function
require(ape)
require(corHMM)

# read in example data from corHMM package
data(primates)

# introduce missing data into primates example data
primates$trait[1,3] <- "?"

# run corHMM as per example in vignette
phy <- primates[[1]]
phy <- multi2di(phy)
data <- primates[[2]]

MK_3state <- corHMM(phy = phy, data = data, rate.cat = 1)
MK_3state

This first model with missing data in only one column produces the following output with strange "3&4" (rather than expected "3") in legend:

> MK_3state <- corHMM(phy = phy, data = data, rate.cat = 1)
State distribution in data:
States: 1   2   3&4 4   
Counts: 29  10  1   20  
Beginning thorough optimization search -- performing 0 random restarts 
Finished. Inferring ancestral states using marginal reconstruction. 
> MK_3state

Fit
      -lnL      AIC     AICc Rate.cat ntax
 -41.78658 99.57317 102.3967        1   60

Legend
    1     2   3&4     4 
"0_0" "0_1" "1_0" "1_1" 

Rates
            (1,R1)     (2,R1)      (3,R1)       (4,R1)
(1,R1)          NA 0.01136352 0.007450090           NA
(2,R1)  0.05789417         NA          NA 7.103047e-03
(3,R1) 18.32398117         NA          NA 1.000000e+02
(4,R1)          NA 0.01664352 0.000000001           NA

Arrived at a reliable solution 

But when you add missing data to the second column:

# missing data in both columns
primates$trait[10,2] <- "?"

# run corHMM as per example in vignette
phy2 <- primates[[1]]
phy2 <- multi2di(phy2)
data2 <- primates[[2]]

MK_3state2 <- corHMM(phy = phy2, data = data2, rate.cat = 1)
MK_3state2
# gives error only when missing data in both columns

printing the results gives the error message from above:

MK_3state2 <- corHMM(phy = phy2, data = data2, rate.cat = 1)
State distribution in data:
States: 1   1&3 2   3&4 4   
Counts: 28  1   10  1   20  
Beginning thorough optimization search -- performing 0 random restarts 
Finished. Inferring ancestral states using marginal reconstruction. 
> MK_3state2

Fit
      -lnL      AIC     AICc Rate.cat ntax
 -41.78655 99.57309 102.3966        1   60

Error in names(UserStates) <- sort(unique(x$data.legend[, 2])) : 
  'names' attribute [5] must be the same length as the vector [4]

I think this comes back to the commit https://github.com/thej022214/corHMM/commit/eb1bc81197ca27e7fa1ab5748fdbde5e80e1aa64 When I redefine the print.corhmm function using the code from before this commit it prints the legend as expected. Thanks again!

jboyko commented 2 years ago

Hi Ruby,

Thanks for pointing this out! Definitely an error on our part, but it should be fixed now! The previous commit had issues handling non-collapsed models (say you had states 1, 2, and 4 without 3) which is why we had to change it. But, this latest github version should be robust to both the issues you pointed out and the issues of the previous version!

Thanks again! James

> MK_3state

Fit
      -lnL      AIC     AICc Rate.cat ntax
 -41.78666 99.57332 102.3969        1   60

Legend
    1     2     3     4 
"0_0" "0_1" "1_0" "1_1" 

Rates
            (1,R1)     (2,R1)      (3,R1)       (4,R1)
(1,R1)          NA 0.01138831 0.006242852           NA
(2,R1) 0.057729151         NA          NA 7.311052e-03
(3,R1) 0.000000001         NA          NA 1.000000e+02
(4,R1)          NA 0.01662574 0.000000001           NA

Arrived at a reliable solution 
> MK_3state2

Fit
     -lnL      AIC     AICc Rate.cat ntax
 -41.7866 99.57319 102.3967        1   60

Legend
    1     2     3     4 
"0_0" "0_1" "1_0" "1_1" 

Rates
             (1,R1)     (2,R1)       (3,R1)       (4,R1)
(1,R1)           NA 0.01138400 6.250377e-03           NA
(2,R1) 5.774256e-02         NA           NA 7.294442e-03
(3,R1) 1.016997e-09         NA           NA 1.000000e+02
(4,R1)           NA 0.01662457 1.000545e-09           NA

Arrived at a reliable solution 
rubysaltbush commented 2 years ago

Hi James,

Thanks for looking at this so swiftly! That does fix it (I can't get the github version to work but have pasted the redefined print.corhmm function into my code and it works as expected).

Thanks again! Ruby