thej022214 / corHMM

Fits a generalized form of the covarion model that allows different transition rate classes on different portions of a phylogeny by treating rate classes as “hidden” states in a Markov process.

Traits dataframe that has more than 20 traits causes an error #16

Closed shamsbhuiyan closed 4 years ago

shamsbhuiyan commented 4 years ago

Hi,

I installed corHMM directly from this Git repository. When I run the following call on my phylogeny of 234 vertebrates and my data frame of 1,747 traits: states <- corHMM(vert.tree, vert.trait, rate.cat=2, node.states="joint")

I get the following error: Input data has more than a single column of trait information, converting...Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : invalid 'times' value

However, when I subset my traits data frame to 20 traits or fewer, the function runs without error. Any suggestions?

jboyko commented 4 years ago

One option is to convert your data frame into two columns, where column one holds the species names and column two holds a single value encoding each species' unique combination of traits (this is more or less what corHMM does internally).

Another option is to send me a sample dataset so that I can reproduce the error. I can then work on finding the bug, which would help future users of corHMM too. You can reach me at jboyko@uark.edu.

shamsbhuiyan commented 4 years ago

Hi, thanks for your response. I'm not quite sure if I understand option one. Would you be able to elaborate further?

In the meantime, I will follow up on the second option of emailing you a sample dataset!

jboyko commented 4 years ago

Say you have a data.frame with a bunch of species, and for each one you record the color (red, blue, or green), the presence or absence of limbs (0 or 1), and the presence or absence of fingers (0 or 1). This could be represented as a data.frame with 4 columns (an example row could be: sp1, red, 0, 0). Internally, corHMM takes a data.frame like this and makes it a data.frame with 2 columns. Column 1 is the species name (matching the phylogeny) and column 2 is the unique combination of traits represented as a single value. For example, we could represent c(red, 0, 0) as trait 1, c(red, 0, 1) as trait 2, and so on until every unique combination that appears in the dataset has a value. As long as every c(red, 0, 0) in the dataset is represented as the same trait, it should work well.

The only issue with doing it this way (and the benefit of having corHMM handle this kind of data internally) is that you often want to avoid double transitions. That is, in a single instant of time I don't want something to change from limbless and fingerless to having both limbs and fingers. This is discussed in Pagel (1994) if you're interested.
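A minimal sketch of that collapsing step in base R may help. The data frame, its column names, and the variables traits, combo, collapsed, and key are all made up for illustration, and this is not necessarily how corHMM performs the conversion internally.

```r
# Hypothetical example data: species names in column 1, traits in the rest.
traits <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  color   = c("red", "red", "blue", "red", "red"),
  limbs   = c(0, 0, 1, 1, 0),
  fingers = c(0, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Paste each row's trait values into one label, e.g. "red_0_0".
combo <- do.call(paste, c(traits[, -1, drop = FALSE], sep = "_"))

# Number the observed combinations in order of first appearance,
# so that identical rows always share the same state.
collapsed <- data.frame(
  species = traits$species,
  state   = as.integer(factor(combo, levels = unique(combo))),
  stringsAsFactors = FALSE
)

# Lookup table for mapping state numbers back to trait combinations.
key <- data.frame(
  state = seq_along(unique(combo)),
  combo = unique(combo),
  stringsAsFactors = FALSE
)
```

Here sp1 and sp5 share the combination c(red, 0, 0), so both receive state 1. Only combinations that actually occur in the data get a number, which keeps the state count far below the number of theoretically possible combinations.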

So, if something is messing up internally in corHMM, one option is to input your data in the two-column form described above and remove double transitions from a custom rate matrix. That said, I'm more than happy to take a look at a sample dataset and try to get this fixed for future users as well.
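One way to work out which entries of a custom rate matrix are double transitions is sketched below in base R, for two hypothetical binary traits (limbs and fingers). The names states, n.diff, and rate.index are illustrative only. How the result is handed to corHMM (for example through its rate.mat argument) and whether disallowed entries should be coded as 0 or NA are assumptions to check against the package documentation for your installed version.

```r
# Hypothetical combined states for two binary traits; row i is state i.
states <- cbind(
  limbs   = c(0, 0, 1, 1),
  fingers = c(0, 1, 0, 1)
)
n <- nrow(states)

# Count how many traits differ between every pair of combined states.
n.diff <- matrix(0L, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    n.diff[i, j] <- sum(states[i, ] != states[j, ])
  }
}

# Give each single-trait transition its own parameter index.
# The diagonal and all double transitions stay at 0 (assumed here to
# mean "not estimated"; swap in NA if your corHMM version expects it).
rate.index <- matrix(0L, n, n)
allowed <- n.diff == 1
rate.index[allowed] <- seq_len(sum(allowed))
```

With these four states, the transition from state 1 (no limbs, no fingers) to state 4 (limbs and fingers) changes two traits at once, so its entry stays at 0, while the eight single-trait transitions each get their own index.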

Pagel (1994): Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters