Is this state distribution in data correct? And an error message

thej022214 / corHMM

Fits a generalized form of the covarion model that allows different transition rate classes on different portions of a phylogeny by treating rate classes as “hidden” states in a Markov process.


Is this state distribution in data correct? And an error message #60

Open jamie-thompson opened 1 year ago

jamie-thompson commented 1 year ago

There is zero missing data or "?" in my dataset, but this is the state distribution data:

State distribution in data:
States: 1    1&2  1&3  2    2&3  3    ?
Counts: 679  75   11   13   3    54   5

How could this have come about?

There is also this warning:

Warning in corHMM(StandardTree, Standard_Strict_Dataset, rate.mat = Strict_ARD_NoDual$rate.mat, :
  corHMM may have failed to optimize correctly, consider checking inputs and running again.
There were 20 warnings (use warnings() to see them)

jboyko commented 1 year ago

It may be due to the formatting of your dataset. Can you send me a sample of the data? Feel free to email me at jboyko [at] umich [dot] edu and I can take a closer look.

jamie-thompson commented 1 year ago

Thanks, I made an embarrassing mistake: I had forgotten that I'd edited some species names in a previous script. It all runs now, but occasionally I get a rate of 100 (the average across most models is around 0.1-0.5). What could cause this?
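For anyone who hits the same unexpected "?" counts, a name check between the tree and the dataset can be sketched as below. This is a minimal sketch in base R, not something taken from corHMM itself: `tip_labels` and `data_species` are hypothetical stand-ins for the tree's tip labels (e.g. `StandardTree$tip.label`) and the species column of the dataset, and it assumes the "?" entries came from tips that had no matching row.

```r
# Hypothetical stand-ins: in practice these would be the tree's tip labels
# and the species column of the trait dataset.
tip_labels   <- c("Genus_a", "Genus_b", "Genus_c", "Genus_d")
data_species <- c("Genus_a", "Genus_b", "Genus_C", "Genus_e")

# Tips with no matching row in the data (assumed here to be the source
# of unexpected "?" states).
in_tree_only <- setdiff(tip_labels, data_species)

# Rows in the data with no matching tip in the tree.
in_data_only <- setdiff(data_species, tip_labels)

print(in_tree_only)
print(in_data_only)
```

If both vectors come back empty, the names agree and the cause is likely elsewhere. Note that the comparison is case-sensitive, so "Genus_c" and "Genus_C" count as different names.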

Guilin123456 commented 3 months ago

I also noticed some results that seem extremely high. Specifically, the analysis shows that the number of transitions between the states is nearly zero, yet the transition rate is reported as 100. Moreover, even when one state includes only a single species, it still shows the highest transition rate.
I've tried adjusting the ip, nstarts, and n.cores parameters, but the results remain similar.
Could you kindly help me identify what might be going wrong? I can provide the code, data, and results.

jboyko commented 2 months ago

Unfortunately, there is likely nothing going wrong. Rates hitting the upper bound of 100 are quite common and do genuinely represent the MLE (strictly, not the exact MLE when the estimate is 100, since it has hit the upper bound). One suggestion I can make for interpreting the rates is to look at the relative values of the rates rather than the raw numbers. So if 0->1 is 0.1 and 1->0 is 100, then you can say q10 is 1000x faster than q01.
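The relative-rate reading can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `Q` is a hypothetical two-state rate matrix filled with the example values from this comment (rows are the "from" state, columns the "to" state), not output from a real fit, and `upper_bound` simply mirrors the bound of 100 mentioned in this thread.

```r
# Hypothetical two-state rate matrix (rows = from, columns = to), using the
# example values from the comment: 0->1 is 0.1 and 1->0 is 100.
Q <- matrix(c(NA,  0.1,
              100, NA),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("0", "1"), c("0", "1")))

# Upper bound as mentioned in this thread (an assumption for illustration).
upper_bound <- 100

q01 <- Q["0", "1"]  # rate of 0 -> 1
q10 <- Q["1", "0"]  # rate of 1 -> 0

# How many times faster 1 -> 0 is than 0 -> 1.
ratio <- q10 / q01

# Flag estimates sitting at the bound; these are better read as
# "at least this fast" than as exact values.
at_bound <- !is.na(Q) & Q >= upper_bound

print(ratio)
print(at_bound)
```

Because a rate at the bound is only a lower limit on the estimate, the ratio involving it is best treated as a lower limit too.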