revbayes / revbayes

Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language
http://revbayes.com
GNU General Public License v3.0

[Feature Request] Let `rootFrequencies` vary among rate categories in `dnPhyloCTMC` #185

Closed mtnouchi closed 2 years ago

mtnouchi commented 2 years ago

Describe the analysis you'd like to do. In the current version of RevBayes, the rootFrequencies argument to dnPhyloCTMC only accepts the type Simplex. In many cases the root frequencies are the same across categories, because heterogeneity among sites is often modeled as a multiplier that rescales the rate matrix.

However, I’d like to carry out ancestral state reconstruction using a mixture model with free-rate transition matrices. In this case, the root frequencies of the categories are expected to differ from each other. Could you add support for Simplex[] to rootFrequencies?

Do you have a sample dataset? I think the data addressed in the mixture model tutorial (https://revbayes.github.io/tutorials/morph_tree/V2.html) can be used for testing.

Describe alternatives you’ve considered. I’m afraid I have no idea.

Additional context. Thank you.

jsigao commented 2 years ago

Hello mtnouchi,

I'm not sure I fully understand your request, but if you would like to assume that the root frequency associated with each Q matrix is its stationary frequency, you could simply leave rootFrequencies unspecified, which is how dnPhyloCTMC is constructed in the mixture model tutorial. My understanding, at least, is that by default RevBayes uses the stationary frequency of each Q matrix as its root frequency; for the FreeRate model, that stationary frequency is computed directly from the Q matrix.
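As a rough illustration of what "stationary frequency computed from the Q matrix" means, here is a minimal Python sketch. It is not RevBayes code and says nothing about how RevBayes implements this internally; the function name `stationary_distribution` and the example matrices are made up for illustration. It solves pi Q = 0 subject to sum(pi) = 1, assuming the chain is irreducible so that the solution is unique.

```python
from fractions import Fraction


def stationary_distribution(Q):
    """Return pi with pi Q = 0 and sum(pi) = 1 (hypothetical helper).

    Q is a square rate matrix whose rows sum to zero. The chain is
    assumed to be irreducible, so the stationary distribution is unique.
    """
    n = len(Q)
    # Row i of A holds column i of Q, so A x = 0 encodes pi Q = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One of the n equations is redundant; replace the last one with
    # the normalisation constraint sum(pi) = 1.
    A[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [a * inv for a in A[col]]
        b[col] = b[col] * inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
                b[r] = b[r] - f * b[col]
    return b


# Two made-up two-state free-rate matrices, one per mixture category.
q_matrices = [
    [[-2, 2], [1, -1]],   # stationary distribution (1/3, 2/3)
    [[-1, 1], [3, -3]],   # stationary distribution (3/4, 1/4)
]
for Q in q_matrices:
    print([str(p) for p in stationary_distribution(Q)])
```

Because the two matrices have different rates, they imply different stationary distributions, so each mixture category would start from a different root frequency if the stationary frequencies are used at the root.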

Alternatively, I think it is sensible to assume that there is a single root frequency (i.e., the probability of the root being in each state, unconditional on the data or the Q matrix) and either estimate it during the inference (i.e., specify it as a stochastic variable) or fix it as a constant if there is a very strong biological prior for it.

Best, Jiansi

mtnouchi commented 2 years ago

Hello Jiansi,

Thanks for your advice. I am mostly convinced by your suggestion that the rootFrequencies argument be left unspecified and the root frequencies be calculated as eigenvectors of the Q matrices.

One question remains, though. I also worked through a tutorial on ancestral state estimation (https://revbayes.github.io/tutorials/morph/morph_more.html), where the root frequency is explicitly specified. Why is this so? I think it is because morphological data are not assumed to be at stationarity or to be compositionally homogeneous. Should I also leave the root frequency unspecified when applying mixture models to such data?

Regards, mt

milliescient commented 2 years ago

If you would like to model a mixture of root frequencies across sites, you can use the siteMatrices argument. If you provide this argument with a vector of Q matrices, the likelihood will be computed assuming a finite mixture of these Q matrices across sites. If in turn you leave rootFrequencies unspecified, then the root frequencies from each matrix will be used for each mixture component, effectively giving you a mixture of root frequencies. So, for example, you could construct a vector of matrices using the same exchangeability rates for each matrix, but different stationary frequencies. Providing this vector to siteMatrices will result in a mixture of root frequencies across sites, with homogeneous exchangeabilities.
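To make the "mixture of root frequencies" idea concrete, here is a minimal Python sketch of how a site likelihood could be assembled from per-component quantities. It is an illustration of the general finite-mixture formula, not of RevBayes internals: the function name `mixture_site_likelihood`, the root partial likelihoods, and the mixture weights are all made-up values, and the partial likelihoods are assumed to have been computed already (for example by the pruning algorithm under each component's Q matrix).

```python
def mixture_site_likelihood(root_partials, root_freqs, weights):
    """Site likelihood under a finite mixture of components (hypothetical).

    root_partials[k][s]: P(site data | root in state s) under component k
    root_freqs[k][s]:    root frequency of state s under component k
    weights[k]:          mixture probability of component k
    """
    total = 0.0
    for partial, freqs, w in zip(root_partials, root_freqs, weights):
        # Average over root states using this component's own root
        # frequencies, then weight by the component's probability.
        total += w * sum(p * f for p, f in zip(partial, freqs))
    return total


# Made-up root partial likelihoods for one site under two components.
partials = [[0.10, 0.04], [0.04, 0.08]]
weights = [0.5, 0.5]

# Component-specific root frequencies (e.g. each matrix's stationary
# frequencies) versus a single root frequency shared by all components.
per_component = [[1 / 3, 2 / 3], [3 / 4, 1 / 4]]
shared = [[0.5, 0.5], [0.5, 0.5]]

print(mixture_site_likelihood(partials, per_component, weights))
print(mixture_site_likelihood(partials, shared, weights))
```

The two calls differ only in the root frequencies passed in, which is the distinction discussed in this thread: a single Simplex shared by every category versus one set of frequencies per mixture component.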

As for your question about when to estimate root frequencies separately from those implied by the transition matrix: yes, you would do this if you suspect that your data exhibit compositional heterogeneity over time, which should of course be rigorously assessed using model comparison techniques.