tgvaughan / MultiTypeTree

BEAST 2 package which provides support for multi-type trees: phylogenetic trees on structured populations.
http://tgvaughan.github.io/MultiTypeTree
GNU General Public License v3.0

How to set the clock.rate, popsize and migrate rate? #14

Closed ghost closed 5 years ago

ghost commented 5 years ago

@tgvaughan How to set the clock.rate, popsize and migrate rate? I saw you used lognormal for these three parameters, and is that okay?

tgvaughan commented 5 years ago

Clock rate can be set using the standard approach in BEAUti: i.e. turn off estimate in the Clock Model panel. The population size and migration rates can be fixed by unchecking the estimate checkboxes that appear next to these parameters in the Priors panel.

Regarding priors, in general it's not okay to just apply the parameters from the tutorial to your analysis. Instead, you need to put some thought into what you actually know or are willing to assume about these parameters in your situation. E.g. there may already be some estimates for the mean clock rate for your locus in the literature, in which case you might want to use those to inform your analysis. Even in the case of "no information" you'll probably have some order-of-magnitude information for your kind of organism. For instance, an RNA virus genome probably has a substitution rate of somewhere around 10^-3/site/year, in which case a lognormal prior with an HPD from 10^-4 to 10^-2 might be appropriate.
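As a rough illustration of the RNA virus example above, the following sketch finds lognormal parameters whose central 95% interval runs from 10^-4 to 10^-2. It rests on assumptions not stated in the thread: the interval is treated as equal-tailed (2.5% and 97.5% quantiles) rather than a true HPD, which for a skewed distribution is not quite the same thing, and the names `lognormal_from_interval`, `m` and `s` are made up for this example. Here `m` and `s` are the mean and standard deviation of the log of the parameter.

```python
import math
from statistics import NormalDist


def lognormal_from_interval(lower, upper, mass=0.95):
    """Return (m, s), the mean and standard deviation of log(x), for a
    lognormal whose central `mass` interval is [lower, upper].

    Assumption: the interval is equal-tailed, not a highest-density interval.
    """
    if not 0 < lower < upper:
        raise ValueError("need 0 < lower < upper")
    # Standard normal quantile for the upper tail (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    # On the log scale the interval is symmetric about m.
    m = (math.log(lower) + math.log(upper)) / 2
    s = (math.log(upper) - math.log(lower)) / (2 * z)
    return m, s


# Interval from the RNA virus example: 10^-4 to 10^-2 per site per year.
m, s = lognormal_from_interval(1e-4, 1e-2)
print(f"m = {m:.4f}, s = {s:.4f}, median = {math.exp(m):.1e}")
```

The median of the resulting prior is exp(m), which for this interval sits at 10^-3. The values should still be checked against what BEAUti displays before use.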

ghost commented 5 years ago

@tgvaughan Could you please give me more detailed, practical guidance on how to set up a MultiTypeTree analysis? Thanks.

tgvaughan commented 5 years ago

Hi @lixingguang, there are practical tutorials available at https://tgvaughan.github.io/MultiTypeTree and https://taming-the-beast.org/tutorials. The actual model itself is described in the papers referenced in those tutorials, and in books such as "Inferring Phylogenies" (Felsenstein) and "Gene Genealogies, Variation and Evolution" (Hein, Schierup, Wiuf). For a friendly beginner's guide to Bayesian inference in general, I'd recommend "Information Theory, Inference, and Learning Algorithms" (David MacKay, also free online at http://www.inference.org.uk/itila/book.html).

ghost commented 5 years ago

Dear @tgvaughan, could you please help me generate the XML for a MultiTypeTree analysis using my data set? I would be very grateful for your help.

ghost commented 5 years ago

@tgvaughan Thank you very much for the tutorials and books you suggested. I have read the two tutorials, but the three books are very difficult for me :). I ran my data set using the MultiTypeTree package, but I am not sure whether I set the priors correctly.

tgvaughan commented 5 years ago

This is why I recommended the book on Bayesian inference. Priors encode your state of knowledge in the absence of the data you're analyzing. I can't help you set those up really, as I don't know anything about your data. However, an informative prior on the clock rate is usually possible (see above) and setting some upper/lower bounds on migration rate and population size parameters is often also possible, so I'd start there.

ghost commented 5 years ago

@tgvaughan Thanks a lot for your kind reply :). The book on Bayesian inference is very hard for me, as my major is biology. BTW, could you please tell me which distribution is better for the priors on the clock rate, migration rate and population size parameters: uniform, lognormal, or exponential?

ghost commented 5 years ago

@tgvaughan, is there a method to translate a mean and 95% HPD into the log(mean) and log(stdev) for a lognormal prior? Thanks. :)

tgvaughan commented 5 years ago

Hi @lixingguang, unfortunately it's impossible to give reliable answers to these questions in general, as the answers depend almost exclusively on the information you have available about your study system. Even then it's difficult, because the model we're talking about here, the structured coalescent, is very different to any real biological system. But leaving that aside, what you need to do is (a) try your best to understand the model you're fitting, and (b) do your best to select distributions for the parameters based on how you think they apply in your case. (E.g. assuming effective population size has something to do with real population size, you can set upper bounds on the Ne parameters, as these are usually going to be less than the actual sizes. Keep in mind here though that what BEAST calls effective population size is actually Ne*g, where g is the time between successive generations.)
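The Ne*g point can be made concrete with a small piece of arithmetic. This is only a sketch: the census size and generation time are invented for illustration, the function name `popsize_upper_bound` is hypothetical, and it assumes the generation time is expressed in the same time units as the tree (years, if the clock rate is per site per year).

```python
def popsize_upper_bound(census_size, generation_time):
    """Rough upper bound on the BEAST population size parameter (Ne * g).

    Assumes Ne does not exceed the census size and that generation_time
    uses the same time units as the tree (an assumption of this sketch).
    """
    if census_size <= 0 or generation_time <= 0:
        raise ValueError("census_size and generation_time must be positive")
    return census_size * generation_time


# Hypothetical numbers: 10,000 infected hosts, 4 days between generations.
bound = popsize_upper_bound(10_000, 4 / 365)
print(f"upper bound on Ne*g: {bound:.1f} years")
```

A bound like this could then serve as the upper limit of the population size prior, although how closely Ne tracks the census size is itself something to think about for the study system.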

To find parameters for the lognormal prior given that you have a 95% HPD in mind, you can just try different values in BEAUti - it automatically computes and displays the corresponding HPD interval under the graph on the right side of the window.
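For anyone who wants to do the same trial-and-error outside BEAUti, here is a minimal sketch that computes the central interval implied by a given choice of lognormal parameters. Several things here are assumptions rather than anything from the thread: the function name `lognormal_interval` is made up, the interval is equal-tailed rather than a true HPD, and the `mean_in_real_space` flag is meant to mirror my understanding of BEAUti's "Mean In Real Space" option, where M is the mean of the parameter itself rather than of its log.

```python
import math
from statistics import NormalDist


def lognormal_interval(M, S, mean_in_real_space=False, mass=0.95):
    """Return (lower, median, upper) for a lognormal prior.

    S is the standard deviation of log(x). M is the mean of log(x), or,
    if mean_in_real_space is True, the mean of x itself (assumed convention).
    The interval is equal-tailed, not a highest-density interval.
    """
    if S <= 0:
        raise ValueError("S must be positive")
    # Convert a real-space mean to the log-scale mean: E[x] = exp(mu + S^2/2).
    mu = math.log(M) - 0.5 * S * S if mean_in_real_space else M
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    return math.exp(mu - z * S), math.exp(mu), math.exp(mu + z * S)


# Try a few values of S for a log-scale mean of log(10^-3).
for S in (0.5, 1.0, 1.1748):
    lower, median, upper = lognormal_interval(math.log(1e-3), S)
    print(f"S = {S}: {lower:.2e} to {upper:.2e} (median {median:.2e})")
```

Increasing S widens the interval around the same median, so adjusting it until the printed bounds match the range you have in mind is the same search that BEAUti's display supports.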