xavierdidelot / BactDating

Bayesian inference of ancestral dates on bacterial phylogenetic trees
https://xavierdidelot.github.io/BactDating
MIT License
80 stars 15 forks source link

mixedgamma vs mixedcarc model #39

Closed maesaar closed 3 years ago

maesaar commented 3 years ago

I have just read the article “Additive uncorrelated relaxed clock models for the dating of genomic epidemiology phylogenies”. I used mixedgamma model when analysing my dataset with BactDating. I got pstrict=0 and now I wonder if using mixedcarc model could give more percise results with narrower CIs?

As my previous runs took almost a week I wonder if it would be worth to re-run them using mixedcarc model?

Thank you for the answer in advance.

xavierdidelot commented 3 years ago

It's up to you of course, you could keep your results using mixedgamma which are still completely valid. But it might be worth trying a short run of mixedcarc, in fact there is even a chance that under this model the MCMC will be much faster to converge and mix.

maesaar commented 3 years ago

Thank you @xavierdidelot - I actually started the run using mixedcarc a day ago and it gave better root probability, better ESS and narrower CIs compared to the mixedgamma.

I will have to do more analysis but so far it seems that mixedcarc is superior to mixedgamma as it gives overall better results with my dataset.

xavierdidelot commented 3 years ago

Great, I'm really pleased to hear this! Bad mixing can often be caused by a bad model fit.

maesaar commented 3 years ago

Now almost all runs have completed and I can confirm that mixedcarc is definitely superior.

I have one more question when mixedcarc is selected as model then what happens? Am I correct that then is arc or carc automatically used? When I have pstrict 0 then which model is selected under mixedcarc?

Second what did you @xavierdidelot mean when you said bad mixing?

Thank you

maesaar commented 3 years ago

Explanation: I have selected mixedcarc model for reason that I was just replacing default mixedgamma model as I hoped it works better.

Should I test arc and carc models also and compre DIC to choose best model or is it enough when I use mixedcarc? I mean does using mixedcarc model takes into account if arc or carc model is best? I have not found any documentation of mixedcarc behaviour.

Thank you again @xavierdidelot

maesaar commented 3 years ago

Sorry for multiple posts! 1) What's the difference between other and mixed models? 2) How to choose between discrete and continuous models?

xavierdidelot commented 3 years ago

No worries, these are all good question and it is good to post the answers here since it might help others. The models with prefix "mixed" combine the strict and relaxed clock into a single model, so that one or the other gets used depending on the data. You can see which model was actually used via the res$pstrict argument. If this is equal to zero (as is very often the case) then the relaxed clock model was used exclusively and the run is exactly as if relaxed had been used instead of "mixed".

You could compare the dic for the mixedgamma and mixedcarc models since you have already run both. This is what we did in our latest paper, and based on what you wrote I would expect you will find that mixedcarc is a better fit.

There should not be much difference between discrete models and their continuous equivalent. Continuous models may be a bit more precise (because they do not round the length of branches from the undated phylogeny) whereas discrete models may be a bit more stable.

maesaar commented 3 years ago

Again thank you so much @xavierdidelot for thorough and prompt explanation. I checked and mixedcarc is definitely better (lower DIC) than mixedgamma.

xavierdidelot commented 3 years ago

Great, this is what I was expecting, this case you should probably mention the DIC comparison when you publish and present the results of mixedcarc.

maesaar commented 3 years ago

Little off topic, but I don't want to create new issue. The CIs are in res$CI but where are stored the point estimates? For example if CIs are (2014.002-2015.054) then where is estimation? Could not find it.

xavierdidelot commented 3 years ago

The point estimates are stored in the tree res$tree See also issues #11 , #29 and #35

maesaar commented 3 years ago

Thanks