rsetienne 2018-07-05: Bayes factor JC69 versus GTR

richelbilderbeek commented 6 years ago

On 5 July 2018 at 17:19:54 +02:00, Rampal S. Etienne [...] wrote:

Je gaat de gesimuleerde data analyseren met BEAST2 met een JC of een GTR model. Om te kijken welk model een betere fit geeft, heeft tracer een optie, de Bayes Factor. Deze wordt berekend uit de twee sets van loglikelihoods die BEAST2 uitspuugt (1 voor de GTR setting en 1 voor de JC setting). Ik denk dat het belangrijk is dat je deze analyse aan je raket-paper toevoegt, om te weten of de BEAST2 analyse JC inderdaad als beste model uitkiest of niet.

richelbilderbeek commented 6 years ago

The question is wether we should assess that BEAST2 indeed confirms that a JC69 site model is better (using the Bayes Factor) than a GTR site model, as the DNA alignment is simulated with a JC69 site model.

I think this should not be done, and I will try to convey why.

The research question is 'What is the inference error made when using a BD species model on a phylogeny following a PBD speciation process'. Note that there is no mention of a nucleotide substitution model. A (simplest) nucleotide substitution model is only a mean of this research, not a mean that should be investigated.

Additionally, I find it hard to imagine why to confirm something that is completely expected. Why conform that a JC69 prior fits a JC69-simulated alignment better than a GTR prior using the Bayes factor? I can not come up with a reason why I should doubt this hypothesis.

I thus conclude I should not investigate that (1) is not at the core of the research question, (2) has no reason to be doubted.

richelbilderbeek commented 6 years ago

Sent email.

richelbilderbeek commented 6 years ago

I will close this Issue, unless there is reason to open it up again.

richelbilderbeek commented 6 years ago

Reply from rsetienne:

Mijn argument is de volgende:

Je wil bekijken wat voor fout gemaakt wordt als je in BEAST2 BD gebruikt als species tree prior terwijl de onderliggende species tree zich volgens PBD gedraagt. Deze fout wordt echter niet alleen door de species tree prior bepaald, maar ook door andere submodellen zoals het substitutiemodel. Mogelijk kan het toevoegen van meer flexibilieit door GTR te gebruiken (wat veel gebruikers doen!) de fout reduceren. Stel nu dat je vindt dat GTR inderdaad deze fout reduceert, dan zou je kunnen denken: er is geen probleem als het onderliggende model PBD is ipv van BD, want met GTR wordt dit wel (op enigszins miraculeuze wijze) opgevangen. Maar dan zou je eigenlijk moeten checken of GTR wel het juiste model is. Daarvoor is de Bayes Factor. Die zal de uitkomsten van JC en GTR met elkaar vergelijken en zeggen wat het beste model is. Als nu GTR nog steeds aangewezen wordt, is je conclusie dat er voor de gebruiker eigenlijk geen probleem is, maar als JC wordt aangewezen, dan heeft de gebruiker wel een probleem, want deze check dient de gebruiker wel uit te voeren.

richelbilderbeek commented 6 years ago

Code from @rsetienne to demo the Bayes factor:

load('d:/data/ms/microsnail_BEAST2_results_log_files.RData')
listname <- microsnail_BEAST2_results_log_files
lengthlist <- length(microsnail_BEAST2_results_log_files)
namei <- rep(0,lengthlist)
meanll <- rep(0,lengthlist)
harmmeanll <- rep(0,lengthlist)
for(i in 1:lengthlist)
{
   ll <- as.numeric(listname[[i]]$likelihood)
   minll <- min(ll)
   harmmeanll[i] <- log(length(ll)) + minll - sum(exp(-(ll - minll)))
   meanll[i] <- mean(ll)
   namei[i] <- names(listname)[i]
   namei[i] <- substr(namei[i],1,nchar(namei[i])-4)
}
BF <- rep(0,lengthlist)
for(i in 1:(lengthlist/2))
{
   BF[i] <- harmmeanll[i] - harmmeanll[lengthlist/2 + i]
}
for(i in (lengthlist/2 + 1):lengthlist)
{
   BF[i] <- harmmeanll[i] - harmmeanll[-lengthlist/2 + i]
}
summarytable <- data.frame(namei,meanll,harmmeanll,BF)

richelbilderbeek commented 6 years ago

Reopened, as rsetienne must agree.

richelbilderbeek commented 6 years ago

Re: raket: Bayes Factor test JC69 versus GTR

3 September 2018 14:37 10 KB From: Richel Bilderbeek To: Rampal S. Etienne

Jij stelde voor met een Bayes factor te checken of een GTR nucleotide substitutie model beter is dan JC69 [1]. Ik stelde voor dat niet te doen, omdat (1) dit niet de kern is van het onderzoek, (2) er geen reden is tot twijfel (en vind maar eens artikelen die deze twijfel ondersteunen), (3) het is het niet waard om het experiment hiervoor uit te stellen.

Ben je het ermee eens dat het beter is als ik dit niet doe?

[1] https://github.com/richelbilderbeek/raket_article/issues/10 (is private repository, dus eerst inloggen op GitHub)

richelbilderbeek commented 6 years ago

Some literature on this:

http://www.beast2.org/2018/07/24/TreeModelAdequacy.html
Sullivan J., Joyce P. 2005. Model selection in phylogenetics. Annu. Rev. Ecol. Evol. Syst. 36:445–466. (closed, so won't read)
Sebastian Höhna, Lyndon M Coghill, Genevieve G Mount, Robert C Thomson, Jeremy M Brown; P3: Phylogenetic Posterior Prediction in RevBayes, Molecular Biology and Evolution, Volume 35, Issue 4, 1 April 2018, Pages 1028–1034, https://doi.org/10.1093/molbev/msx286

richelbilderbeek commented 6 years ago

Added code of rsetienne to a raket vignette.

richelbilderbeek commented 6 years ago

Code was incomplete, discarded it.

richelbilderbeek commented 6 years ago

Finished vignette and put one outcome here:

20180906_bayes_factor_314.pdf.zip

Conclusion: whatever I do, the Bayes factor is always in range 0.99-1.01. Assuming a different site has no effect.

richelbilderbeek commented 6 years ago

Q: should it be in the article nevertheless? A: I think yes. This is how it will look like:

20180906_model_comparison

richelbilderbeek commented 5 years ago

I close this Issue, as I will use the weight of the speciation models.

richelbilderbeek / raket_article

rsetienne 2018-07-05: Bayes factor JC69 versus GTR #10