Run PBD with maximum likelihood estimates

Lumphie commented 7 years ago

I'm just gonna leave some estimated parameters here everytime I get new ones.

Lumphie commented 7 years ago

I'm not sure what it all means but this is the output of "calc_max_likelihood.R"

1000 generations

library(ape)
library(PBD)
sink(file = tempfile())
ml <- PBD::pbd_ML(
  brts = ape::branching.times(
    read.tree(text = "(:12,:12):988;")),
  initparsopt = c(0.2,0.01,0.3),
  exteq = 1 # mu_1 == mu_2
)
sink()
write.csv(ml, "calc_max_likelihood.txt")

Paramaters for sado for this run:

histbin 0.1 0.1 0.1 0.1
seed 123
pop0 1000
type0 0 0 0
end 1000
sc 0.6
se 0.5
sk 1.2
sm 0.2
sv 0.03
sq 1.0
eta 1.0
b 4.0
output 1 output.txt
erasure_method swap
initialization_bug 0
gausser_implementation lut
at 0.05
c .00175

Lumphie commented 7 years ago

Okay I see now. Still had to run this with R. Just did: Can you run PBD with these parameters @richelbilderbeek ?

         : 1
b        : 1.4909044289496e-10
mu_1     : 1.54235009286403
lambda_1 : 8.14698982119855e-06
mu_2     : 1.54235009286403 
loglik   : 0 
df       : 3
conv     : 0

"","b","mu_1","lambda_1","mu_2","loglik","df","conv"
"1",1.4909044289496e-10,1.54235009286403,8.14698982119855e-06,1.54235009286403,0,3,0

Lumphie commented 7 years ago

Trying to run this R script:

library(ape)
library(PBD)
sink(file = tempfile())
ml <- PBD::pbd_ML(
  brts = ape::branching.times(
    read.tree(text = "(:15,:15):185;")),
  initparsopt = c(0.2,0.01,0.3),
  exteq = 1 # mu_1 == mu_2
)
sink()
write.csv(ml, "calc_max_likelihood_10_2.txt")

Getting the following error:

Error in if (expdurspec < 1e-07) { : 
  missing value where TRUE/FALSE needed
Calls: <Anonymous> -> sprintf -> pbd_durspec_quantile
Execution halted

richelbilderbeek commented 7 years ago

Okay I see now. Still had to run this with R. Just did: Can you run PBD with these parameters @richelbilderbeek ?

"","b","mu_1","lambda_1","mu_2","loglik","df","conv" "1",1.4909044289496e-10,1.54235009286403,8.14698982119855e-06,1.54235009286403,0,3,0

Running it now. To teach you how to fish, use the following R code:

# "","b","mu_1","lambda_1","mu_2","loglik","df","conv"
# "1",1.4909044289496e-10,1.54235009286403,8.14698982119855e-06,1.54235009286403,0,3,0
b <- 1.4909044289496e-10
mu_1 <- 1.54235009286403
lambda_1 <- 8.14698982119855e-06
mu_2 <- 1.54235009286403

b_1 <- b
b_2 <- b

pars <- c(b_1, lambda_1, b_2, mu_1, mu_2)
age <- 1000 # generations
PBD::pbd_sim(pars, age, plotit = TRUE)

richelbilderbeek commented 7 years ago

Looking at the parameters, and having run it for a long time, I think this can last a very long time:

The extinction rate is 1.5 per lineage per generation, the speciation rates at least 100K less. This makes a phylogeny with 2 species already rare.

richelbilderbeek commented 7 years ago

Note that the ML estimation already stated that the tree is very unlikely:

From the output:

"","b","mu_1","lambda_1","mu_2","loglik","df","conv"
"1",1.4909044289496e-10,1.54235009286403,8.14698982119855e-06,1.54235009286403,0,3,0

observe that loglik equals zero. I think this a feature of computation, as the loglik should be -Inf for unlikely trees. Additionally a loglik of zero is impossible: it means the the phylogeny is sure to be the true tree, which cannot be the case.

Lumphie commented 7 years ago

That's a huge problem, because most if not all my runs give this result. I will look for different options.

richelbilderbeek commented 7 years ago

What strikes me most, is the low estimation of lambda_1: your phylogeny would, I'd think, be more likely if speciation completion time would take longer...

Lumphie commented 7 years ago

Not on linux now. And don't have access to Rscript.

Could you run this one @richelbilderbeek ? This one looks more interesting:

library(ape)
library(PBD)
sink(file = tempfile())
ml <- PBD::pbd_ML(
  brts = ape::branching.times(
    read.tree(text = "((:1,:1):30,:31):869;")),
  initparsopt = c(0.2,0.01,0.3),
  exteq = 1 # mu_1 == mu_2
)
sink()
write.csv(ml, "calc_max_likelihood_10_9.txt")

Lumphie commented 7 years ago

A lot of the other runs I did don't have a Rscript ready. Think they took longer than the for loop you made to make sure it wouldn't run endlessly.

richelbilderbeek commented 7 years ago

Not on linux now. And don't have access to Rscript.

R is free and open-source and platform independent.

Could you run this one @richelbilderbeek ? This one looks more interesting:

Sure, I'll reply ASAP with the answer

A lot of the other runs I did don't have a Rscript ready. Think they took longer than the for loop you made to make sure it wouldn't run endlessly.

Maybe there are some loose ends with the output. We have all data in memory, but perhaps not saved to (different) files.

richelbilderbeek commented 7 years ago

Result:

         b     mu_1 lambda_1     mu_2    loglik df conv
1 0.902547 1.895804 1.454063 1.895804 -2.673832  3    0

2017-03-28-154502_800x588_scrot

Lumphie commented 7 years ago

Loglike -2.7 so more likely to make a tree in the PBD? Could you run it in the PBD? I'd love to learn it tomorrow so I can do it myself from then on and then won't have to bother you?

richelbilderbeek commented 7 years ago

Yes, a loglik is the 10-log likelihood, thus a loglik of -2 denotes a likelihood of 0.01, as 10^-0.2 = 0.01

Generally, likelihoods are very small, thus the log likelihood is used. A loglik of -2.67 is AFAIK impressively high!

Lumphie commented 7 years ago

Yay Then I probably have 1 result! At least I hope.

richelbilderbeek commented 7 years ago

Done!

richelbilderbeek / pbdmms

Run PBD with maximum likelihood estimates #270