Closed richelbilderbeek closed 7 years ago
I think that this behavior follows from regular mathematics. An nLTT statistic is calculated by dividing two integers. If both integers have a limited range, I would expect the results of their division being smooth.
@thijsjanzen : what do you think?
To quote myself:
I think that this behavior follows from regular mathematics. An nLTT statistic is calculated by dividing two integers. If both integers have a limited range, I would expect the results of their division being smooth.
Here I demonstrate my idea:
This code shows my explanation:
library(ggplot2)
n <- 25
xs <- seq(1, n)
df <- data.frame(
num = rep(xs, each = n),
den = rep(xs, times = n)
)
df <- subset(df, num <= den)
df$x = df$num / df$den
qplot(data=df, x = 1, y = x, geom = "point")
I'd conclude my idea is correct, but I'd love some unbiased opinion on that. @franciscorichter maybe?
If no comments, I will close this Issue in March.
HI Richel, I don´t completely understand your idea. I agree when you said ''I would expect the results of their division being smooth'', but I do not see how your example shows that, the data you plot there is not smooth, but this is because your example has integers only and not a lot of them, whereas the nLTT statistic work with real numbers (is that true?) and the number of simulations are of the order of 10^5. I am pretty sure that if you change n<-25 by n<-1000, then you will have a smooth line.
I try to explain the 'integer division' point below. An nLTT is constructed from integer divisions (the number of lineages divided by the max number of lineages).
OTOH, the RPubs workshop by franciscorichter is convincing. I will try now not to divide integers (as below, I just put it there for reference), but use simulated phylogenies instead, to see if some type coercion happens somewhere.
I am working on it :muscle:
I used n <- 25
just to show it visibly not smooth when plotted as a point chart.
I will try to do the same now with a histogram.
n <- 25
n <- 100
n <- 1000
library(ggplot2)
n <- 25
xs <- seq(1, n)
df <- data.frame(
num = rep(xs, each = n),
den = rep(xs, times = n)
)
df <- subset(df, num <= den)
df$x = df$num / df$den
qplot(df$x[df$x > 0.4 & df$x < 0.6], binwidth = 0.001)
Too bad the RPubs workbook is not an mcve, as it is not complete. I guess I'll have to fix it.
Here is the reprex:
devtools::install_github("franciscorichter/dmea")
library(dmea)
library(ggplot2)
library(nLTT)
pp <- proc.time()
s <- sim_phyl(seed=2)
s2 <- phylo2p(drop.fossil(s$newick))
n_trees <- 5000
phylos <- vector('list',length=n_trees)
nLTT <- vector(length=n_trees)
gam <- vector(length=n_trees)
phylos2 <- vector('list',length=n_trees)
nLTT2 <- vector(length=n_trees)
gam2 <- vector(length=n_trees)
dt=0.01
for(i in 1:n_trees){
r <- rec_tree(wt=s2$t)
r1 <- p2phylo(r)
phylos[[i]] <- read.tree(text=r1)
nLTT[i] <- nLTTstat(s$newick, phylos[[i]], distance_method = "abs")
gam[i] <- gammaStat(phylos[[i]])
#
r <- rec_tree(wt=s2$t,rec_method=2)
r2 <- p2phylo(r)
phylos2[[i]] <- read.tree(text=r2)
nLTT2[i] <- nLTTstat(s$newick, phylos2[[i]], distance_method = "abs")
gam2[i] <- gammaStat(phylos2[[i]])
}
nltt_values <- get_nltt_values(phylos, dt = dt)
# Plot the phylognies, where the individual nLTT values are visible
qplot(t, nltt, data = nltt_values, geom = "point",
ylim = c(0,1),
main = "Average nLTT plot of phylogenies", size = I(0.1)
) + stat_summary(
fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
If you install the last version of dmea package then it should be enough to reproduce everything on the Rpubs doc, isn't it?
And this is the error I get:
Looks like #31 has just made another victim...
@franciscorichter : yes, reproducing the RBook was not that hard, thanks to your great work!
Oh, the trees I am simulating does have extinct species, that means that are not ultrametric and that means that the calculation of nLTTstat function does not make sense, is that true? ... I am not sure if that fact has to do with the gaps on the estimations though. In any case, do you know if is possible to calculate the nLTT statistic for trees considering extinct species?
Weird, the reprex did not work anymore :confused: , sent the Issue above.
OK, thanks to @franciscorichter for fixing his code, resulting into this:
devtools::install_github("franciscorichter/dmea")
devtools::install_github("richelbilderbeek/nLTT", ref = "develop")
library(dmea)
library(Hmisc)
library(ggplot2)
library(nLTT)
pp <- proc.time()
s <- sim_phyl(seed=2)
s2 <- phylo2p(drop.fossil(s$newick))
n_trees <- 5000
phylos <- vector('list',length=n_trees)
nLTT <- vector(length=n_trees)
gam <- vector(length=n_trees)
phylos2 <- vector('list',length=n_trees)
nLTT2 <- vector(length=n_trees)
gam2 <- vector(length=n_trees)
dt <- 0.01
for(i in 1:n_trees){
r <- rec_tree(wt=s2$wt)
r1 <- p2phylo(r)
phylos[[i]] <- r1
nLTT[i] <- ltt_stat(s$newick, phylos[[i]])
gam[i] <- gammaStat(phylos[[i]])
r <- rec_tree(wt=s2$wt,rec_method=2)
r2 <- p2phylo(r)
phylos2[[i]] <- r2
nLTT2[i] <- ltt_stat(s$newick, phylos2[[i]])
gam2[i] <- gammaStat(phylos2[[i]])
}
testit::assert(inherits(phylos[[1]], "phylo")) # Prerequisite for get_nltt_values
nltt_values <- get_nltt_values(phylos, dt = dt)
# Plot the phylognies, where the individual nLTT values are visible
qplot(t, nltt, data = nltt_values, geom = "point",
ylim = c(0,1),
main = "Average nLTT plot of phylogenies", size = I(0.1)
) + stat_summary(
fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
Sadly, it takes too long on my computer.
I will try to create a simpler mcve
Here it is. It creates n
trees and measures the nLTT statistic at dt
resolution:
library(ggplot2)
dt <- 0.001
n <- 1000
phylos <- list()
for (i in 1:n)
{
phylos[[i]] <- ape::rcoal(i + 1)
}
nltt_values <- nLTT::get_nltt_values(phylos, dt = dt)
# Plot the phylognies, where the individual nLTT values are visible
qplot(t, nltt, data = nltt_values, geom = "point",
ylim = c(0,1),
xlim = c(0.98,1.0),
main = "Average nLTT plot of phylogenies", size = I(0.1)
) + stat_summary(
fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
Zoom in on the most interesting part:
Note that I there appears to be no strange behavior of the nLTT statistic here.
@franciscorichter : may it be a feature of the dmea
model to result in the nLTT plots as shown in the peculariar plot at the top of this Issue?
when I use ltt_stat
I do not get this gaps anymore (with same phylos)...
by other hand I got this, do you know what is it?
nltt_values <- get_nltt_values(phylos, dt = dt)
Error in get0(oNam, envir = ns) :
lazy-load database '/home/p276048/R/x86_64-pc-linux-gnu-library/3.3/nLTT/R/nLTT.rdb' is corrupt
In addition: Warning messages:
1: In get0(oNam, envir = ns) : restarting interrupted promise evaluation
2: In get0(oNam, envir = ns) : internal error -3 in R_decompress1
Looks like an off-topic question to me. Try Google and read the first hit
As it appears that nLTT works fine, I will close this Issue.
From @franciscorichter :
From his work
And added annotation: