macroevolution / bammtools

BAMMtools is an R package for analysis of BAMM results
10 stars 6 forks source link

whales tree #45

Open drabosky opened 7 years ago

drabosky commented 7 years ago

Whales tree now fails ultrametric test in ape. Pretty sure this didn't happen previously. It's definitely a tiny numerical issue and easily fixed, but we should fix tree by adding trivial branch length to tips to ensure it passes ape ultrametric check.

josephwb commented 7 years ago

I wonder if something is up with ape. I am looking at the C++ code and am confused.

Anyway, whales passes our phyx ultrametricity test, as well as that of phylobase:

require(phylobase);
phy <- as(whales, "phylo4");
isUltrametric(phy);
[1] TRUE
drabosky commented 7 years ago

It seems like the auto tol settings are not doing what they used to. There's a numerical differential in root-to-tip distances of 4e-6, but now causes tree to fail checks. This is an issue because other packages (like diversitree) use is.ultrametric and now reject this (and other) datasets.

Tree first fails with: is.ultrametric(whales, tol = 1e-8) all other manual tol > 1e-8 passes the check.

drabosky commented 7 years ago

default tol = .Machine$double.eps^0.5 = 1.490116e-08 anything different about this?

josephwb commented 7 years ago

Here is the function from apev3.5 (note same tol):

is.ultrametric.OLD <- function(phy, tol = .Machine$double.eps^0.5)
{
    if (!inherits(phy, "phylo"))
        stop('object "phy" is not of class "phylo"')
    if (is.null(phy$edge.length))
        stop("the tree has no branch lengths")

    phy <- reorder(phy)
    n <- length(phy$tip.label)
    e1 <- phy$edge[, 1]
    e2 <- phy$edge[, 2]
    EL <- phy$edge.length

    ## xx: vecteur donnant la distance d'un noeud
    ##     ou d'un tip a partir de la racine
    xx <- numeric(n + phy$Nnode)

    ## the following must start at the root and follow the
    ## edges contiguously; so the tree must be either in cladewise
    ## order (or in pruningwise but the for loop must start from
    ## the bottom of the edge matrix)

    for (i in seq_len(length(e1)))
        xx[e2[i]] <- xx[e1[i]] + EL[i]

    isTRUE(all.equal.numeric(var(xx[1:n]), 0, tolerance = tol))
}

and whales passes:

is.ultrametric.OLD(whales);
[1] TRUE
josephwb commented 7 years ago

It seems the old function is available in apev4:

ape:::.is.ultrametric_ape(whales, tol=.Machine$double.eps^0.5, option=2, n=length(whales$tip.label));
[1] TRUE

The important arg is option. It factors into this switch:

    crit <- switch(option, {
        mn <- min(xx.tip)
        mx <- max(xx.tip)
        (mx - mn)/mx
    }, var(xx.tip))

Setting option to 2 checks whether the variance of the root-to-tip paths are < tol. If option is set to 1 it fails:

ape:::.is.ultrametric_ape(whales, tol=.Machine$double.eps^0.5, option=1, n=length(whales$tip.label));
[1] FALSE

So, for the time being you could wrap ape:::.is.ultrametric_ape to get the old behaviour while you figure things out.

jonchang commented 7 years ago

Right, but I think the issue is that other people will be reading in the whales tree and be surprised when their (non-bammtools) packages report that the tree isn't ultrametric.

I wrote a small routine to try to get the tip lengths to cooperate but it's incredibly slow and I'm not convinced that it's actually going to work.

library(ape)

phy <- read.tree("whaletree.tre")

compute_tip_lens <- function(phy) {
    e1 <- phy$edge[, 1]
    e2 <- phy$edge[, 2]
    EL <- phy$edge.length
    xx <- numeric(n + phy$Nnode)
    for (i in seq_len(length(e1))) xx[e2[i]] <- xx[e1[i]] + EL[i]
    xx.tip <- xx[1:n]
}

compute_ultrametric_stat <- function(xx.tip) {
    mn <- min(xx.tip)
    mx <- max(xx.tip)
    (mx - mn)/mx
}

is_ultrametric <- function(phy, tol = .Machine$double.eps^0.5) {
    xx.tip <- compute_tip_lens(phy)
    isTRUE(all.equal.numeric(compute_ultrametric_stat(xx.tip), 0, tolerance = tol))
}

ctr <- 0
while (!is_ultrametric(phy)) {
    tiplen <- compute_tip_lens(phy)
    idx <- which.min(tiplen)
    phy$edge.length[idx] <- phy$edge.length[idx] - .Machine$double.eps
    ctr <- ctr + 1
    if (ctr %% 1000 == 0) {
        cat(ctr, compute_ultrametric_stat(tiplen), fill = T)
    }
}
drabosky commented 7 years ago

Yeah, the incompatibilities in other packages will be a problem. I ran into this issue with final analyses in our FiSSE paper (non bammtools packages) and had to include a function check_and_fix_ultrametric to pre-process trees before using in diversitree etc: https://github.com/macroevolution/fisse/blob/master/run_fisse/traitDependent_functions.R#L3-L29

drabosky commented 7 years ago

I'm going to send a link to this thread to Emmanuel

emmanuelparadis commented 7 years ago

Hi guys,

Yes is.ultrametric() has changed with ape 4. This is explained in the help page (and also why the default has changed). Maybe you want to use is.ultrametric(phy, option = 2)?

dwbapst commented 7 years ago

Ah, the new ((max - min/max)) default is invariant to the linear scale of the branch lengths. Well, that makes sense, I guess.

...So, why are supposedly ultrametric trees failing under the new default?

drabosky commented 7 years ago

the default check seems to be overly sensitive relative to trees produced by BEAST, TreePL, etc at least w a few trees I've tried which I can imagine could cause problems for some users who will just use the defaults. Looks like Rich FitzJohn just made some changes to diversitree on Github on account of this but I haven't looked deeper to see if the package is now ape 4.0 compatible.

josephwb commented 7 years ago

Doh, looked like none of use thought to RTFM! :bowtie:

dwbapst commented 7 years ago

FYI, this may be relevant, if you haven't seen it already: Liam has just posted a new function for phytools that will force a tree to be ultrametric, so to pass is.ultrametric. Apparently a key issue is trees read from files with low numerical precision.

http://blog.phytools.org/2017/03/forceultrametric-method-for-ultrametric.html

emmanuelparadis commented 7 years ago

Thanks Dave! I've just posted a comment on Liam's blog.

FabianRoger commented 7 years ago

Hej,

I just wanted to let you know that the change seems to have broken my code, too. I for instance calculate phylogenetic diversity with the entropart package which contains a function Preprocess.Tree that in turn internal calls ape::as.hclust.phylo which calls is.ultrametric which now fails. I checked and the tree is ultrametric if I set the tol to tol=.Machine$double.eps^0.31 but not for lower tolerances. It works also with option = 2, but again it's a function that calls a function that calls is.ultrametric so i would have to redefine the function to make it work. I saw the suggestion from Liam, but the nnls option takes very long to calculate for my tree (16S tree with 8187 tips).

Update: for what it's worth, @drabosky function works in a reasonable time but it's not an ideal solution...

richfitz commented 7 years ago

the default check seems to be overly sensitive relative to trees produced by BEAST, TreePL, etc at least w a few trees I've tried which I can imagine could cause problems for some users who will just use the defaults. Looks like Rich FitzJohn just made some changes to diversitree on Github on account of this but I haven't looked deeper to see if the package is now ape 4.0 compatible.

Diversitree is in "appease CRAN only" maintenance mode and this my changes were pretty much to stop the function running on some tests (that aren't run on CRAN anyway). Thanks for everyone for finding the actual issue :grinning: