liamrevell / phytools

GNU General Public License v3.0

midpoint.root stealing/moving edge labels #107

Closed joshrest closed 12 months ago

joshrest commented 2 years ago

For all the trees we have looked at, midpoint.root fails to duplicate the node.label (in our case, a support value from IQ-TREE) on the rooted edge. Instead, it appears to pull a node.label from a neighboring edge onto one of the newly created edges; this theft repeats all the way up the tree until a node.label is missing somewhere near a tip. This behavior is problematic and likely a bug. In retrospect, we think this issue has occurred on different installations for some time. Here is a reproducible example:

library(phytools)
unrooted <- read.newick("sample_unrooted.tre")

sample_unrooted.tre.zip

note: edge numbers in parentheses:

The midpoint of the tree will be along an edge with labels (support) of "98.1/100" (274). This edge has two children: one with edge label "87.8/89" (275) and one with edge label "leaf_113" (113).

Rooting this tree with the phangorn function works as expected:

library(phangorn)
rootedPhang <- midpoint(unrooted) 

note: edge numbers in parentheses not comparable to unrooted edge numbers:

As expected, the edge with label "98.1/100" (274) has been duplicated by the addition of the root node (new edges 274 and 275). The children of one edge continue to be "87.8/89" (276) and "leaf_113" (113). The children of the second edge are "79.4/83" (273) and "81.2/92" (367).

Compare this with the behavior of midpoint.root:

rooted <- midpoint.root(unrooted)

note: edge numbers in parentheses not comparable to unrooted edge numbers:

The children of 98.1/100 (383) remain exactly the same as in the unrooted tree (384,238). The newly created edge (240) is assigned a 'stolen' edge value "79.4/83" (240) with children "81.5/80" (241) and "81.2/92" (368).
This theft continues up the tree until the branch leading to clade (leaf_1,(leaf_237,leaf_238)), which is simply missing an edge label; its edge label (89.8/100) has been stolen and moved up to its parent clade.
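One way to check for this kind of shift without reading edge numbers off a plot is to key each support label by the split (bipartition of tips) that its edge defines, then compare the keys before and after rooting. The following is a minimal sketch under stated assumptions: the helpers `descendant_tips`, `split_labels`, and `compare_split_labels` are hypothetical names and not part of ape, phytools, or phangorn; the six-tip toy tree and the hand-shifted labels are made up for illustration and are not taken from sample_unrooted.tre.

```r
library(ape)

# Hypothetical helper: tip labels descending from a node (base-R recursion).
descendant_tips <- function(tree, node) {
  kids <- tree$edge[tree$edge[, 1] == node, 2]
  if (length(kids) == 0) return(tree$tip.label[node])
  unlist(lapply(kids, function(k) descendant_tips(tree, k)))
}

# Hypothetical helper: map each labelled internal node to the split its
# parent edge defines, keyed by the side that excludes a reference tip.
split_labels <- function(tree) {
  all_tips <- sort(tree$tip.label)
  ref <- all_tips[1]
  nodes <- Ntip(tree) + seq_len(tree$Nnode)
  out <- list()
  for (i in seq_along(nodes)) {
    lab <- tree$node.label[i]
    if (is.null(lab) || is.na(lab) || lab == "") next
    clade <- descendant_tips(tree, nodes[i])
    side <- if (ref %in% clade) setdiff(all_tips, clade) else clade
    if (length(side) == 0) next  # root node: defines no split
    key <- paste(sort(side), collapse = "|")
    out[[key]] <- c(out[[key]], lab)
  }
  out
}

# Hypothetical helper: splits whose label(s) differ between two trees.
# unique() collapses the duplicate expected on the two root-adjacent edges.
compare_split_labels <- function(before, after) {
  b <- split_labels(before)
  a <- split_labels(after)
  keys <- union(names(b), names(a))
  Filter(function(k) !identical(unique(b[[k]]), unique(a[[k]])), keys)
}

# Toy unrooted tree with support values as node labels (illustrative only).
unrooted <- read.tree(text = "((a:1,b:1)90:1,(c:1,d:1)80:1,(e:1,f:1)70:5);")

# Simulate the reported symptom by shifting the labels by hand.
shifted <- unrooted
shifted$node.label <- c("", "80", "70", "")

compare_split_labels(unrooted, unrooted)  # no splits reported
compare_split_labels(unrooted, shifted)   # all three labelled splits reported
```

With the real data, the second argument would be the output of the rooting function under test, for example `compare_split_labels(unrooted, midpoint.root(unrooted))`. An empty result would suggest the labels stayed attached to their edges.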

Here is the session info, captured before loading phangorn:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] phytools_1.0-1 maps_3.4.0     ape_5.6-1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8              codetools_0.2-18        quadprog_1.5-8
 [4] lattice_0.20-45         MASS_7.3-55             grid_4.1.2
 [7] nlme_3.1-155            magrittr_2.0.2          coda_0.19-4
[10] scatterplot3d_0.3-41    phangorn_2.8.1          combinat_0.0-8
[13] Matrix_1.4-0            fastmatch_1.1-3         igraph_1.2.11
[16] plotrix_3.8-2           numDeriv_2016.8-1.1     parallel_4.1.2
[19] compiler_4.1.2          pkgconfig_2.0.3         mnormt_2.0.2
[22] tmvnsim_1.0-2           clusterGeneration_1.3.7 expm_0.999-6

liamrevell commented 12 months ago

This is a known issue with re-rooting in R. I think it is addressed in phangorn::midpoint now, so I'd recommend using that.
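A minimal sketch of that recommendation, assuming an installed phangorn version whose `midpoint` accepts the `node.labels` argument (with `"support"` treating node labels as edge support values). The toy tree is made up for illustration and stands in for sample_unrooted.tre; its long internal edge is chosen so that the midpoint falls on a labelled edge.

```r
library(ape)
library(phangorn)

# Toy unrooted tree with support values as node labels (illustrative only).
unrooted <- read.tree(text = "((a:1,b:1)90:1,(c:1,d:1)80:1,(e:1,f:1)70:5);")

# Midpoint-root, asking phangorn to treat node labels as support values
# that belong to edges rather than to nodes.
rooted <- midpoint(unrooted, node.labels = "support")

is.rooted(rooted)   # check that a root node was added
rooted$node.label   # inspect where the support values ended up
```

It is still worth inspecting `rooted$node.label` (or plotting with `nodelabels()`) on real data to confirm that the support value of the rooted edge appears on both sides of the new root.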