mwpennell / geiger-v2

A suite of methods and models for studying evolutionary radiations
22 stars 17 forks source link

rescale.phylo returns NaN under "delta" model for some trees #21

Closed dwbapst closed 4 years ago

dwbapst commented 8 years ago

Hello all,

This only crops up with a small number of simulated trees I've encountered (maybe 5% of the time) so I'll include a reproducible example.

library(geiger)

tree<-"(((((((t18:1.463679077,t12:0.1729684733):0.1556820862,(((t24:1.845880473,t4:0.3034377797):1.650824079,t20:0.4856926412):0.6032191589,t30:0.4777796259):0.2100940662):0.05625117488,t5:1.296163165):0.5768372864,t9:0.659113761):1.781101809,((t22:0.1188352299,(t25:0.736267727,t14:3.616042752):0.8135201037):0.105873215,(t21:0.198981436,((t8:0.2432500795,t28:0.0247130706):0.684336435,t27:0.4200404161):0.8257523197):1.184559852):0.4631210882):3.936306288,((t1:1.781101809,t6:0.5768372864):3.936306288,(t7:0.1556820862,t11:1.463679077):0.05625117488):3.765426552):3.765426552,(t3:0.2100940662,(((t2:0.3034377797,t13:0.4856926412):1.845880473,(t17:1.296163165,(t26:0.4631210882,t15:0.105873215):0.659113761):0.4777796259):1.650824079,(((t16:3.616042752,(t19:0.198981436,t10:0.8257523197):1.184559852):0.736267727,t23:0.684336435):0.8135201037,t29:0.2432500795):0.1188352299):0.6032191589):0.1729684733);"

tree<-read.tree(text=tree)
delta<-0.5 # delta value doesn't actually matter
treeDelta<-rescale(tree,"delta",0.5)    

What's the problem you might ask? Look at the edge lengths.

> treeDelta$edge.length
 [1]        NaN 3.17042529 1.15547898 0.35048350 0.03363380 0.09260120
 [7] 0.83842487 0.10206409 0.12480027 0.35143321 0.91496868 0.95451899
[13] 0.16150088 0.27597020 0.27916735 0.75257833 0.39964675 0.31228861
[19] 0.07013792 0.07819226 0.52460540 0.45487244 2.08798767 0.76061844
[25] 0.12293341 0.50201096 0.40078955 0.13940497 0.01423367 0.24754161
[31] 3.05283603 2.43871389 0.96281821 0.31952592 0.03885464 0.10679030
[37] 0.96466208 1.57963064 0.77112191 1.76659662 2.57086049 1.93405803
[43] 0.27399210 0.43420656 0.55626904 1.31141818 0.69690710 0.45165461
[49] 0.10572436 0.24703684 1.37134463 0.97413647 3.41184374 1.29707438
[55] 0.19570647 0.78099644 0.91072418 0.45897703

Whoops! Where's that come from? I did a little tracking of the NaN value and it is produced early in the .delta.rescale code (https://github.com/mwpennell/geiger-v2/blob/master/R/utilities-phylo.R#L1595-L1601). Let's take a closer look...

> phy<-tree
> ht=geiger:::heights.phylo(phy)
> N=Ntip(phy)
> Tmax=ht$start[N+1]
> mm=match(1:nrow(ht), phy$edge[,2])
> ht$t=Tmax-ht$end
> ht$e=ht$start-ht$end
> ht$a=ht$t-ht$e
> ht$a
 [1]  1.027161e+01  1.027161e+01  1.258006e+01  1.258006e+01  1.092924e+01
 [6]  1.032602e+01  1.005967e+01  9.482835e+00  8.270727e+00  9.084247e+00
[11]  9.084247e+00  9.349414e+00  1.085950e+01  1.085950e+01  1.017517e+01
[16]  1.146716e+01  1.146716e+01  7.587104e+00  7.587104e+00  1.729685e-01
[21]  4.272892e+00  4.272892e+00  2.904791e+00  3.563905e+00  3.563905e+00
[26]  2.444811e+00  3.629371e+00  3.629371e+00  1.708543e+00  8.950229e-01
[31]  0.000000e+00 -1.776357e-15  3.765427e+00  7.701733e+00  9.482835e+00
[36]  1.005967e+01  1.011592e+01  1.011592e+01  1.032602e+01  1.092924e+01
[41]  7.701733e+00  8.164854e+00  8.270727e+00  8.164854e+00  9.349414e+00
[46]  1.017517e+01  3.765427e+00  7.530853e+00  7.530853e+00  0.000000e+00
[51]  1.729685e-01  7.761876e-01  2.427012e+00  2.427012e+00  2.904791e+00
[56]  7.761876e-01  8.950229e-01  1.708543e+00  2.444811e+00

See that negative -1.776357e-15 at ht$a[32]? There's the problem. Essentially its a floating point error: sometimes subtracting ht$t-ht$e produces a small negative value very close to 0 (that isn't zero), and when you take that to a power...

> bl=(ht$a+ht$e)^delta-ht$a^delta
> bl
 [1] 0.220745611 0.026872057 0.251311579 0.042520936 0.072659116 0.073500882
 [7] 0.198143411 0.105221433 0.020586935 0.119761588 0.549738122 0.032366658
[13] 0.036703391 0.003747528 0.065174265 0.253496648 0.084126732 0.028116400
[19] 0.253982114 0.203025677 0.072138310 0.114320550 0.345278174 0.118914381
[25] 0.027835754 0.898291021 0.051526793 0.205625505 0.239781014 0.120842270
[31] 0.000000000         NaN 0.834728900 0.304221551 0.092277433 0.008855312
[37] 0.024380607 0.032858177 0.092527477 0.240898533 0.082221249 0.018466340
[43] 0.138121309 0.200260259 0.132172506 0.105522315 0.803769283 0.642079460
[49] 0.010229886 0.415894786 0.465120328 0.676871823 0.509210589 0.146457903
[55] 0.183485951 0.065041365 0.361055974 0.256476589 0.341501650

You get NaNs.

One kludge to fix this would be to ask for the abs(), as apparently R doesn't mind taking the exponential of very tiny positive values so much.

> ht$a=abs(ht$t-ht$e)
> bl=(ht$a+ht$e)^delta-ht$a^delta
> bl
 [1] 0.220745611 0.026872057 0.251311579 0.042520936 0.072659116 0.073500882
 [7] 0.198143411 0.105221433 0.020586935 0.119761588 0.549738122 0.032366658
[13] 0.036703391 0.003747528 0.065174265 0.253496648 0.084126732 0.028116400
[19] 0.253982114 0.203025677 0.072138310 0.114320550 0.345278174 0.118914381
[25] 0.027835754 0.898291021 0.051526793 0.205625505 0.239781014 0.120842270
[31] 0.000000000 1.940470662 0.834728900 0.304221551 0.092277433 0.008855312
[37] 0.024380607 0.032858177 0.092527477 0.240898533 0.082221249 0.018466340
[43] 0.138121309 0.200260259 0.132172506 0.105522315 0.803769283 0.642079460
[49] 0.010229886 0.415894786 0.465120328 0.676871823 0.509210589 0.146457903
[55] 0.183485951 0.065041365 0.361055974 0.256476589 0.341501650

But that's a crazy kludge. There's probably better ways to deal with this floating value issue, but I'm not a good enough programmer to know them.

I haven't investigated very extensively, but its possible this issue affects other rescale.phylo models too, if they take the power of a different between ht$t-ht$e.

I hope this is helpful.

lukejharmon commented 4 years ago

Fixed.