richelbilderbeek / razzo_article

Article by Giovanni Laudanno, Richel J.C. Bilderbeek and Rampal S. Etienne
GNU General Public License v3.0

Mention: bigger trees may increase error #33

Open richelbilderbeek opened 4 years ago

richelbilderbeek commented 4 years ago

From this article:

@article{revell2005under,
  title={Under-parameterized model of sequence evolution leads to bias in the estimation of diversification rates from molecular phylogenies},
  author={Revell, Liam J and Harmon, Luke J and Glor, Richard E},
  journal={Systematic Biology},
  volume={54},
  number={6},
  pages={973--983},
  year={2005},
  publisher={Taylor \& Francis}
}

mention:

It may also be useful to consider the length of the tree, the number of taxa, and the number of characters used in the analysis when considering the potential for bias in the γ-statistic as a consequence of model inadequacy. Trees of very short length were not particularly susceptible to type I error in the γ-test, nor were trees containing few taxa. For long trees, in which branches contain many superimposed substitutions, the consequences on γ due to model misspecification are much more severe. For such trees, adding more taxa actually increases the power of the γ-statistic to detect spurious results such that even mild apparent deviations from constant-rate speciation, whether real or an artifact of model underparameterization, are statistically significant. Thus, our results suggest that trees of long length, trees containing many taxa, or trees featuring both of these properties are particularly susceptible to bias in the estimation and hypothesis testing of diversification rates using the γ-statistic.
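For reference, a minimal sketch of the γ-statistic that the quoted passage is about, using the formula from Pybus and Harvey (2000). The function name `gamma_statistic` and the choice of input (a plain list of internode intervals g_2 ... g_n, where g_k is the time during which the tree has k lineages) are assumptions made for illustration; they are not taken from the article or from any package.

```python
import math
from typing import Sequence


def gamma_statistic(internode_intervals: Sequence[float]) -> float:
    """Gamma statistic of Pybus & Harvey (2000) for an ultrametric tree.

    internode_intervals[0] is g_2 (time with 2 lineages),
    internode_intervals[1] is g_3, and so on up to g_n.
    Hypothetical helper, written for illustration only.
    """
    n = len(internode_intervals) + 1  # number of taxa
    if n < 3:
        raise ValueError("the gamma statistic needs at least 3 taxa")

    # k * g_k for k = 2 .. n
    weighted = [k * g for k, g in zip(range(2, n + 1), internode_intervals)]
    total = sum(weighted)  # T = sum_{j=2}^{n} j * g_j

    # Sum over i = 2 .. n-1 of the partial sums sum_{k=2}^{i} k * g_k
    running = 0.0
    sum_of_partial_sums = 0.0
    for w in weighted[:-1]:
        running += w
        sum_of_partial_sums += running

    mean_partial_sum = sum_of_partial_sums / (n - 2)
    return (mean_partial_sum - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Intervals equal to their pure-birth expectation (g_k proportional to 1/k)
# give a value close to zero.
print(gamma_statistic([1 / k for k in range(2, 51)]))

# Nodes concentrated near the root give a negative value.
print(gamma_statistic([0.1, 0.1, 1.0]))
```

Under a constant-rate pure-birth process γ is approximately standard normal, so negative values indicate branching events concentrated near the root. That is the direction in which an artifact of model underparameterization would be mistaken for a slowdown in speciation.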

richelbilderbeek commented 4 years ago

Figure from Supplementary Material 3, in which the difference between λ and 2r increases:

[Image: screen02]

The effect of the number of taxa on estimates of net diversification rate and speciation rate.

The x-axis (125, 75, or 50) corresponds to the number of taxa, and the y-axis corresponds to the value of the parameter estimated (λ or r). Since posterior distributions were generated under a birth-death process with the relative extinction rate equal to 0.5, estimates of λ should approach 2r. While the difference from the expectation improves with the number of taxa, the discrepancy can be attributed to a combination of sample size and model misspecification (i.e., estimating λ assuming a Yule process when the generating model was Birth-Death).
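The expectation that λ approaches 2r can be written out in one line. This assumes the usual definitions, which the caption does not spell out: r = λ − μ is the net diversification rate and ε = μ/λ is the relative extinction rate.

```latex
\begin{align*}
r &= \lambda - \mu = \lambda\,(1 - \varepsilon),
  \qquad \varepsilon = \frac{\mu}{\lambda} = 0.5 \\
\Rightarrow\quad r &= 0.5\,\lambda
  \quad\Longrightarrow\quad \lambda = 2r
\end{align*}
```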

(I do think the plots could have been more expressive)