kopperud / Pesto.jl

Phylogenetic Estimation of Shifts in the Tempo of Origination
MIT License
4 stars 0 forks source link

Error when estimating eta #27

Open YushaLiu opened 1 year ago

YushaLiu commented 1 year ago

Thanks for developing this great package!

I'm trying to analyze some phylogenetic data using Pesto (attached below -- it is not from evolutionary data but it is also present in the tree form), following the tutorial (https://kopperud.github.io/Pesto.jl/dev/analysis/extended/). I get the following error when estimating eta, the rate of shift:

η = optimize_eta(λ, µ, pdac)
ERROR: KeyError: key 10247 not found
Stacktrace:
  [1] getindex(h::Dict{Int64, Int64}, key::Int64)
    @ Base ./dict.jl:484
  [2] partition_postorder_indices(data::SSEdata)
    @ Pesto ~/.julia/packages/Pesto/yDVQr/src/utils.jl:217
  [3] postorder_nosave(model::SSEconstant, data::SSEdata, ....

Do you have any idea why this error occurs and how to fix it? Thanks very much for your help! M1_1_v3.txt

kopperud commented 1 year ago

Hi Yusha,

How did you install the package? It might be that you need to install the development version like so

import Pkg
Pkg.add(url="https://github.com/kopperud/Pesto.jl")

We haven't finished the manuscript yet, and we're still ironing out some of the bugs. I think it should run if you install the development version.

Otherwise, we are assuming the following for the tree:

I quickly plotted your tree and saw that these conditions are not met -- so the model results will not make sense, sorry. I'll try to add some checks and print errors if this is not the case, thank you for the example. If you somehow can resolve the polytomies, and transform the units of your branches to time, and make the tree ultrametric, then you can re-run the model.

Sorry if it's not the response you hoped for, but thanks for your interest. Bjørn

YushaLiu commented 1 year ago

Thanks very much! I'm able to run the model after I make the tree ultrametric and bifurcating. Regarding the condition that the branch lengths are units of time, does it matter if the branch lengths in my tree are different in scale (magnitude) from those in typical evolution data? Will the validity of your method be affected, or it shouldn't matter as long as the branch lengths are positive values? And another minor question -- is zero branch length allowed?

kopperud commented 1 year ago

In the examples that we have tested, we have been using trees with branch lengths in units of million years, something between a few myr and up to 400-500 myr. In these cases the rates (speciation, extinction, shift rate) are on the order of 0.001-1.0, in units of events per lineage per myr, something like that. If you use a different scale, like years instead of million years, then the branch lengths will be longer, but the rates smaller. Mathematically speaking the method should be invariant to the choice of time units, but there can potentially be some problems with numerical stability if you choose numbers that are very extreme, either very large or very small. Also we set some initial values, and upper/lower limits when estimating the rates, i.e. these steps

λml, μml = estimate_constant_bdp(primates)

and

η = optimize_eta(λ, µ, primates)

Those are not in the vignette but the limits and initial values can be controlled with keyword arguments, you can see from the source https://github.com/kopperud/Pesto.jl/blob/3ffd98f70e148a53d594958e4a11e593f217050b/src/bd_shift/optimize.jl#L26-L39 https://github.com/kopperud/Pesto.jl/blob/3ffd98f70e148a53d594958e4a11e593f217050b/src/bd_constant/constantBDP.jl#L110-L112

However if your branch lengths are not in units of time, and instead something like number of nucleotide substitutions, then I'm not sure the model makes sense at all

kopperud commented 1 year ago

And I'm not sure if zero-length branch lengths work. Theoretically it should but I'm not sure if we coded it correctly in that edge-case