ssolo / ALE

Amalgamated likelihood estimation (ALE) is a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (cf. http://arxiv.org/abs/1211.4606 ), which allows for the duplication, transfer and loss of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees.
44 stars 15 forks source link

Error when sampling reconciled gene trees #16

Closed GRealesM closed 10 months ago

GRealesM commented 6 years ago

Hi all,

When running ALEml I got an error message that I don't know how to interpret.

Reconciliation model initialised, starting DTL rate optimisation..
Optimizing... - 26
ML rates:  delta=1.15597; tau=0.00781893; lambda=1e-06.
LL=-707.287
Sampling reconciled gene trees..

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 5)
Aborted (core dumped)

I used an ALE object created with ALEobserve, which contained 25871 trees.

Any clue what could be causing the error? Thanks a lot!

flass commented 5 years ago

Hi, I had a similar error when using a Gene tree chain which species set was not fully covered by the provided Species tree. Maybe that's your issue too? You could check labels and how they are to be split in the G trees see if they match the ones in the S tree. Best, Florent For the record, the error with the tree that had an incorrect species set was slightly different, as follow:

Reconciliation model initialised, starting DTL rate optimisation..
^MOptimizing... / 1^MOptimizing... - 4^MOptimizing... - 6^MOptimizing... / 9^MOptimizing... \ 11
ML rates:  delta=0.766667; tau=0.166667; lambda=1e-06.
LL=-708.311
Sampling reconciled gene trees..

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check

hence omitting the specific error message.

My build of ALE is made with source from commit 23c4a6a896593ef1dc9bd67ace08616e3ad6e0e8.

flass commented 5 years ago

Hi again, actually, I just reproduced more accurately the bug originally reported by @GRealesM. I had it occurring with a few gene families, each with a G tree chain that has the right set of species with respect to the S tree, so it does not seem to be the cause I suggested above. Interestingly, I get the same detail of error, which seems triggered by the check of an object length that should be only 5-long but is apparently much longer (and always the same length):

Reconciliation model initialised, starting DTL rate optimisation..
^MOptimizing... / 1^MOptimizing... - 4^MOptimizing... - 6^MOptimizing... / 9^MOptimizing... \ 11^MOptimizing... - 14^MOptimizing... / 17^MOptimizing... \ 19^MOptimizing... - 22^MOptimizing... / 25^MOptimizing... \ 27^MOptimizing... - 36^MOptimizing... / 45^MOptimizing... - 54^MOptimizing... \ 63^MOptimizing... - 72^MOptimizing... / 81^MOptimizing... - 90
ML rates:  delta=1.72222; tau=1e-06; lambda=4.73827.
LL=-707.073
Sampling reconciled gene trees..

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 5)

The only thing that I found in common to my trees that create this problem is that they have some (many, sometimes all) zero-length branches.

flass commented 5 years ago

Hi again,

still having the same issue. I tried to generate a minimum working example. I can replicate the problem with a sample of just one G tree. This time I was using the Docker version of ALE (image boussau/alesuite:latest). I ran the following analyses:

1) collapsed G tree sample (8000 trees, removed first 2000 as burnin) with collapsed S tree (see Pantagruel approach):

docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEobserve  $PWD/BRADYC021242-collapsed-Gtrees.nwk burnin=2000
docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEml $PWD/BRADYC021242-collapsed-Stree.nwk  $PWD/BRADYC021242-collapsed-Gtrees.nwk.ale 1000_ &> BRADYC021242-collapsed-Gtrees.nwk.ale.log

2) then I reproduced the same, taking only one tree (the last) from the original G tree sample (no burnin of course):

tail -n 1 BRADYC021242-collapsed-Gtrees.nwk > BRADYC021242-collapsed-Gtree_-1.nwk
docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEobserve  $PWD/BRADYC021242-collapsed-Gtree_-1.nwk
docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEml $PWD/BRADYC021242-collapsed-Stree.nwk $PWD/BRADYC021242-collapsed-Gtree_-1.nwk.ale 1000_ &> BRADYC021242-collapsed-Gtree_-1.nwk.ale.log

3) then to see if it was not an artefact introduced by the collapsing of the Species tree and the Gene trees as done in Pantagruel, I made another run with the full species tree (based on a RAxML tree, made ultrametric with LSD) and the ML tree for this family (the original output of RaxML for the full gene family) as G (one single G tree, so no burnin) - so no chance that my tree manipulations has caused it:

docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEobserve  $PWD/RAxML_rootedTree.BRADYC021242.codes_noBS
docker run -v $PWD:$PWD -w $PWD boussau/alesuite ALEml $PWD/core-genome-based_reference_tree_Brady2019.full.lsd.nwk $PWD/RAxML_rootedTree.BRADYC021242.codes_noBS.ale 1000_ &> RAxML_rootedTree.BRADYC021242.codes_noBS.ale.log_withfullStree

In all cases I got the same error as above; the only thing that possibly changes is the number in (which is xxx). Here is the log for the last analysis:

ALEml using ALE v0.4
Read species tree from: /rhizho-pangerec/Brady2019/test/core-genome-based_reference_tree_Brady2019.full.lsd.nwk..
Read summary of tree sample for 1 trees from: /rhizho-pangerec/Brady2019/test/RAxML_rootedTree.BRADYC021242.codes_noBS.ale..
# the S tree (...)

Reconciliation model initialised, starting DTL rate optimisation..
Optimizing... / 21
ML rates:  delta=1.16914; tau=0.144446; lambda=0.0271612.
LL=-707.841
Sampling reconciled gene trees..

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 1637)

I can send over an archive of the all files for reproducing the bug.

It would be great if you could help as this is seriously preventing me from applying ALE to large scale datasets, as I have a significant fraction of gene families that fail.

Best wishes, Florent

Gullumluvl commented 4 years ago

I get this error as well, with the docker and with the compiled (gcc-5) version. I ran gdb to locate the problem and I found that it occurs at line 825 of sample_scaled.cpp.

The traceback:

(gdb) bt
#0  0x00007ffff6f78428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6f7a02a in __GI_abort () at abort.c:89
#2  0x00007ffff7ad78f7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7adda46 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7adda81 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7addcb4 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7ad97f5 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00000000005dcf30 in std::vector<step, std::allocator<step> >::_M_range_check (
    this=0x7fffffff88f0, __n=18446744073709551615) at /usr/include/c++/5/bits/stl_vector.h:803
#8  0x00000000005dc855 in std::vector<step, std::allocator<step> >::at (this=0x7fffffff88f0,
    __n=18446744073709551615) at /usr/include/c++/5/bits/stl_vector.h:824
#9  0x00000000005d729d in exODT_model::sample (this=0xa14010, S_node=false, g_id=-1, t_i=0, rank=0,
    e=3, branch_length=0, branch_events="", transfer_token="", max_rec=false)
    at /users/ldog/glouvel/install/ALE/src/sample_scaled.cpp:825
#10 0x00000000005db4c8 in exODT_model::sample (this=0xa14010, S_node=false, g_id=-1, t_i=1, rank=0,
    e=3, branch_length=0, branch_events="", transfer_token="", max_rec=false)
    at /users/ldog/glouvel/install/ALE/src/sample_scaled.cpp:1213
#11 0x00000000005db4c8 in exODT_model::sample (this=0xa14010, S_node=false, g_id=-1, t_i=2, rank=0,
    e=3, branch_length=0, branch_events="", transfer_token="", max_rec=false)
    at /users/ldog/glouvel/install/ALE/src/sample_scaled.cpp:1213
#12 0x00000000005d37b0 in exODT_model::sample[abi:cxx11](bool) (this=0xa14010, max_rec=false)
    at /users/ldog/glouvel/install/ALE/src/sample_scaled.cpp:98
#13 0x000000000056cec6 in main (argc=5, argv=0x7fffffffdf88)
    at /users/ldog/glouvel/install/ALE/src/ALEml.cpp:266
(gdb) l
820                 break;
821             }
822         }
823       if (max_rec)
824         step_i=max_i;
825       step back_step=sample_steps.at(step_i);
826       sample_steps.clear();
827       sample_ps.clear();
828       /*
829         if (back_step.e==-1)

At this point it appears that step_i has value -1 which causes the index error. Seems like there's something unexpected with the indexing of time slices and the recursion.

We'll need the ALE experts on this :)

Best,

Gullumluvl commented 4 years ago

My bad... I had a non binary species tree, and it seems that using a binary tree with non null branch lengths fixes this problem.

ssolo commented 10 months ago

ALERax (finally) solves this issue due to numerical underflow. ALE will not have a solution in the foreseeable future.