xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0
26 stars 6 forks source link

negative numbers in pmatrix #125

Closed xflouris closed 7 years ago

xflouris commented 7 years ago

The parameter optimization routine may evaluate non-sensible parameters that cause the pmatrix to be close to the identity matrix but, due to numerical underflow, cause some cells to have small negative values.

Problem:

After computing the expd[1..states] array (exponential of rate*branch*eigenvalue[state]), we may end up with expd = {1,1,...,1} where each value is 1 +- a very small number. Hence we will be multiplying the eigenvectors matrix with the inverse eigenvectors matrix, with the expectation of obtaining the identity matrix. However, since each column of the inverse eigenvectors matrix was multiplied with the respective element from expd, the floating point representation changes and the resulting pmatrix may contain small negative numbers.

Proposed fix:

If for all elements x of expd[1..states] holds that 1 - T < x < 1 + T, where T is a threshold value of, for example, 1e-6, then set the identity matrix as the pmatrix.

xflouris commented 7 years ago

Added the check. I've set the threshold T to 1e-15 which seems to be the smallest value of T that can still work for the datasets I tried. For T=1e-16 we still get negative numbers. I've also noticed that we obtain the log-likelihood retains its value (upto the 8th decimal point) even for larger thresholds, upto the value of T=1e-12. For larger values (e.g T=1e-11) I observe a difference of several integer units in the log-L.

@amkozlov : would be good if you could double-check the implementation.

amkozlov commented 7 years ago

@xflouris: I just added a test for negative values (#129), and unfortunately, it shows that current fix doesn't work in all situations.

In particular, I still get negative values with substitution rates 0.001000 1000.000000 0.001000 1000.000000 0.001000 1.000000. Although this could be fixed with increasing T to e.g. 10e-12, it will still fail with other settings (e.g., on AA data I got negative values even with T=10e-8).

The only way I found to get rid of all negative values in all my test cases was to use 'brute force' a posteriori check for negative values after pmatrix was computed, and then to set pmatrix to identity if at least one negative value was detected. Of course, this "solution" could also lead to logL differences, and could potentially conceal bugs if we get negative values in pmatrix for some other reason.

stamatak commented 7 years ago

I seem to remember that there was some similar test in Mrbayes, maybe worth looking at the source code there?

Alexis

On 25.05.2017 03:10, Alexey Kozlov wrote:

@xflouris https://github.com/xflouris: I just added a test for negative values (#129 https://github.com/xflouris/libpll/issues/129), and unfortunately, it shows that current fix doesn't work in all situations.

In particular, I still get negative values with substitution rates |0.001000 1000.000000 0.001000 1000.000000 0.001000 1.000000|. Although this could be fixed with increasing |T| to e.g. |10e-12|, it will still fail with other settings (e.g., on AA data I got negative values even with |T=10e-8|).

The only way I found to get rid of all negative values in all my test cases was to use 'brute force' a posteriori check for negative values /after/ pmatrix was computed, and then to set pmatrix to identity if at least one negative value was detected. Of course, this "solution" could also lead to logL differences, and could potentially conceal bugs if we get negative values in pmatrix for some other reason.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/xflouris/libpll/issues/125#issuecomment-303894834, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1w-nNZM2toO3R6c02tmEkGIYUbt_RMks5r9NT7gaJpZM4MVgeo.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson

www.exelixis-lab.org

amkozlov commented 7 years ago

@xflouris @stamatak: I think we should implement an extremely simple and elegant solution suggested by Ben here: https://github.com/xflouris/libpll/issues/129#issuecomment-304004005

I tested it with the scalar pmatrix kernel, and it fixed all the negs.

The only small caveat is that expm1 is not part of ANSI C (although it is available in GNU/gcc). But this could be easily solved by including the back-up implementation in libpll. e.g. https://github.com/SurajGupta/r-source/blob/master/src/nmath/expm1.c

amkozlov commented 7 years ago

I committed the above fix for all pmatrix kernels (449871928377e96ad27c44b5633489472e6ca518), seems to work fine. Just in case, I left the old threshold-checking code in the comments, we can remove it later.

@xflouris: could you please double-check the implementation, and then hopefully we can close this nasty issue. I'm still impressed how simple the solution was, we should use math magic more often :)

xflouris commented 7 years ago

checked, seems to work fine