Closed xflouris closed 7 years ago
Added the check. I've set the threshold T to 1e-15 which seems to be the smallest value of T that can still work for the datasets I tried. For T=1e-16 we still get negative numbers. I've also noticed that we obtain the log-likelihood retains its value (upto the 8th decimal point) even for larger thresholds, upto the value of T=1e-12. For larger values (e.g T=1e-11) I observe a difference of several integer units in the log-L.
@amkozlov : would be good if you could double-check the implementation.
@xflouris: I just added a test for negative values (#129), and unfortunately, it shows that current fix doesn't work in all situations.
In particular, I still get negative values with substitution rates 0.001000 1000.000000 0.001000 1000.000000 0.001000 1.000000
. Although this could be fixed with increasing T
to e.g. 10e-12
, it will still fail with other settings (e.g., on AA data I got negative values even with T=10e-8
).
The only way I found to get rid of all negative values in all my test cases was to use 'brute force' a posteriori check for negative values after pmatrix was computed, and then to set pmatrix to identity if at least one negative value was detected. Of course, this "solution" could also lead to logL differences, and could potentially conceal bugs if we get negative values in pmatrix for some other reason.
I seem to remember that there was some similar test in Mrbayes, maybe worth looking at the source code there?
Alexis
On 25.05.2017 03:10, Alexey Kozlov wrote:
@xflouris https://github.com/xflouris: I just added a test for negative values (#129 https://github.com/xflouris/libpll/issues/129), and unfortunately, it shows that current fix doesn't work in all situations.
In particular, I still get negative values with substitution rates |0.001000 1000.000000 0.001000 1000.000000 0.001000 1.000000|. Although this could be fixed with increasing |T| to e.g. |10e-12|, it will still fail with other settings (e.g., on AA data I got negative values even with |T=10e-8|).
The only way I found to get rid of all negative values in all my test cases was to use 'brute force' a posteriori check for negative values /after/ pmatrix was computed, and then to set pmatrix to identity if at least one negative value was detected. Of course, this "solution" could also lead to logL differences, and could potentially conceal bugs if we get negative values in pmatrix for some other reason.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/xflouris/libpll/issues/125#issuecomment-303894834, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1w-nNZM2toO3R6c02tmEkGIYUbt_RMks5r9NT7gaJpZM4MVgeo.
-- Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson
www.exelixis-lab.org
@xflouris @stamatak: I think we should implement an extremely simple and elegant solution suggested by Ben here: https://github.com/xflouris/libpll/issues/129#issuecomment-304004005
I tested it with the scalar pmatrix kernel, and it fixed all the negs.
The only small caveat is that expm1
is not part of ANSI C (although it is available in GNU/gcc). But this could be easily solved by including the back-up implementation in libpll. e.g.
https://github.com/SurajGupta/r-source/blob/master/src/nmath/expm1.c
I committed the above fix for all pmatrix kernels (449871928377e96ad27c44b5633489472e6ca518), seems to work fine. Just in case, I left the old threshold-checking code in the comments, we can remove it later.
@xflouris: could you please double-check the implementation, and then hopefully we can close this nasty issue. I'm still impressed how simple the solution was, we should use math magic more often :)
checked, seems to work fine
The parameter optimization routine may evaluate non-sensible parameters that cause the pmatrix to be close to the identity matrix but, due to numerical underflow, cause some cells to have small negative values.
Problem:
After computing the
expd[1..states]
array (exponential ofrate*branch*eigenvalue[state]
), we may end up withexpd = {1,1,...,1}
where each value is 1 +- a very small number. Hence we will be multiplying the eigenvectors matrix with the inverse eigenvectors matrix, with the expectation of obtaining the identity matrix. However, since each column of the inverse eigenvectors matrix was multiplied with the respective element fromexpd
, the floating point representation changes and the resulting pmatrix may contain small negative numbers.Proposed fix:
If for all elements
x
ofexpd[1..states]
holds that1 - T < x < 1 + T
, whereT
is a threshold value of, for example,1e-6
, then set the identity matrix as the pmatrix.