xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0
26 stars 6 forks source link

Add tests that could generate negative values in the pmatrix. #129

Closed xflouris closed 7 years ago

xflouris commented 7 years ago

If the exponentials E of the products of eigenvalues[i] * branch_length * rate for every state i are very close to 1, then the pmatrix is the matrix product of the eigenvectors matrix multiplied with the inverse eigenvectors matrix, which is equiv. to the identity matrix. However, for E very close to 1, this can generate negative numbers in the pmatrix.

Add tests that cause such problems, to verify that our fix 9a4e48e17c1916b4b6535c9309aa5d64216ffc83.

See also issue #125 .

amkozlov commented 7 years ago

i added a comprehensive test (23f80cca787288cb5f18fc9d4648757ebaea9a02) which covers different

in all combinations.

NOTE: Currently, this test will fail since solution from #125 doesn't seem to work in all cases. Please see my comment in #125 for details.

bredelings commented 7 years ago

Hi,

 Is the problem in #125 that you get inaccurate values of exp(Q*t) 

when t is very small? I just ran into this problem, and was able to find a solution by computing [exp(Q*t) - I] instead. If its of interest, I can explain my solution.

-BenRI

On 05/24/2017 08:50 PM, Alexey Kozlov wrote:

i added a comprehensive test (23f80cc https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xflouris_libpll_commit_23f80cca787288cb5f18fc9d4648757ebaea9a02&d=DwMFaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=3lr1_nXe1UBdFuGNzHTki7xkqFD5S_kyR_hscHect1U&s=hrx5J21K0Zpa__U9iwYhvMDeI5fp4-Wk9sHf69MgR3c&e=) which covers different

  • datatypes (DNA, AA, 5-state model)
  • stationary frequencies (equal, slightly skewed, extremely skewed)
  • substitution matrices (equal, slightly skewed, extremely skewed)
  • category rates (low, medium, high)
  • branch lengths (short, medium, long)

in all combinations.

NOTE: Currently, this test will fail since solution from #125 https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xflouris_libpll_issues_125&d=DwMFaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=3lr1_nXe1UBdFuGNzHTki7xkqFD5S_kyR_hscHect1U&s=ilOpDji6jb1Qz38NKslfqm3esuJC7r2azG56VpI83Zs&e= doesn't seem to work in all cases. Please see my comment in #125 https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xflouris_libpll_issues_125&d=DwMFaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=3lr1_nXe1UBdFuGNzHTki7xkqFD5S_kyR_hscHect1U&s=ilOpDji6jb1Qz38NKslfqm3esuJC7r2azG56VpI83Zs&e= for details.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xflouris_libpll_issues_129-23issuecomment-2D303892243&d=DwMFaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=3lr1_nXe1UBdFuGNzHTki7xkqFD5S_kyR_hscHect1U&s=sVexmDV_0al2L4Y7i07y78Bz7jFAFuEWoDLcCbUzsK4&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAIj7PXscmNRXEn4aIIDbqbP1SthHI-2Dxks5r9NBDgaJpZM4MiJWj&d=DwMFaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=3lr1_nXe1UBdFuGNzHTki7xkqFD5S_kyR_hscHect1U&s=e4x5ZsD1CV7SH_IPSxqVTmT0-oGcCOxc4aWiR6-hXKk&e=.

amkozlov commented 7 years ago

Hi Ben,

yes, that's the problem we have, could you please explain your solution?

bredelings commented 7 years ago

Hi Alexey,

The C function expm1(double x) exists to calculate exp(x)-1 when x is small. For these cases exp(x)-1 should be approximately x. If we compute exp(x) - 1.0 we will get 0 because of roundoff error, but expm1(x) will be about x.

The general idea of my solution is to compute a matrix version of expm1(Qt) = exp(Qt) - I. Then we can add I to it and get exp(Qt) = I + expm1(Qt). For small values of t, expm1(Q*t) should be proportional to t, so that when t=0 we will recover the identity matrix exactly.

Surprisingly, when using an eigenvalue decomposition where the eigenvalues are d[i], we can just compute expm1(d[i]) instead of exp(d[i]) to get expm1(Q*t). The last bit of the attached PDF shows why this is allowed.

You can see an example implementation here: https://github.com/bredelings/BAli-Phy/blob/master/src/math/exponential.C

I hope that helps!

I have not checked how this affects the negative values directly, so I'd be interested to hear how it affects that, as well as the other problem cases for Q.

-BenRI

On 05/24/2017 09:11 PM, Alexey Kozlov wrote:

Hi Ben,

yes, that's the problem we have, could you please explain your solution?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xflouris_libpll_issues_129-23issuecomment-2D303894957&d=DwMCaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=D-sNfeajiHrSvqeHZj7pqxqMwJBW2DEqaUWUZnjSkDA&s=f0i3Zk7PyH6M586UYzxgAKkfjw3DKCaVvQ73JRaBLJg&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAIj7Gwv-5FhNP8EVUHKNaaAxvvFj8TvqBks5r9NUrgaJpZM4MiJWj&d=DwMCaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=yDQgdhRIVlGWlo1dehDX4FDHdLhgFHI_XNBo6vRcIuw&m=D-sNfeajiHrSvqeHZj7pqxqMwJBW2DEqaUWUZnjSkDA&s=2p3V3hufWU6l2eI6ZrVEf-FDJSAIArvDbC1IUxquoT0&e=.

amkozlov commented 7 years ago

Thanks a bunch, Ben - you saved my day! :) your solution seems to work perfectly on all our test cases, all negative values are gone.

Just for our reference: could you please attach the PDF you've mentioned above? I can't see it.

Thanks again!

bredelings commented 7 years ago

Yay! (Here's the PDF). expm1.pdf

amkozlov commented 7 years ago

great, thanks!

ziheng-yang commented 7 years ago

good idea. the spectral decomposition part is the same as section 2.6 of my book, but it seems that calculation of exp Qt - I helps with numerical stability. i suppose exp Dt - I is diag{ exp d_i t - 1}. the order of multiplications may affect computation. ziheng

At 16:54 25/05/2017 -0700, Benjamin Redelings wrote:

Yay! (Here's the PDF). https://github.com/xflouris/libpll/files/1030469/expm1.pdfexpm1.pdf

— You are receiving this because you are subscribed to this thread. Reply to this email directly, https://github.com/xflouris/libpll/issues/129#issuecomment-304154901view it on GitHub, or https://github.com/notifications/unsubscribe-auth/AMTSEEg7M-VU0bWeLfyrxBLeZn8IOS4iks5r9hSqgaJpZM4MiJWjmute the thread.