zholos / qml

math library for kdb+
http://althenia.net/qml
Other
65 stars 33 forks source link

Cholesky Decomposition Fails for Larger Matrixes #7

Closed pindash closed 7 years ago

pindash commented 7 years ago

A simple test case: a:500 500#250000?1.0 a:.qml.mm[a;flip a] .qml.mchol[a]

In particular I get a matrix full of 0n for larger matrix

A simple solution I came up with borrows from J software's implementation of Cholesky. found here: http://code.jsoftware.com/wiki/Essays/Cholesky_Decomposition

Here is the code I used which recursively calculates the cholesky for larger matrices using smaller matrices The Code assumes upper triangular Cholesky: I added this function to .qml .qml.mchollarge:{[m] $[100>l:count m; .qml.mchol[m]; [a:.5*l; A:m[b;b:til l-floor[a]]; B:m[c:ceiling[a]+til l-ceiling[a];b]; C:m[c;c]; L0:.qml.mchollarge A; L1:.qml.mchollarge C-.qml.mm[T: .qml.mm[B;.qml.minv A];flip B]; Z:(count first L3;count L3:flip .qml.mm[T;L0])#0f; (l;l)#raze (L0,'L3),(Z,'L1)]]};

pindash commented 7 years ago

Hey Sorry, after trying to track down the error. It might be simply that my virtual machine that I'm using is screwing up the IEEE 754 floating point standard.

pindash commented 7 years ago

The issue has to do with unlucky picking of Floating point numbers. Apparently, there is a difference between how Linux (rhel specifically) and windows interact with Lapack. The solution that works well is to simply multiply the matrix by some scaling value before applying the cholesky this shifts the floating point representation and will usually cause the matrix decompose nicely. here is the code that closes the issue: .qml.pmchol:{[m] $[0<sum over null r:.qml.mchol[m];.qml.pmchol[p*m]%sqrt[p:first 2+1?10];r]};

zholos commented 7 years ago

For me the test case works both on Linux (Debian) and Windows. If you're linking to system LAPACK on RHEL (i.e. not using ./configure --build-blas), then perhaps the difference is due to different LAPACK versions.

pindash commented 7 years ago

I was building blas. It's randomly happening, I would need to give you the test matrix I'm using which is a bit impractical as it's 500 by 500.

If I can find a smaller piece of it that fails I will post it.

On Jun 9, 2017 6:22 PM, "Andrey Zholos" notifications@github.com wrote:

For me the test case works both on Linux (Debian) and Windows. If you're linking to system LAPACK on RHEL (i.e. not using ./configure --build-blas), then perhaps the difference is due to different LAPACK versions.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/zholos/qml/issues/7#issuecomment-307513580, or mute the thread https://github.com/notifications/unsubscribe-auth/ACDWYfvvPNUjBQjHAAhkYoHSaNbjbaNPks5sCcW_gaJpZM4N0gNP .

zholos commented 7 years ago

You can attach the kdb file with the matrix (i.e. `:a set a) here if you like.

I think it's probably a numerical problem, but it could also be an issue with how LAPACK is compiled on different machines (e.g. different compiler flags), so I'd like to investigate.

pindash commented 7 years ago

Pretty sure it is a numerical problem, but it behaves differently on RHEL and Windows. SpecCovMat.zip

zholos commented 7 years ago

Thanks! .qml.mchol fails on this matrix for me too (on Debian). I think the matrix is just too big for double precision:

q).qml.mdet 203#'203#a
0w

I also tried using the recursive Cholesky function in LAPACK (dpotrf2 instead of dpotrf), which I think is the same as your recursive q code at the top, but that also fails.

pindash commented 7 years ago

The way I solved it for my matrix is I added the following: .qml.pmchol:{[m;p] $[0<sum over null r:.qml.mchol[p*m];.qml.pmchol[m;first 1+1?100];r%sqrt[p]]}; For some reason multiplying by a small number will shift the matrix to be cholesky compatible. You can check if it works for you in debian. it's obviously an ugly solution.

Thanks again,

zholos commented 7 years ago

Found the problem. This submatrix is not positive-definite (if that's a covariance matrix, you have perfect correlation between two variables):

q)a[202 203;202 203]
78.62156 77.39634
77.39634 76.1902 
q).qml.mdet a[202 203;202 203]
0f

Thus, the full matrix isn't positive-definite either, and .qml.mchol returns null as expected.

Fiddling around with a scaling factor will make it positive-definite numerically, kind of like:

q).qml.mchol (1 2;2 4)

q).qml.mchol (1 2;2 4)*1.000001
1 2.000001    
0 2.980232e-08

Dropping either of those variables works: .qml.mchol (a _202)_'202

I guess .qml.mchol does work on matrices that large after all.

pindash commented 7 years ago

I did that as well, but I don't think it actually has an eigen value of zero. I think it is just very close to zero: https://www.wolframalpha.com/input/?i=upper+triangular+matrix+%7B%7B78.62156,+77.39634%7D,+%7B77.39634,+76.1902%7D%7D&lk=2

I guess how you treat matrixes that are close to being semi positive definite instead of positive definite is a question.